jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096

jburtoft committed on Jan 12, 2024

Commit 0123019 (verified) · 1 parent: 4c940c8

Update README.md

Initial commit - updates coming

Files changed (1)

README.md +85 -0

README.md CHANGED

@@ -1,3 +1,88 @@
 ---
 license: apache-2.0
+language:
+  - en
+pipeline_tag: text-generation
+inference: false
+tags:
+  - pytorch
+  - inferentia2
+  - neuron
 ---
+# Neuronx model for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)
+This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
+You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
+This model card will also include instructions for compiling SOLAR models with other settings, in case this combination isn't quite what you are looking for.
+This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler_args` parameters detailed in the sections below.
+It has been compiled to run on an inf2.24xlarge instance on AWS.
+**This model was compiled with version 2.16 of the Neuron SDK. Make sure your environment has version 2.16 installed.**
+Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
+## Set up the environment
+First, launch an instance from the [Hugging Face Deep Learning AMI (DLAMI)](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2), which has most of the Neuron utilities and drivers preinstalled. You may still need to upgrade to Neuron SDK 2.16 to use these binaries:
+```bash
+sudo apt-get update -y \
+ && sudo apt-get install -y --no-install-recommends \
+    aws-neuronx-dkms=2.15.9.0 \
+    aws-neuronx-collectives=2.19.7.0-530fb3064 \
+    aws-neuronx-runtime-lib=2.19.5.0-97e2d271b \
+    aws-neuronx-tools=2.16.1.0
+pip3 install --upgrade \
+    neuronx-cc==2.12.54.0 \
+    torch-neuronx==1.13.1.1.13.0 \
+    transformers-neuronx==0.9.474 \
+    --extra-index-url=https://pip.repos.neuron.amazonaws.com
+```
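+The pip packages above are pinned to their Neuron SDK 2.16 versions. As an optional sanity check before loading the model, the following minimal sketch compares what is actually installed in the current Python environment against those pins. `find_mismatches` and `EXPECTED` are hypothetical names used only for illustration; they are not part of the Neuron SDK or `optimum-neuron`.
+```python
+# Hypothetical helper, not part of the Neuron SDK or optimum-neuron: compares the
+# installed versions of the pip packages pinned above with the expected ones.
+from importlib.metadata import PackageNotFoundError, version
+from typing import Dict, Optional
+
+# Versions copied from the pip3 command above (Neuron SDK 2.16).
+EXPECTED = {
+    "neuronx-cc": "2.12.54.0",
+    "torch-neuronx": "1.13.1.1.13.0",
+    "transformers-neuronx": "0.9.474",
+}
+
+
+def find_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
+    """Return {package: installed_version} for each package whose installed
+    version differs from the expected one (None means not installed)."""
+    mismatches: Dict[str, Optional[str]] = {}
+    for package, wanted in expected.items():
+        try:
+            installed: Optional[str] = version(package)
+        except PackageNotFoundError:
+            installed = None
+        # Plain string comparison: the pins above are exact versions.
+        if installed != wanted:
+            mismatches[package] = installed
+    return mismatches
+
+
+if __name__ == "__main__":
+    problems = find_mismatches(EXPECTED)
+    for package, installed in problems.items():
+        print(f"{package}: expected {EXPECTED[package]}, found {installed}")
+    if not problems:
+        print("All pinned Neuron pip packages match.")
+```
+Run it with the same interpreter you will use for inference; it only inspects pip metadata, so the apt-installed driver and runtime packages are not covered.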
+## Running inference from this repository
+```python
+from optimum.neuron import pipeline
+
+# Load the precompiled Neuron checkpoints from this repository (requires an Inf2 instance).
+p = pipeline('text-generation', 'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096')
+outputs = p(
+    "import socket\n\ndef ping_exponential_backoff(host: str):",
+    do_sample=True,
+    top_k=10,
+    temperature=0.1,
+    top_p=0.95,
+    num_return_sequences=1,
+    max_length=200,
+)
+print(outputs)
+```
+```
+[{generated text here}]
+```
+## Compiling for different instances or settings
+If this repository doesn't have the exact version or settings you need, you can compile your own.
+(to be added)
+This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repo revision that matches your `neuronx` version so that you get the right serialized checkpoints.
+## Arguments passed during export
+**input_shapes**
+```json
+{
+  "batch_size": 1,
+  "sequence_length": 4096
+}
+```
+**compiler_args**
+```json
+{
+  "auto_cast_type": "fp16",
+  "num_cores": 12
+}
+```
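+The two argument groups above have to fit the target instance: `num_cores` cannot exceed the NeuronCores the instance provides (an inf2.24xlarge has 12, matching `"num_cores": 12` here). The minimal sketch below merges the two groups and rejects a core count the instance does not have. It is not an `optimum-neuron` API; `NEURON_CORES` and `export_kwargs` are hypothetical names, and the per-instance core counts are taken from the AWS Inf2 instance specifications (2 NeuronCores per Inferentia2 chip).
+```python
+# Hypothetical helper, not an optimum-neuron API: checks that num_cores fits the
+# target Inf2 instance before starting a long compilation.
+from typing import Any, Dict
+
+# NeuronCores per Inf2 instance size (each Inferentia2 chip has 2 NeuronCores).
+NEURON_CORES = {
+    "inf2.xlarge": 2,
+    "inf2.8xlarge": 2,
+    "inf2.24xlarge": 12,
+    "inf2.48xlarge": 24,
+}
+
+# Values used for this repository, copied from the two JSON blocks above.
+INPUT_SHAPES = {"batch_size": 1, "sequence_length": 4096}
+COMPILER_ARGS = {"auto_cast_type": "fp16", "num_cores": 12}
+
+
+def export_kwargs(instance_type: str,
+                  input_shapes: Dict[str, int],
+                  compiler_args: Dict[str, Any]) -> Dict[str, Any]:
+    """Merge both argument groups, rejecting a num_cores the instance lacks."""
+    available = NEURON_CORES[instance_type]
+    if compiler_args["num_cores"] > available:
+        raise ValueError(
+            f"{instance_type} has {available} NeuronCores, "
+            f"but num_cores={compiler_args['num_cores']}"
+        )
+    return {**input_shapes, **compiler_args}
+
+
+print(export_kwargs("inf2.24xlarge", INPUT_SHAPES, COMPILER_ARGS))
+# {'batch_size': 1, 'sequence_length': 4096, 'auto_cast_type': 'fp16', 'num_cores': 12}
+```
+Per the `optimum-neuron` guide linked at the top of this card, values like these are passed as keyword arguments at export time, for example `NeuronModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0", export=True, **kwargs)`; check that guide for the arguments your `optimum-neuron` version accepts.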