jburtoft committed commit 0123019 (parent: 4c940c8)

Update README.md

Initial commit - updates coming

Files changed (1): README.md (+85, -0)
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- inferentia2
- neuron
---
# Neuronx model for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).

This model card also includes instructions for compiling other SOLAR models with different settings if this combination isn't quite what you are looking for.

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the sections below.

It has been compiled to run on an inf2.24xlarge instance on AWS.

**This model has been compiled using version 2.16 of the Neuron SDK. Make sure your environment has version 2.16 installed.**

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
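
As a rough sanity check on why an inf2.24xlarge (6 Inferentia2 chips, 12 NeuronCores) is a comfortable fit, here is a back-of-the-envelope estimate of the fp16 weight footprint. This is only a sketch: it assumes the nominal 10.7B parameter count and ignores the KV cache, activations, and compiler/runtime overhead.

```python
# Back-of-the-envelope memory estimate for the compiled model weights.
# Assumptions: 10.7B parameters, 2 bytes per parameter (fp16/bf16),
# weights sharded evenly across 12 NeuronCores.
params = 10.7e9
bytes_per_param = 2

total_gb = params * bytes_per_param / 1e9
per_core_gb = total_gb / 12

print(f"{total_gb:.1f} GB of weights total, ~{per_core_gb:.1f} GB per NeuronCore")
# → 21.4 GB of weights total, ~1.8 GB per NeuronCore
```

Real memory usage is higher once the KV cache for a 4096-token sequence is included, but this shows the weights alone fit easily.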

## Set up the environment

First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled. However, you may need to update to version 2.16 to use these binaries.

```shell
sudo apt-get update -y \
 && sudo apt-get install -y --no-install-recommends \
    aws-neuronx-dkms=2.15.9.0 \
    aws-neuronx-collectives=2.19.7.0-530fb3064 \
    aws-neuronx-runtime-lib=2.19.5.0-97e2d271b \
    aws-neuronx-tools=2.16.1.0

pip3 install --upgrade \
    neuronx-cc==2.12.54.0 \
    torch-neuronx==1.13.1.1.13.0 \
    transformers-neuronx==0.9.474 \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com
```
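
To confirm the environment matches the pinned versions, one way to check installed packages from Python is via `importlib.metadata`. This is a minimal sketch; the package names are simply the ones pinned in the install step above.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string of pkg, or None if it is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Packages pinned in the pip3 install step above
for pkg in ("neuronx-cc", "torch-neuronx", "transformers-neuronx"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```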

## Running inference from this repository

```python
from optimum.neuron import pipeline

p = pipeline('text-generation', 'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-4096')
p("import socket\n\ndef ping_exponential_backoff(host: str):",
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    max_length=200,
)
```
```
[{generated text here}]
```
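
The pipeline returns a list of dicts, one per returned sequence, each carrying a `generated_text` key (the standard 🤗 text-generation pipeline output shape). A small sketch of pulling the completion out, using a stubbed output since the real call requires an inf2 instance:

```python
# Stubbed pipeline output; a real call on an inf2 instance returns the same shape.
outputs = [{"generated_text": "import socket\n\ndef ping_exponential_backoff(host: str):\n    ..."}]

def first_completion(outputs):
    """Return the text of the first returned sequence."""
    return outputs[0]["generated_text"]

print(first_completion(outputs))
```

With `num_return_sequences=1` as in the example above, the list always has exactly one entry.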

## Compiling for different instances or settings

If this repository doesn't have the exact version or settings you need, you can compile your own.

(to be added)

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, use the repo revision specific to the version of `neuronx` you are using in order to load the right serialized checkpoints.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 4096
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 12
}
```