togethercomputer/Pythia-Chat-Base-7B

@@ -18,7 +18,10 @@ Together partnered with LAION and Ontocord.ai, who both helped curate the datase
 You can read more about this process and the availability of this dataset in LAION’s blog post [here](https://laion.ai/blog/oig-dataset/).
 In addition to the aforementioned fine-tuning, Pythia-Chat-Base-7B-v0.16 has also undergone further fine-tuning via a small amount of feedback data.
-This allows the model to better adapt to human preferences in the conversations.
 ## Model Details
 - **Developed by**: Together Computer.
@@ -30,18 +33,59 @@ This allows the model to better adapt to human preferences in the conversations.
 # Quick Start
 ```python
-from transformers import pipeline
-pipe = pipeline(model='togethercomputer/Pythia-Chat-Base-7B-v0.16')
-pipe('''<human>: Hello!\n<bot>:''')
 ```
-or
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
-model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
 ```
 ## Strengths of the model
 There are several tasks that OpenChatKit excels at out of the box. This includes:

 You can read more about this process and the availability of this dataset in LAION’s blog post [here](https://laion.ai/blog/oig-dataset/).
 In addition to the aforementioned fine-tuning, Pythia-Chat-Base-7B-v0.16 has also undergone further fine-tuning via a small amount of feedback data.
+This process allows the model to better adapt to human preferences in conversations.
+One of the notable features of Pythia-Chat-Base-7B-v0.16 is its ability to **run inference on a 12GB GPU** when the weights are loaded in int8 (8-bit quantization).
+This makes the model accessible to a wider range of users and hardware configurations.
 ## Model Details
 - **Developed by**: Together Computer.
 # Quick Start
+## GPU Inference
+Running the model in float16 requires a GPU with 16GB of memory.
 ```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# init
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", torch_dtype=torch.float16)
+model = model.to('cuda:0')
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
 ```
+## GPU Inference in Int8
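+Loading the weights in int8 requires a GPU with 12GB of memory. Note that `load_in_8bit=True` depends on the `bitsandbytes` and `accelerate` packages being installed.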
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+# init
 tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", device_map="auto", load_in_8bit=True)
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
 ```
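As a rough sanity check on the 16GB and 12GB figures quoted in the two GPU sections above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic, not a measurement: the parameter count of about 6.9 billion is an assumption based on the "7B" model name, `weight_memory_gb` is a hypothetical helper, and activations, the KV cache, and framework overhead are ignored.

```python
# Back-of-the-envelope estimate of weight memory; not a measurement.
PARAMS = 6.9e9  # ASSUMPTION: approximate parameter count of a "7B" Pythia model

# Storage size of one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Return the memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float16", "int8"):
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB for weights")
```

Under these assumptions the float16 weights come to roughly 13.8 GB and the int8 weights to roughly 6.9 GB, which is consistent with the stated GPU sizes once headroom for activations is added. Actual usage in 8-bit mode may be higher than this estimate, since some parts of the model can remain in higher precision.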
+## CPU Inference
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# init
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", torch_dtype=torch.bfloat16)
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
+```
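The snippets above all pass a prompt of the form `<human>: ...\n<bot>:` and print the full decoded string, which includes the prompt itself. The following is a minimal sketch of how a prompt could be assembled and how the reply could be separated from the decoded text. The helper names `build_prompt` and `extract_reply` are hypothetical, and two things are assumed rather than confirmed by this README: that multi-turn conversations repeat the same `<human>:` / `<bot>:` tags on separate lines, and that the model may go on to generate a further `<human>:` turn that should be cut off.

```python
HUMAN_TAG = "<human>:"
BOT_TAG = "<bot>:"


def build_prompt(turns):
    """Build a prompt from (speaker, text) pairs, ending with an open bot tag.

    speaker is "human" or "bot". ASSUMPTION: each turn sits on its own line.
    """
    lines = [f"<{speaker}>: {text}" for speaker, text in turns]
    lines.append(BOT_TAG)  # leave the bot turn open for the model to complete
    return "\n".join(lines)


def extract_reply(decoded, prompt):
    """Strip the prompt from the decoded output and keep only the first bot reply."""
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Drop anything the model generated after starting a new human turn.
    return reply.split(HUMAN_TAG, 1)[0].strip()


prompt = build_prompt([("human", "Hello!")])
print(repr(prompt))

# Stand-in for tokenizer.decode(outputs[0]); no model is loaded here.
decoded = prompt + " Hi there!\n<human>: How are you?"
print(extract_reply(decoded, prompt))
```

In the inference snippets, `prompt` would be passed to the tokenizer and `output_str` would take the place of `decoded`. If the decoded string contains special tokens, it may not start with the prompt exactly, in which case the function falls back to trimming only at the next `<human>:` tag.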
 ## Strengths of the model
 There are several tasks that OpenChatKit excels at out of the box. This includes: