---
license: apache-2.0
language:
- de
datasets:
- stefan-it/nanochat-german-alpaca
- argilla/databricks-dolly-15k-curated-multilingual
- FreedomIntelligence/evol-instruct-deutsch
- LSX-UniWue/Guanako
- stefan-it/nanochat-german-openhermes
- FreedomIntelligence/sharegpt-deutsch
tags:
- nanochat
- german
- v1
base_model:
- stefan-it/nanochat-german-base
---
# 🇩🇪 nanochat German: v1
<p align="left">
<picture>
<img alt="nanochat German logo" src="https://raw.githubusercontent.com/stefan-it/nanochat-german/main/assets/nanochat-german.png" style="max-width: 75%;">
</picture>
<br/>
</p>
This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.
💬 A demo space of the model can be found [here](https://huggingface.co/spaces/stefan-it/nanochat-german-v1).
## Datasets
The chat model was fine-tuned on the following datasets:
* [German Alpaca](https://huggingface.co/datasets/stefan-it/nanochat-german-alpaca)
* [German Dolly](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual)
* [German Evol Instruct](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-deutsch)
* [German Guanako](https://huggingface.co/datasets/LSX-UniWue/Guanako)
* [German Openhermes](https://huggingface.co/datasets/stefan-it/nanochat-german-openhermes)
* [German ShareGPT](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch)
* German Spelling Tasks
More information can be found in the corresponding [German nanochat repository](https://github.com/stefan-it/nanochat-german).
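For a quick look at the training data, the SFT datasets can be inspected with the 🤗 `datasets` library. The snippet below is a minimal sketch; the split name and exact column layout are assumptions, so please check the individual dataset cards:

```python
from datasets import load_dataset

# Load the German Alpaca SFT dataset (split name "train" is assumed).
alpaca_de = load_dataset("stefan-it/nanochat-german-alpaca", split="train")

# Show the dataset size and a first example (column names depend on the dataset card).
print(alpaca_de)
print(alpaca_de[0])
```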
## Fine-Tuning Stats
- run: nanochat-german
- device_type:
- dtype: bfloat16
- num_iterations: -1
- max_seq_len: 2048
- device_batch_size: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- init_lr_frac: 1.0000
- weight_decay: 0.0000
- eval_every: 150
- eval_tokens: 10,485,760
- total_batch_size: 524,288
- dry_run: 0
- Number of iterations: 346
- DDP world size: 8
- Minimum validation bpb: 0.6001
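As a sanity check, the batch-size settings above are consistent: with a device batch size of 32 sequences, a sequence length of 2,048 tokens and a DDP world size of 8, a single micro-step already covers 524,288 tokens, which matches the total batch size, so no gradient accumulation is needed. A small illustrative sketch of that arithmetic:

```python
# Illustrative sanity check of the batch-size arithmetic from the stats above.
device_batch_size = 32      # sequences per GPU per micro-step
max_seq_len = 2048          # tokens per sequence
ddp_world_size = 8          # number of GPUs
total_batch_size = 524_288  # target tokens per optimizer step

tokens_per_micro_step = device_batch_size * max_seq_len * ddp_world_size
grad_accum_steps = total_batch_size // tokens_per_micro_step

print(tokens_per_micro_step)  # 524288
print(grad_accum_steps)       # 1 -> no gradient accumulation required
```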
## Evaluation Results
We use `lm_eval` to measure and compare the model's performance against other language models in the same parameter range (note: this list is not exhaustive):
<table class="model-comparison">
<thead>
<tr>
<th align="left">Model</th>
<th align="center" colspan="2">arc_de</th>
<th align="center" colspan="2">hellaswag_de</th>
<th align="center">m_mmlu_de</th>
<th align="center">truthfulqa_de_mc1</th>
<th align="center">truthfulqa_de_mc2</th>
</tr>
<tr>
<th></th>
<th align="center">acc</th>
<th align="center">acc_norm</th>
<th align="center">acc</th>
<th align="center">acc_norm</th>
<th align="center">acc</th>
<th align="center">acc</th>
<th align="center">acc</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/stefan-it/nanochat-german-v1" target="_blank">nanochat German v1</a></td>
<td align="center">0.2241</td>
<td align="center">0.2626</td>
<td align="center">0.3203</td>
<td align="center">0.3581</td>
<td align="center">0.2285</td>
<td align="center">0.2500</td>
<td align="center">0.4184</td>
</tr>
<tr>
<td><a href="https://huggingface.co/LSX-UniWue/LLaMmlein_120M" target="_blank">LLäMmlein-120M</a></td>
<td align="center">0.1942</td>
<td align="center">0.2301</td>
<td align="center">0.2945</td>
<td align="center">0.3178</td>
<td align="center">0.2285</td>
<td align="center">0.2310</td>
<td align="center">0.4055</td>
</tr>
<tr>
<td><a href="https://huggingface.co/LSX-UniWue/LLaMmlein_1B" target="_blank">LLäMmlein-1B</a></td>
<td align="center">0.2515</td>
<td align="center">0.2960</td>
<td align="center">0.3703</td>
<td align="center">0.4490</td>
<td align="center">0.2317</td>
<td align="center">0.2322</td>
<td align="center">0.3617</td>
</tr>
</tbody>
</table>
The following command was used to obtain the evaluation results for our model:
```bash
lm_eval --model hf \
--model_args pretrained="stefan-it/nanochat-german-v1" \
--tasks "arc_de,hellaswag_de,m_mmlu_de,truthfulqa_de_mc1,truthfulqa_de_mc2" \
--device cuda:0 \
--batch_size auto \
--trust_remote_code \
--log_samples \
--output_path ./nanochat-german-v1
```
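`lm_eval` writes a JSON results file into the directory given via `--output_path`. The following sketch pulls the accuracy-style metrics out of that file; the exact file name and nesting depend on the `lm_eval` version, so the glob pattern is an assumption:

```python
import glob
import json

# Pick up the most recent results file written by lm_eval (naming varies by version).
results_file = sorted(glob.glob("./nanochat-german-v1/**/results*.json", recursive=True))[-1]

with open(results_file) as f:
    results = json.load(f)

# Print accuracy-style metrics (acc, acc_norm, ...) per task.
for task, metrics in results["results"].items():
    for name, value in metrics.items():
        if name.startswith("acc") and isinstance(value, float):
            print(f"{task:25s} {name:20s} {value:.4f}")
```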
## Demo
To generate text, please make sure that you are using the Transformers branch from [this pull request](https://github.com/huggingface/transformers/pull/41634).
Then the following code can be used:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "stefan-it/nanochat-german-v1"
revision = "main"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the tokenizer and the bfloat16 model weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()
conversation = [
{"role": "user", "content": "Was ist die Hauptstadt von Bayern?"},
]
# Build the prompt with the model's chat template.
inputs = tokenizer.apply_chat_template(
conversation,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt"
).to(device)
# Generate a response (greedy decoding by default).
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
)
# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
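The snippet above decodes greedily. For more varied answers, sampling can be enabled in `generate`; the temperature and top-p values below are illustrative defaults, not tuned settings:

```python
# Sampling variant of the generation call above (values are illustrative).
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,    # sample instead of greedy decoding
        temperature=0.7,   # softens the next-token distribution
        top_p=0.9,         # nucleus sampling cutoff
    )

generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```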
## License
The model is licensed under the permissive Apache 2.0 license.
## Acknowledgements
- Many thanks to Andrej Karpathy for the original [nanochat](https://github.com/karpathy/nanochat) repository!
- Thanks to the [LLäMmlein team](https://huggingface.co/LSX-UniWue) for making the pretraining data publicly available.
- Thanks to [Ben](https://huggingface.co/burtenshaw) and [Joshua](https://huggingface.co/Xenova) for their help with and work on the nanochat [HF integration](https://github.com/huggingface/transformers/pull/41634).