docs: add initial version
README.md
CHANGED
@@ -1,3 +1,88 @@
---
license: apache-2.0
language:
- de
datasets:
- stefan-it/nanochat-german-data
tags:
- nanochat
- german
---

# 🇩🇪 nanochat German: Chat Model

This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.

## Datasets

The chat model was fine-tuned on the following datasets:

* [German Alpaca](https://huggingface.co/datasets/stefan-it/nanochat-german-alpaca)
* [German Dolly](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual)
* [German Evol Instruct](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-deutsch)
* [German Guanako](https://huggingface.co/datasets/LSX-UniWue/Guanako)
* [German Openhermes](https://huggingface.co/datasets/stefan-it/nanochat-german-openhermes)
* [German ShareGPT](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch)
* German Spelling Tasks

More information can be found in the corresponding [dataset repository](https://huggingface.co/datasets/stefan-it/nanochat-german-data).

## Fine-Tuning Stats

- run: nanochat-german
- device_type:
- dtype: bfloat16
- num_iterations: -1
- max_seq_len: 2048
- device_batch_size: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- init_lr_frac: 1.0000
- weight_decay: 0.0000
- eval_every: 150
- eval_tokens: 10,485,760
- total_batch_size: 524,288
- dry_run: 0
- Number of iterations: 346
- DDP world size: 8
- Minimum validation bpb: 0.6001
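
The batch-size figures in the list above fit together, and a few lines of arithmetic make that visible. The sketch below is illustrative and not part of the training code. It assumes that `total_batch_size` is counted in tokens and that the run processed one full batch per iteration, so the total token count is an approximation. The helper `bits_per_byte` is a hypothetical name that only shows how a bpb value relates to a summed cross-entropy loss.

```python
import math

# Values copied from the stats list above.
max_seq_len = 2048
device_batch_size = 32
world_size = 8              # "DDP world size"
total_batch_size = 524_288  # assumed to be counted in tokens
num_iterations = 346        # "Number of iterations"

# Tokens processed by all ranks in a single forward/backward pass.
tokens_per_micro_step = device_batch_size * max_seq_len * world_size

# Under the assumption above, this ratio is the number of
# gradient-accumulation steps per optimizer step.
grad_accum_steps = total_batch_size // tokens_per_micro_step

# Approximate number of tokens seen during fine-tuning.
total_tokens = num_iterations * total_batch_size


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) to bits per byte."""
    return total_nats / (math.log(2) * total_bytes)


print(tokens_per_micro_step)  # 524288
print(grad_accum_steps)       # 1
print(total_tokens)           # 181403648
```

Under these assumptions, each optimizer step corresponds to exactly one pass over all eight ranks, and the run covers roughly 181M tokens.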

## Demo

To generate text, make sure you are using the Hugging Face Transformers branch from [this pull request](https://github.com/huggingface/transformers/pull/41634).

Then the following code can be used:

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


model_id = "stefan-it/nanochat-german-base"
revision = "main"
max_new_tokens = 64
# Use a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and the bfloat16 model weights, then move the model to the device.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()  # switch to inference mode


prompt = "Die Altstadt von München "
# Wrap model and tokenizer in a text-generation pipeline and continue the prompt.
generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device, max_new_tokens=max_new_tokens)
outputs = generator(prompt)
print(outputs)
```

## License

The model is licensed under the permissive Apache 2.0 license.

## Acknowledgements

- Many thanks to Andrej Karpathy for the original [nanochat](https://github.com/karpathy/nanochat) repo!
- Thanks to the [LLäMmlein team](https://huggingface.co/LSX-UniWue) for making the pretraining data publicly available.
- Thanks to [Ben](https://huggingface.co/burtenshaw) and [Joshua](https://huggingface.co/Xenova) for their help and for their work on the nanochat [HF integration](https://github.com/huggingface/transformers/pull/41634).