stefan-it/nanochat-german-v1

stefan-it committed on Oct 26

Commit 6d96d5d · verified · 1 Parent(s): 46338bb

docs: add initial version

Files changed (1)

README.md +88 -3

@@ -1,3 +1,88 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- de
+datasets:
+- stefan-it/nanochat-german-data
+tags:
+- nanochat
+- german
+---
+# 🇩🇪 nanochat German: Chat Model
+This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.
+## Datasets
+The chat model was fine-tuned on the following datasets:
+* [German Alpaca](https://huggingface.co/datasets/stefan-it/nanochat-german-alpaca)
+* [German Dolly](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual)
+* [German Evol Instruct](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-deutsch)
+* [German Guanako](https://huggingface.co/datasets/LSX-UniWue/Guanako)
+* [German Openhermes](https://huggingface.co/datasets/stefan-it/nanochat-german-openhermes)
+* [German ShareGPT](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch)
+* German Spelling Tasks
+More information can be found in the corresponding [dataset repository](https://huggingface.co/datasets/stefan-it/nanochat-german-data).
+## Fine-Tuning Stats
+- run: nanochat-german
+- device_type:
+- dtype: bfloat16
+- num_iterations: -1
+- max_seq_len: 2048
+- device_batch_size: 32
+- unembedding_lr: 0.0040
+- embedding_lr: 0.2000
+- matrix_lr: 0.0200
+- init_lr_frac: 1.0000
+- weight_decay: 0.0000
+- eval_every: 150
+- eval_tokens: 10,485,760
+- total_batch_size: 524,288
+- dry_run: 0
+- Number of iterations: 346
+- DDP world size: 8
+- Minimum validation bpb: 0.6001
+## Demo
+To generate text, first install Transformers from [this pull request branch](https://github.com/huggingface/transformers/pull/41634), which contains the nanochat integration.
+The following code can then be used:
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+model_id = "stefan-it/nanochat-german-base"
+revision = "main"
+max_new_tokens = 64
+
+# Use the GPU when one is available, otherwise fall back to the CPU.
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Load the tokenizer and the model weights (in bfloat16) from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
+model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
+model.eval()
+
+# Generate a continuation of a German prompt.
+prompt = "Die Altstadt von München "
+generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device, max_new_tokens=max_new_tokens)
+outputs = generator(prompt)
+print(outputs)
+```
+## License
+The model is licensed under the permissive Apache 2.0 license.
+## Acknowledgements
+- Many thanks to Andrej Karpathy for the original [nanochat](https://github.com/karpathy/nanochat) repo!
+- Thanks to the [LLäMmlein team](https://huggingface.co/LSX-UniWue) for making the pretraining data publicly available.
+- Thanks to [Ben](https://huggingface.co/burtenshaw) and [Joshua](https://huggingface.co/Xenova) for their help and for working on the nanochat [HF integration](https://github.com/huggingface/transformers/pull/41634).
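
The numbers in the Fine-Tuning Stats list of the README can be cross-checked with a little arithmetic. The sketch below is not taken from the nanochat code base. It assumes that `total_batch_size` and `eval_tokens` are counted in tokens, that one micro-step processes `device_batch_size * max_seq_len` tokens on each of the 8 DDP ranks, and that every iteration consumes exactly one full batch. The function name `training_budget` and its return keys are hypothetical.

```python
# Values copied from the Fine-Tuning Stats list in the README.
MAX_SEQ_LEN = 2048
DEVICE_BATCH_SIZE = 32
DDP_WORLD_SIZE = 8
TOTAL_BATCH_SIZE = 524_288   # assumed to be measured in tokens
NUM_ITERATIONS = 346
EVAL_TOKENS = 10_485_760     # assumed to be measured in tokens


def training_budget(max_seq_len, device_batch_size, world_size,
                    total_batch_size, num_iterations, eval_tokens):
    """Derive step counts and token totals from the logged settings.

    Hypothetical helper: it only restates the assumptions above as arithmetic.
    """
    # Tokens processed by all ranks together in one forward/backward pass.
    tokens_per_micro_step = device_batch_size * max_seq_len * world_size
    if total_batch_size % tokens_per_micro_step != 0:
        raise ValueError("total_batch_size must be a multiple of the micro-step size")
    return {
        "tokens_per_micro_step": tokens_per_micro_step,
        # Micro-steps accumulated before each optimizer update.
        "grad_accum_steps": total_batch_size // tokens_per_micro_step,
        # Tokens seen over the whole fine-tuning run.
        "total_train_tokens": total_batch_size * num_iterations,
        # Micro-steps needed to cover the evaluation token budget.
        "eval_micro_steps": eval_tokens // tokens_per_micro_step,
    }


budget = training_budget(MAX_SEQ_LEN, DEVICE_BATCH_SIZE, DDP_WORLD_SIZE,
                         TOTAL_BATCH_SIZE, NUM_ITERATIONS, EVAL_TOKENS)
for key, value in budget.items():
    print(f"{key}: {value:,}")
```

Under these assumptions, 32 sequences of 2,048 tokens on 8 ranks already give 524,288 tokens per micro-step, so no gradient accumulation is needed, and 346 iterations correspond to roughly 181 million training tokens.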