---
license: apache-2.0
language:
- de
datasets:
- stefan-it/nanochat-german-data
tags:
- nanochat
- german
---

# 🇩🇪 nanochat German: Chat Model

This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.

## Datasets

The chat model was fine-tuned on the following datasets:

* [German Alpaca](https://huggingface.co/datasets/stefan-it/nanochat-german-alpaca)
* [German Dolly](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual)
* [German Evol Instruct](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-deutsch)
* [German Guanako](https://huggingface.co/datasets/LSX-UniWue/Guanako)
* [German Openhermes](https://huggingface.co/datasets/stefan-it/nanochat-german-openhermes)
* [German ShareGPT](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch)
* German Spelling Tasks

More information can be found in the corresponding [dataset repository](https://huggingface.co/datasets/stefan-it/nanochat-german-data).
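
These sources ship in different schemas (Alpaca-style instruction/input/output triples, ShareGPT-style conversation turns, and so on), so mixing them for SFT requires normalizing records into one common shape first. The sketch below is purely illustrative — the field names follow common Alpaca/ShareGPT/Dolly conventions and are assumptions, not the exact columns of the datasets above or the actual pipeline used here:

```python
def normalize(record):
    """Map a few common SFT schemas onto a single {prompt, response} shape.

    Illustrative only: field names follow typical Alpaca / ShareGPT / Dolly
    conventions, not necessarily the exact columns of the datasets above.
    """
    if "conversations" in record:  # ShareGPT-style multi-turn
        turns = record["conversations"]
        prompt = turns[0]["value"]
        response = turns[1]["value"]
    elif "instruction" in record:  # Alpaca-style
        prompt = record["instruction"]
        if record.get("input"):  # optional context field
            prompt += "\n\n" + record["input"]
        response = record["output"]
    else:  # Dolly-style question/answer
        prompt = record["question"]
        response = record["answer"]
    return {"prompt": prompt, "response": response}


examples = [
    {"instruction": "Übersetze ins Englische.", "input": "Guten Morgen!", "output": "Good morning!"},
    {"conversations": [{"from": "human", "value": "Hallo!"},
                       {"from": "gpt", "value": "Hallo, wie kann ich helfen?"}]},
]
print([normalize(r) for r in examples])
```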

## Fine-Tuning Stats

- run: nanochat-german
- device_type:
- dtype: bfloat16
- num_iterations: -1
- max_seq_len: 2048
- device_batch_size: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- init_lr_frac: 1.0000
- weight_decay: 0.0000
- eval_every: 150
- eval_tokens: 10,485,760
- total_batch_size: 524,288
- dry_run: 0
- Number of iterations: 346
- DDP world size: 8
- Minimum validation bpb: 0.6001
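
From these numbers, a quick back-of-the-envelope check of the total token budget and the validation loss in nats per byte (bpb is bits per byte, so multiply by ln 2):

```python
import math

total_batch_size = 524_288   # tokens per optimizer step (from the stats above)
num_iterations = 346         # iterations actually run

# Total number of tokens seen during this fine-tuning phase
total_tokens = total_batch_size * num_iterations
print(f"{total_tokens:,} training tokens")  # ≈ 181M tokens

# bits-per-byte -> cross-entropy in nats per byte
bpb = 0.6001
nats_per_byte = bpb * math.log(2)
print(f"{nats_per_byte:.4f} nats/byte")
```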

## Demo

To generate text, please make sure that you are using [this specific](https://github.com/huggingface/transformers/pull/41634) Transformers branch.

Then the following code can be used:

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "stefan-it/nanochat-german-base"
revision = "main"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load tokenizer and model at the pinned revision
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

# Generate a continuation for a German prompt
prompt = "Die Altstadt von München "
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device, max_new_tokens=max_new_tokens)
outputs = generator(prompt)
print(outputs)
```

## License

The model is licensed under the permissive Apache 2.0 license.

## Acknowledgements

- Many thanks to Andrej Karpathy for the original [nanochat](https://github.com/karpathy/nanochat) repo!
- Thanks to the [LLäMmlein team](https://huggingface.co/LSX-UniWue) for making the pretraining data publicly available.
- Thanks to [Ben](https://huggingface.co/burtenshaw) and [Joshua](https://huggingface.co/Xenova) for their help and work on the nanochat [HF integration](https://github.com/huggingface/transformers/pull/41634).