Felladrin committed on
Commit 0e59ae5 · verified · 1 Parent(s): e0e3cc8

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - transformers
+ library_name: transformers.js
+ base_model:
+ - PleIAs/Monad
+ ---
+
+ # Monad (ONNX)
+
+ This is an ONNX version of [PleIAs/Monad](https://huggingface.co/PleIAs/Monad). It was automatically converted and uploaded using [this Hugging Face Space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).
+
+ ## Usage with Transformers.js
+
+ See the pipeline documentation for `text-generation`: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TextGenerationPipeline
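+
+ A minimal loading and generation sketch with Transformers.js is shown below (untested; `<this-repo-id>` is a placeholder for this repository's id, and `dtype` selects one of the quantized ONNX variants shipped under `onnx/`):
+
+ ```js
+ import { pipeline } from "@huggingface/transformers";
+
+ // Load the text-generation pipeline from this repository.
+ // dtype picks an ONNX variant, e.g. "fp32", "fp16", "q8", "q4", "q4f16".
+ const generator = await pipeline("text-generation", "<this-repo-id>", { dtype: "q4" });
+
+ // Monad expects the Qwen-style single-turn format described further below.
+ const prompt = "<|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n<think>\n";
+ const output = await generator(prompt, { max_new_tokens: 256 });
+ console.log(output[0].generated_text);
+ ```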
+
+ ---
+
+ # ⚛️ Monad
+
+ <div align="center">
+ <img src="figures/pleias.jpg" width="60%" alt="Pleias" />
+ </div>
+
+ <p align="center">
+ <a href="https://pleias.fr/blog/blogsynth-the-new-data-frontier"><b>Blog announcement</b></a>
+ </p>
+
+ **Monad** is a 56-million-parameter generalist Small Reasoning Model, trained on 200 billion tokens from <a href="https://huggingface.co/PleIAs/Baguettotron">SYNTH</a>, a fully open generalist dataset.
+
+ As of 2025, Monad is the leading contender for the title of smallest viable language model. Despite being less than half the size of GPT-2, Monad not only answers in consistent English but also performs significantly above chance on MMLU and other major industry benchmarks.
+
+ <p align="center">
+ <img width="80%" src="figures/training_efficiency.jpeg">
+ </p>
+
+ Monad's name is a reference to Leibniz's concept of the monad, the general idea of the smallest possible unit of intelligence.
+
+ ## Features
+ Monad has been natively trained for instruction following with thinking traces. We implemented a series of dedicated pipelines for:
+ * Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia), though at this size hallucinations are to be expected.
+ * Retrieval-Augmented Generation with grounding (following up on our initial experiments with the Pleias-RAG series).
+ * Arithmetic and simple math problem solving.
+ * Editing tasks.
+ * Information extraction.
+ * Creative writing, including unusual synthetic exercises such as lipograms or layout poems.
+
+ Monad is strictly monolingual in English. We trained a new custom tokenizer (likely one of the smallest tokenizers to date, with fewer than 8,000 individual tokens), trained exclusively on SYNTH so as to maintain a relatively good compression ratio.
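+
+ As a quick way to see the compression ratio in practice, the tokenizer can be inspected with Transformers.js. A small sketch (untested; `<this-repo-id>` is again a placeholder):
+
+ ```js
+ import { AutoTokenizer } from "@huggingface/transformers";
+
+ // Load the <8K-vocabulary tokenizer shipped with this repository.
+ const tokenizer = await AutoTokenizer.from_pretrained("<this-repo-id>");
+
+ const text = "Monad is a 56-million-parameter Small Reasoning Model.";
+ const ids = tokenizer.encode(text);
+ console.log(`${text.length} characters -> ${ids.length} tokens`);
+ ```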
+
+ ## Model design and training
+ Monad is a 56M-parameter decoder with a standard Qwen/Llama-like design, except for its extremely compact size and an opinionated architecture favoring depth (64 layers).
+ <p align="center">
+ <img width="80%" src="figures/monad_structure.png">
+ </p>
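+
+ As a sanity check on the 56M figure, the parameter count can be re-derived from the `config.json` included in this upload. A back-of-envelope sketch, assuming the standard Llama layout with tied embeddings:
+
+ ```js
+ // Values from config.json: hidden_size, intermediate_size, layers, vocab.
+ const h = 256, inter = 768, layers = 64, vocab = 8192;
+
+ const embeddings = vocab * h;   // tied with the LM head, counted once
+ const attn = 4 * h * h;         // q, k, v, o projections per layer
+ const mlp = 3 * h * inter;      // gate, up, down projections per layer
+ const norms = 2 * h;            // two RMSNorm weight vectors per layer
+ const total = embeddings + layers * (attn + mlp + norms) + h; // + final norm
+
+ console.log((total / 1e6).toFixed(1) + "M parameters"); // ≈ 56.7M
+ ```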
+
+ Monad was trained on 16 H100 GPUs on the Jean Zay supercomputer (compute plan n°A0191016886). Full pre-training took a bit less than 6 hours.
+
+ ## Evaluation
+ Monad attains performance on MMLU significantly above chance, with close to 30% accuracy (versus the 25% random baseline for four-option questions). We also find non-random results on GSM8K (8%) and HotpotQA (8%).
+
+ To our knowledge, there is no other model remotely close to this size range to compare against. Spiritually and practically, Monad remains unique.
+
+ ## Use and deployment
+ Monad has been trained on the standard Qwen instruction format.
+
+ ```xml
+ <|im_start|>user
+ Who are you?<|im_end|>
+ <|im_start|>assistant
+ <think>
+ ```
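+
+ For single-turn use, the prompt can be assembled directly from this template. A small sketch (the helper name is illustrative):
+
+ ```js
+ // Wrap one user question in the single-turn template shown above;
+ // the trailing <think> opens the model's reasoning trace.
+ function formatPrompt(question) {
+   return `<|im_start|>user\n${question}<|im_end|>\n<|im_start|>assistant\n<think>\n`;
+ }
+
+ const prompt = formatPrompt("Who are you?");
+ ```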
+
+ Monad does not yet support multi-turn conversation.
+
+ A major envisioned use case for Monad is explainability, as the model provides a unique trade-off between observability and actual reasoning performance.
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "_attn_implementation_autoset": true,
+ "_name_or_path": "PleIAs/Monad",
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 64,
+ "hidden_act": "silu",
+ "hidden_size": 256,
+ "initializer_range": 0.02,
+ "intermediate_size": 768,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
+ "num_attention_heads": 4,
+ "num_hidden_layers": 64,
+ "num_key_value_heads": 4,
+ "pretraining_tp": 1,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000,
+ "tie_word_embeddings": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "use_cache": true,
+ "vocab_size": 8192
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.49.0"
+ }
onnx/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8243735cd3ca8f725f87aa5dee6142bb077a8a1db3a1f64087dab8554bf6e8c4
+ size 227902592
onnx/model_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bbb4bc65757e51e06f7831a75e00c8132e29f271de994262be19200869ebb85
+ size 40539987
onnx/model_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b9402c399e2d1be8de2249c30d6b6c11123cb05248367f9b64f371bcb4f1d38
+ size 114619221
onnx/model_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9c27ae76f508e5aeb8b1954446831b9e7bde5d2da5cbf027cbb674bcdb2f0bd
+ size 58526168
onnx/model_q4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb6b89b38a81c43dd15860ea9b17760294e90ab7daac2771ef51f19f6cb8753e
+ size 43944659
onnx/model_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67000f2c517b89ed22ad425810fda68940962bb8ade11a26dd14d1de74e9516f
+ size 36304936
onnx/model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9c27ae76f508e5aeb8b1954446831b9e7bde5d2da5cbf027cbb674bcdb2f0bd
+ size 58526168
onnx/model_uint8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fec7db87faa8668021649d4899249b32ea6525c734ceda71a8f1ed6635b47737
+ size 58526397
quantize_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+ "modes": [
+ "fp16",
+ "q8",
+ "int8",
+ "uint8",
+ "q4",
+ "q4f16",
+ "bnb4"
+ ],
+ "per_channel": false,
+ "reduce_range": false,
+ "block_size": null,
+ "is_symmetric": true,
+ "accuracy_level": null,
+ "quant_type": 1,
+ "op_block_list": null
+ }
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,40 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|begin_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "tokenizer_class": "PreTrainedTokenizer"
+ }