---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- transformers
library_name: transformers.js
base_model:
- PleIAs/Monad
---



# Monad (ONNX)


This is an ONNX version of [PleIAs/Monad](https://huggingface.co/PleIAs/Monad). It was automatically converted and uploaded using [this Hugging Face Space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).


## Usage with Transformers.js


See the pipeline documentation for `text-generation`: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TextGenerationPipeline
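
For example, a minimal sketch with the Transformers.js `pipeline` API. The repository id below is a placeholder for this ONNX conversion; replace it with the actual repo name, and adjust generation parameters as needed.

```js
import { pipeline } from "@huggingface/transformers";

// Placeholder repository id for this ONNX conversion; replace with the actual repo name.
const generator = await pipeline("text-generation", "onnx-community/Monad-ONNX");

// Monad is single-turn: pass exactly one user message.
const messages = [{ role: "user", content: "Who are you?" }];

const output = await generator(messages, { max_new_tokens: 256 });
console.log(output[0].generated_text);
```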


---


# ⚛️ Monad

<div align="center">
  <img src="figures/pleias.jpg" width="60%" alt="Pleias" />
</div>

<p align="center">
  <a href="https://pleias.fr/blog/blogsynth-the-new-data-frontier"><b>Blog announcement</b></a>
</p>

**Monad** is a 56-million-parameter generalist Small Reasoning Model, trained on 200 billion tokens from <a href="https://huggingface.co/PleIAs/Baguettotron">SYNTH</a>, a fully open generalist dataset.

As of 2025, Monad is the best contender for the smallest viable language model. Despite being less than half the size of GPT-2, Monad not only answers in consistent English but also performs significantly beyond chance on MMLU and other major industry benchmarks.

<p align="center">
  <img width="80%" src="figures/training_efficiency.jpeg">
</p>

Monad's name is a reference to Leibniz's concept and, more generally, to the idea of the smallest possible unit of intelligence.

## Features
Monad has been natively trained for instructions with thinking traces. We implemented a series of dedicated pipelines for:
* Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia), though hallucinations are to be expected in this size range.
* Retrieval-Augmented Generation with grounding (following on our initial experiments with Pleias-RAG series)
* Arithmetic and simple math problem solving
* Editing tasks
* Information extraction
* Creative writing, including unusual synthetic exercises like lipograms or layout poems.

Monad is strictly monolingual in English. We trained a new custom tokenizer (likely one of the smallest tokenizers to date, with fewer than 8,000 individual tokens), trained exclusively on SYNTH so that it maintains a relatively good compression ratio.

## Model design and training
Monad is a 56M-parameter decoder with a standard Qwen/Llama-like design, except for its extremely compact size and an architecture deliberately opinionated toward depth (64 layers).
<p align="center">
  <img width="80%" src="figures/monad_structure.png">
</p>

Monad was trained on 16 H100 GPUs from Jean Zay (compute plan n°A0191016886). Full pre-training took a bit less than 6 hours.

## Evaluation
Monad attains performance on MMLU significantly beyond chance, with a positive rate close to 30%. We also find non-random results on GSM8K (8%) and HotPotQA (8%).

To our knowledge, there is no other model in this size range that offers a meaningful evaluation comparison. Spiritually and practically, Monad remains unique.

## Use and deployment
Monad has been trained on the standard Qwen instruction format:

```xml
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
```
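
As a rough sketch, the same prompt can be reconstructed with the tokenizer's chat template in Transformers.js. The repository id below is a placeholder, and whether the shipped template emits the `<think>` opener is an assumption to verify; append it manually before generation if it does not.

```js
import { AutoTokenizer } from "@huggingface/transformers";

// Placeholder repository id; replace with the actual repo name.
const tokenizer = await AutoTokenizer.from_pretrained("onnx-community/Monad-ONNX");

// Leave the assistant turn open so generation starts right after <|im_start|>assistant.
const prompt = tokenizer.apply_chat_template(
  [{ role: "user", content: "Who are you?" }],
  { tokenize: false, add_generation_prompt: true },
);
console.log(prompt);
```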

Monad does not yet support multi-turn conversation.

A major envisioned use case for Monad is explainability, as the model provides a unique trade-off between observability and actual reasoning performance.