<center> <div style="text-align: center;"> <img src="https://raw.githubusercontent.com/ZHZisZZ/dllm/main/assets/logo.gif" width="400" />
</div> </center>

# ModernBERT-large-chat-v0.1

ModernBERT-large-chat-v0.1 is a diffusion-based generative variant of [ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large), finetuned using the [dLLM](https://github.com/ZHZisZZ/dllm) framework.

## Model Overview

ModernBERT-large-chat-v0.1 has the following features:

- **Method**: [Masked Diffusion Language Modeling (MDLM)](https://arxiv.org/abs/2406.07524)
- **Framework**: [dLLM](https://github.com/ZHZisZZ/dllm)
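
MDLM generates text by starting from a fully masked target sequence and revealing a few positions per denoising step until nothing is masked. The following is a minimal toy sketch of that unmasking schedule only; the `toy_unmask` helper and its random stand-in for per-position model confidence are illustrative assumptions, not the actual dLLM sampler.

```python
import random

MASK = "<mask>"

def toy_unmask(target, steps=4, seed=0):
    """Reveal a masked sequence a few positions per step, mimicking the
    confidence-based unmasking schedule of MDLM-style samplers.
    `target` stands in for the model's argmax predictions."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    masked = list(range(len(target)))
    per_step = max(1, len(target) // steps)
    while masked:
        # A real sampler ranks positions by model confidence; here we
        # use a random stand-in score to pick which positions to reveal.
        masked.sort(key=lambda i: rng.random())
        for i in masked[:per_step]:
            seq[i] = target[i]
        masked = masked[per_step:]
    return seq

tokens = ["Lily", "runs", "48", "+", "24", "=", "72", "km"]
print(toy_unmask(tokens))  # every position is revealed once the loop ends
```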

Load the model and tokenizer with `transformers`:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

device = 'cuda'
model = AutoModelForMaskedLM.from_pretrained('dllm-collection/ModernBERT-large-chat-v0.1', dtype=torch.bfloat16).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained('dllm-collection/ModernBERT-large-chat-v0.1')

prompt = "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?"
m = [{"role": "user", "content": prompt}]  # chat-format message list
```
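
For reference, the expected answer to the sample prompt works out with plain arithmetic, independent of the model:

```python
first_leg = 12 * 4                # 12 km/h for 4 hours -> 48 km
remaining_hours = 8 - 4           # 4 hours left of the 8-hour run
second_leg = 6 * remaining_hours  # 6 km/h for the remaining 4 hours -> 24 km
total = first_leg + second_leg
print(total)  # 72
```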

Follow the GitHub repo's demo script `examples/bert/chat.py` to chat with the model interactively:

```shell
python -u examples/bert/chat.py \
    --model_name_or_path dllm-collection/ModernBERT-large-chat-v0.1 \
    --chat True
```

## Evaluation

| Model                      | LAMBADA | GSM8K | CEval | BBH  | MATH | MMLU | Winogrande | HellaSwag | CMMLU |
|:---------------------------|:-------:|:-----:|:-----:|:----:|:----:|:----:|:----------:|:---------:|:-----:|
| ModernBERT-base-chat-v0.1  | 49.3    | 5.9   | 25.0  | 17.9 | 3.1  | 26.1 | 49.7       | 41.0      | 24.3  |
| ModernBERT-large-chat-v0.1 | 46.3    | 17.1  | 24.6  | 25.1 | 3.8  | 33.5 | 53.1       | 45.0      | 27.5  |

<!-- <p align="left" style="color: #808080; font-size: 0.9em;">
Table 1. Evaluation results of
ModernBERT-base-chat-v0.1 and
ModernBERT-large-chat-v0.1.
All results are evaluated using
<a href="https://github.com/ZHZisZZ/dllm/tree/main" style="color: #808080; text-decoration: underline;">
dLLM
</a>.
</p> -->

To automatically evaluate ModernBERT-large-chat-v0.1 on all benchmarks, run:

```shell
bash examples/bert/eval.sh \
    --model_name_or_path "dllm-collection/ModernBERT-large-chat-v0.1"
```

## Citation

If you use ModernBERT-large-chat-v0.1 or dLLM, please cite:

```bibtex
@misc{dllm,