<!-- Provide a quick summary of what the model is/does. -->

A debug model fine-tuned on `willcb/R1-reverse-wikipedia-paragraphs-v1-1000`, to be used as a warmed-up model for RL in `vf-reverse-text`.

Created with this training command from [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl) (commit hash: `8262560`):

```bash
uv run torchrun --nproc-per-node 8 src/prime_rl/trainer/sft/train.py \
  --model.name PrimeIntellect/Qwen3-0.6B \
  --data.name willcb/R1-reverse-wikipedia-paragraphs-v1-1000 \
  --max-steps 100 \
  --data.batch-size 16 \
  --data.micro-batch-size 1 \
  --data.seq-len 4096 \
  --optim.lr 2e-5
```

Check out the run on [W&B](https://wandb.ai/primeintellect/mika/runs/odsfiekx?nw=nwusermikasenghaas_).
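For context, the task this checkpoint is warmed up for appears to be text reversal (per the dataset and environment names). Below is a minimal sketch of what a character-level reversal check with exact-match scoring could look like; the actual `vf-reverse-text` environment's reversal granularity and reward function are assumptions here, not taken from this repo.

```python
# Hypothetical sketch of the reverse-text task (NOT the actual vf-reverse-text
# implementation): the environment may reverse at a different granularity
# (e.g. words instead of characters) and score completions differently.

def reverse_text(text: str) -> str:
    """Character-level reversal of the input string."""
    return text[::-1]

def exact_match_reward(prompt: str, completion: str) -> float:
    """Return 1.0 if the completion is the exact reversal of the prompt, else 0.0."""
    return 1.0 if completion == reverse_text(prompt) else 0.0
```

A model that already maps paragraphs to their reversals after SFT should collect nonzero reward from the start of RL, which is the point of the warm-up.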