# Model Card

- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `soape`
- Model size: `1.2b`
- Data size: `24B`

## Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | `0.95` |
| beta2 | `0.99` |
| block_size | `512` |
| epsilon | `1e-10` |
| learning_rate | `0.004` |
| max_grad_norm | `1` |
| min_lr_ratio | `0.0` |
| precondition_frequency | `10` |
| shampoo_beta | `0.9` |
| train_batch_size | `256` |
| warmup | `1000` |
| weight_decay | `0.1` |
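
For convenience, the table above can be carried into code as a plain configuration object. The following is a minimal sketch only: the class name `SoapeBestConfig` is hypothetical and not part of any library, the field names simply mirror the table rows, and mapping these fields onto a particular trainer or optimizer implementation is left to the reader.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SoapeBestConfig:
    """Hypothetical container for the values listed in the table above."""

    beta1: float = 0.95
    beta2: float = 0.99
    block_size: int = 512
    epsilon: float = 1e-10
    learning_rate: float = 0.004
    max_grad_norm: float = 1.0
    min_lr_ratio: float = 0.0
    precondition_frequency: int = 10
    shampoo_beta: float = 0.9
    train_batch_size: int = 256
    warmup: int = 1000
    weight_decay: float = 0.1


config = SoapeBestConfig()

# Print one "name = value" line per hyperparameter, in table order.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

The dataclass is frozen so that the recorded values cannot be changed by accident; to explore variations, `dataclasses.replace(config, learning_rate=...)` returns a modified copy.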