# Model Card

- Source: [https://arxiv.org/abs/2509.02046](https://arxiv.org/abs/2509.02046)
- Optimizer: `muon`
- Model size: `1.2b`
- Data size: `193B`

## Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | `0.8` |
| beta2 | `0.98` |
| decay | `1.0` |
| epsilon | `1e-15` |
| learning_rate | `0.004` |
| lr_schedule | `linear` |
| max_grad_norm | `2` |
| min_lr_ratio | `0.0` |
| momentum | `0.98` |
| muon_epsilon | `1e-05` |
| muon_to_adam_lr | `0.3` |
| train_batch_size | `256` |
| warmup | `0` |
| weight_decay | `0.1` |
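
The hyperparameters listed above can be captured as a plain Python dictionary for reuse in a training script. The sketch below is illustrative only and rests on several assumptions that this card does not confirm: that `muon_to_adam_lr` is a multiplier applied to `learning_rate` for the parameters updated by Adam, that `decay` is the fraction of post-warmup steps over which the learning rate decays, and that `min_lr_ratio` sets the final learning rate as a fraction of the peak. The helper names `adam_learning_rate` and `linear_lr`, and the `total_steps` argument, are hypothetical.

```python
# Values copied from the "Best configuration" table.
BEST_CONFIG = {
    "beta1": 0.8,
    "beta2": 0.98,
    "decay": 1.0,
    "epsilon": 1e-15,
    "learning_rate": 0.004,
    "lr_schedule": "linear",
    "max_grad_norm": 2,
    "min_lr_ratio": 0.0,
    "momentum": 0.98,
    "muon_epsilon": 1e-05,
    "muon_to_adam_lr": 0.3,
    "train_batch_size": 256,
    "warmup": 0,
    "weight_decay": 0.1,
}


def adam_learning_rate(config=BEST_CONFIG):
    """ASSUMPTION: Adam-updated parameters use learning_rate * muon_to_adam_lr."""
    return config["learning_rate"] * config["muon_to_adam_lr"]


def linear_lr(step, total_steps, config=BEST_CONFIG):
    """Learning rate at `step` under an ASSUMED reading of the schedule fields.

    Linear warmup for `warmup` steps, then a constant phase, then a linear
    decay to `min_lr_ratio * learning_rate` over the last `decay` fraction
    of the post-warmup steps.
    """
    peak = config["learning_rate"]
    floor = peak * config["min_lr_ratio"]
    warmup_steps = int(config["warmup"])
    if step < warmup_steps:
        return peak * (step + 1) / warmup_steps
    decay_steps = int(config["decay"] * (total_steps - warmup_steps))
    decay_start = total_steps - decay_steps
    if decay_steps == 0 or step < decay_start:
        return peak
    # Clamp progress so steps past the end stay at the floor value.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak + (floor - peak) * progress
```

Under these assumptions, and with `warmup = 0` and `decay = 1.0`, the schedule starts at the peak rate of `0.004` and falls linearly to zero by the last step, for example `linear_lr(500, 1000)` gives half the peak rate. The total step count is not stated in this card, so it must be supplied by the caller.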