muon_1.2b_2 (uploaded by KaiyueWen)

Model Card

Best configuration

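| Hyperparameter   | Value |
|------------------|-------|
| beta1            | 0.8   |
| beta2            | 0.98  |
| decay            | 1.0   |
| epsilon          | 1e-15 |
| learning_rate    | 0.004 |
| lr_schedule      | linear |
| max_grad_norm    | 2     |
| min_lr_ratio     | 0.0   |
| momentum         | 0.98  |
| muon_epsilon     | 1e-05 |
| muon_to_adam_lr  | 0.3   |
| train_batch_size | 256   |
| warmup           | 0     |
| weight_decay     | 0.1   |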
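The card lists values but does not say how the training code interprets them. The Python sketch below shows one plausible reading, and every interpretation in it is an assumption. It assumes that `warmup` and `decay` are fractions of total training steps and that the linear schedule decays to `learning_rate * min_lr_ratio`. It also assumes that `muon_to_adam_lr` multiplies the Muon learning rate to give the AdamW learning rate used for non-matrix parameters. `BestConfig`, `lr_at`, and `adam_lr` are hypothetical names, not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BestConfig:
    # Values copied from the table; field semantics are assumptions.
    beta1: float = 0.8
    beta2: float = 0.98
    decay: float = 1.0            # assumed: fraction of training spent decaying
    epsilon: float = 1e-15
    learning_rate: float = 0.004  # assumed: peak Muon learning rate
    lr_schedule: str = "linear"
    max_grad_norm: float = 2.0
    min_lr_ratio: float = 0.0     # assumed: final LR = learning_rate * min_lr_ratio
    momentum: float = 0.98
    muon_epsilon: float = 1e-5
    muon_to_adam_lr: float = 0.3  # assumed: AdamW LR = learning_rate * this ratio
    train_batch_size: int = 256
    warmup: float = 0.0           # assumed: fraction of training spent warming up
    weight_decay: float = 0.1


def lr_at(step: int, total_steps: int, cfg: BestConfig) -> float:
    """Hypothetical linear warmup / constant / linear decay schedule."""
    peak = cfg.learning_rate
    floor = peak * cfg.min_lr_ratio
    warmup_steps = int(cfg.warmup * total_steps)
    decay_steps = int(cfg.decay * total_steps)
    decay_start = total_steps - decay_steps
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from near zero up to the peak LR.
        return peak * (step + 1) / warmup_steps
    if step < decay_start:
        # Constant phase between warmup and decay.
        return peak
    # Linear interpolation from peak (at decay_start) to floor (at total_steps).
    frac = (step - decay_start) / max(decay_steps, 1)
    return peak + (floor - peak) * min(frac, 1.0)


def adam_lr(cfg: BestConfig) -> float:
    """Hypothetical AdamW learning rate derived from the Muon learning rate."""
    return cfg.learning_rate * cfg.muon_to_adam_lr


if __name__ == "__main__":
    cfg = BestConfig()
    for s in (0, 250, 500, 750, 1000):
        print(s, lr_at(s, 1000, cfg))
    print("adam lr:", adam_lr(cfg))
```

Under this reading, `warmup = 0` and `decay = 1.0` mean the learning rate starts at its 0.004 peak and falls linearly to 0 over the whole run. The assumed AdamW learning rate would then be 0.004 × 0.3 = 0.0012.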