soape_130m_1 / README.md

Model Card

Best configuration

| Hyperparameter | Value |
|---|---|
| beta1 | 0.95 |
| beta2 | 0.98 |
| block_size | 256 |
| epsilon | 1e-15 |
| learning_rate | 0.016 |
| max_grad_norm | 1.0 |
| min_lr_ratio | 0 |
| partition_grads_into_blocks | True |
| precondition_frequency | 1 |
| shampoo_beta | 0.95 |
| train_batch_size | 128 |
| warmup | 1000 |
| weight_decay | 0.1 |
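
For use in a training script, the values above can be collected into a single typed object. The following is a minimal sketch, not the code used to train this model: the field names and values are copied from the table, while the `BestConfig` class and the `warmup_lr` helper are hypothetical names introduced here for illustration. The helper additionally assumes that `warmup` is counted in optimizer steps and that the warmup is linear; the model card does not state either.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BestConfig:
    """Hyperparameters copied from the 'Best configuration' table.

    The class itself is a hypothetical container, not part of any library.
    """

    beta1: float = 0.95
    beta2: float = 0.98
    block_size: int = 256
    epsilon: float = 1e-15
    learning_rate: float = 0.016
    max_grad_norm: float = 1.0
    min_lr_ratio: float = 0.0
    partition_grads_into_blocks: bool = True
    precondition_frequency: int = 1
    shampoo_beta: float = 0.95
    train_batch_size: int = 128
    warmup: int = 1000
    weight_decay: float = 0.1


config = BestConfig()


def warmup_lr(step: int, cfg: BestConfig = config) -> float:
    """Learning rate during warmup.

    Assumption: `warmup` is a step count and the ramp is linear from 0 to
    `learning_rate`. Any decay after warmup is not modeled here.
    """
    if cfg.warmup <= 0:
        return cfg.learning_rate
    return cfg.learning_rate * min(1.0, step / cfg.warmup)


if __name__ == "__main__":
    # Print the configuration as a plain dict, e.g. for logging.
    for name, value in asdict(config).items():
        print(f"{name}: {value}")
```

Keeping the dataclass frozen prevents accidental mutation of the recorded values; `asdict(config)` yields a plain dictionary that can be passed on to whatever optimizer constructor accepts these keys.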