strict_balanced_cf_seed-42_1e-3

This model was trained from scratch on an unknown dataset. After the final epoch (the last row of the training results table), it achieves the following results on the evaluation set:

Loss: 3.1879
Accuracy: 0.4010

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
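
The relationship between some of these values can be checked with a little arithmetic. The sketch below makes two assumptions that the card does not state: training ran on a single device (`num_devices = 1` is hypothetical), and the linear scheduler ramps the learning rate as `step / warmup_steps` during warmup. The helper names are illustrative only.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-3
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_steps = 32_000
final_step = 29_720  # last step reported in the training results table

# Assumption: one device, so data parallelism adds no extra multiplier.
num_devices = 1


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation * devices


def warmup_lr(step: int, base_lr: float, warmup: int) -> float:
    """Learning rate during linear warmup (assumed form: base_lr * step / warmup)."""
    if step > warmup:
        raise ValueError("this sketch only models the warmup phase")
    return base_lr * step / max(1, warmup)


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps, num_devices
)
lr_at_final_step = warmup_lr(final_step, learning_rate, warmup_steps)

print(total_train_batch_size)
print(f"{lr_at_final_step:.6e}")
```

Under the single-device assumption the product matches the reported total_train_batch_size of 256. Note also that lr_scheduler_warmup_steps (32000) is larger than the last reported step (29720), so if the schedule behaves as assumed here, training would have ended while still in warmup, slightly below the nominal learning_rate.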

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.0053	0.9998	1486	4.4138	0.2933
4.3053	1.9997	2972	3.9054	0.3326
3.6816	2.9997	4458	3.6296	0.3564
3.4953	3.9996	5944	3.4676	0.3720
3.2624	4.9995	7430	3.3735	0.3810
3.1875	5.9994	8916	3.3137	0.3860
3.082	6.9993	10402	3.2722	0.3901
3.043	7.9999	11889	3.2455	0.3928
2.9819	8.9998	13375	3.2288	0.3944
2.9557	9.9997	14861	3.2191	0.3961
2.9199	10.9997	16347	3.2084	0.3972
2.8984	11.9996	17833	3.2034	0.3979
2.8797	12.9995	19319	3.1950	0.3989
2.8596	13.9994	20805	3.1958	0.3993
2.8504	14.9993	22291	3.1948	0.3993
2.8317	15.9999	23778	3.1944	0.3999
2.8324	16.9998	25264	3.1884	0.4004
2.8107	17.9997	26750	3.1948	0.4006
2.8208	18.9997	28236	3.1848	0.4008
2.799	19.9982	29720	3.1879	0.4010
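
To make the table easier to interpret, the validation losses can be converted to perplexity and the best checkpoint located. This is only a sketch: it assumes the reported loss is a mean cross-entropy in nats, which the card does not confirm, and the `perplexity` helper is illustrative.

```python
import math

# (step, validation_loss) pairs copied from the table above.
validation = [
    (1486, 4.4138), (2972, 3.9054), (4458, 3.6296), (5944, 3.4676),
    (7430, 3.3735), (8916, 3.3137), (10402, 3.2722), (11889, 3.2455),
    (13375, 3.2288), (14861, 3.2191), (16347, 3.2084), (17833, 3.2034),
    (19319, 3.1950), (20805, 3.1958), (22291, 3.1948), (23778, 3.1944),
    (25264, 3.1884), (26750, 3.1948), (28236, 3.1848), (29720, 3.1879),
]


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is a mean cross-entropy in nats."""
    return math.exp(loss)


# Checkpoint with the lowest validation loss.
best_step, best_loss = min(validation, key=lambda row: row[1])
final_step, final_loss = validation[-1]

print(f"best:  step {best_step}, loss {best_loss}, ppl {perplexity(best_loss):.2f}")
print(f"final: step {final_step}, loss {final_loss}, ppl {perplexity(final_loss):.2f}")
```

The lowest validation loss in the table is at step 28236 rather than at the final step, although the two differ only in the third decimal place and accuracy keeps rising slightly through the last row.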

Model size: 0.1B params (Safetensors)
Tensor type: F32