qing-yao/long-first-headfinal_seed-21_1e-3

	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: long_first_headfinal_seed-21_1e-3
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# long_first_headfinal_seed-21_1e-3

	This model was trained from scratch on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 5.0807
	- Accuracy: 0.2038
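
If the reported loss is the mean per-token cross-entropy in nats (the usual convention for language modeling with the Trainer, but an assumption here because the card does not state the objective), it converts to perplexity by exponentiation. A minimal sketch of that conversion; the function name `perplexity` is illustrative and not part of this repository:

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss reported in this card.
final_eval_loss = 5.0807

# Assumption: the loss is a per-token average measured in nats.
final_perplexity = perplexity(final_eval_loss)
print(f"perplexity: {final_perplexity:.1f}")
```

Under that assumption the evaluation perplexity comes out at roughly 161.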

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 64
	- seed: 21
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 256
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 32000
	- num_epochs: 20.0
	- mixed_precision_training: Native AMP
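
Two of the values above follow from simple arithmetic, sketched here in plain Python. The helper names are hypothetical, and the single-device count is an assumption inferred from 32 × 8 = 256, not something the card states. The schedule function mirrors the usual linear warmup-then-decay rule. Note that the warmup length (32000 steps) exceeds the last step logged in the training results (29400), so under this rule the learning rate would still be ramping up when training ends and would not reach the configured peak of 0.001.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values taken from the hyperparameter list; num_devices=1 is an assumption.
total_batch = effective_batch_size(per_device=32, accumulation_steps=8, num_devices=1)

# Assumption: training stopped at the last logged step (29400).
last_step = 29400
lr_at_end = linear_warmup_lr(last_step, peak_lr=1e-3, warmup_steps=32000, total_steps=last_step)

print(total_batch)          # sequences per optimizer step
print(f"{lr_at_end:.6f}")   # learning rate at the last logged step
```

With these inputs the effective batch size is 256, matching `total_train_batch_size`, and the learning rate at step 29400 is about 9.2e-4.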

	### Training results

	| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
	|:-------------:|:-------:|:-----:|:---------------:|:--------:|
	| 6.1412        | 0.9994  | 1470  | 5.5240          | 0.1764   |
	| 4.5259        | 1.9992  | 2940  | 5.4067          | 0.1823   |
	| 3.8908        | 2.9991  | 4410  | 5.3111          | 0.1857   |
	| 3.7115        | 3.9996  | 5881  | 5.2018          | 0.1937   |
	| 3.4863        | 4.9994  | 7351  | 5.1925          | 0.1938   |
	| 3.4079        | 5.9992  | 8821  | 5.1520          | 0.1973   |
	| 3.3056        | 6.9991  | 10291 | 5.1326          | 0.1999   |
	| 3.258         | 7.9996  | 11762 | 5.1119          | 0.1997   |
	| 3.2065        | 8.9994  | 13232 | 5.1225          | 0.2009   |
	| 3.1699        | 9.9992  | 14702 | 5.1300          | 0.1987   |
	| 3.1451        | 10.9991 | 16172 | 5.0815          | 0.2020   |
	| 3.1079        | 11.9996 | 17643 | 5.1214          | 0.2012   |
	| 3.1043        | 12.9994 | 19113 | 5.0818          | 0.2012   |
	| 3.0668        | 13.9992 | 20583 | 5.1290          | 0.2022   |
	| 3.0777        | 14.9991 | 22053 | 5.1106          | 0.1996   |
	| 3.039         | 15.9996 | 23524 | 5.1058          | 0.2006   |
	| 3.0432        | 16.9994 | 24994 | 5.1083          | 0.2036   |
	| 3.0188        | 17.9992 | 26464 | 5.1309          | 0.2016   |
	| 3.0246        | 18.9991 | 27934 | 5.1190          | 0.1996   |
	| 3.0115        | 19.9962 | 29400 | 5.0807          | 0.2038   |
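
To read the table programmatically, the sketch below copies its rows into a list and picks the checkpoint with the lowest validation loss and the one with the highest accuracy. The variable names are illustrative. It only restates numbers already in the table and draws no conclusion about why training and validation loss differ.

```python
# (training_loss, epoch, step, validation_loss, accuracy), copied from the table above.
ROWS = [
    (6.1412, 0.9994, 1470, 5.5240, 0.1764),
    (4.5259, 1.9992, 2940, 5.4067, 0.1823),
    (3.8908, 2.9991, 4410, 5.3111, 0.1857),
    (3.7115, 3.9996, 5881, 5.2018, 0.1937),
    (3.4863, 4.9994, 7351, 5.1925, 0.1938),
    (3.4079, 5.9992, 8821, 5.1520, 0.1973),
    (3.3056, 6.9991, 10291, 5.1326, 0.1999),
    (3.258, 7.9996, 11762, 5.1119, 0.1997),
    (3.2065, 8.9994, 13232, 5.1225, 0.2009),
    (3.1699, 9.9992, 14702, 5.1300, 0.1987),
    (3.1451, 10.9991, 16172, 5.0815, 0.2020),
    (3.1079, 11.9996, 17643, 5.1214, 0.2012),
    (3.1043, 12.9994, 19113, 5.0818, 0.2012),
    (3.0668, 13.9992, 20583, 5.1290, 0.2022),
    (3.0777, 14.9991, 22053, 5.1106, 0.1996),
    (3.039, 15.9996, 23524, 5.1058, 0.2006),
    (3.0432, 16.9994, 24994, 5.1083, 0.2036),
    (3.0188, 17.9992, 26464, 5.1309, 0.2016),
    (3.0246, 18.9991, 27934, 5.1190, 0.1996),
    (3.0115, 19.9962, 29400, 5.0807, 0.2038),
]

# Checkpoint with the lowest validation loss and the one with the highest accuracy.
best_by_loss = min(ROWS, key=lambda row: row[3])
best_by_accuracy = max(ROWS, key=lambda row: row[4])

# Difference between validation and training loss at the final logged step.
final_train_loss, _, final_step, final_val_loss, _ = ROWS[-1]
final_gap = final_val_loss - final_train_loss

print("best by validation loss: step", best_by_loss[2])
print("best by accuracy: step", best_by_accuracy[2])
print(f"validation minus training loss at step {final_step}: {final_gap:.4f}")
```

Both criteria select the final checkpoint (step 29400), which is the one whose loss and accuracy are reported at the top of this card.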


	### Framework versions

	- Transformers 4.46.2
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.20.0
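
To check a local environment against the versions listed above, a sketch along these lines may help. It relies only on the standard library's `importlib.metadata`. The helper `version_mismatches` and the `PINNED` mapping are hypothetical names, and the comparison ignores local build suffixes such as `+cu124`, on the assumption that the CUDA build tag need not match exactly.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by their PyPI distribution names.
PINNED = {
    "transformers": "4.46.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.20.0",
}


def strip_local(version_string: str) -> str:
    """Drop a local build suffix such as '+cu124'."""
    return version_string.split("+")[0]


def version_mismatches(pinned, lookup=version):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except (PackageNotFoundError, KeyError):
            installed = None  # package is not installed
        if installed is None or strip_local(installed) != strip_local(expected):
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in version_mismatches(PINNED).items():
        print(f"{package}: card lists {expected}, found {installed}")
```

An empty result means the installed base versions match the card. The `lookup` parameter exists so the check can be exercised without installing anything.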