---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: TBD-LLaMA-2B-Final-Direction-2B
  results: []
---

# TBD-LLaMA-2B-Final-Direction-2B

This model is a fine-tuned version of an unspecified base model on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 3.8900
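
The checkpoint can be loaded through the standard `transformers` causal-LM API. A minimal inference sketch, assuming a hypothetical Hub repo id (replace it with the model's actual path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute the actual Hub path of this model.
repo_id = "your-org/TBD-LLaMA-2B-Final-Direction-2B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```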

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: fused torch AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 139
- training_steps: 13966
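
The effective batch size is consistent: 1 (per device) × 4 GPUs × 16 gradient-accumulation steps = 64, matching `total_train_batch_size`, and the 13966 training steps correspond to roughly one epoch (the table below reaches epoch 0.9881 at step 13800). As a minimal sketch, not the actual training script, the optimizer and schedule could be reconstructed like this (`model` is a stand-in module):

```python
import torch
from torch import nn
from transformers import get_cosine_schedule_with_warmup

model = nn.Linear(8, 8)  # stand-in for the actual 2B-parameter LM

# Fused torch AdamW with the betas/epsilon listed above; fused=True
# requires a PyTorch build with fused AdamW support (e.g. CUDA).
optimizer = torch.optim.AdamW(
    model.parameters(), lr=2e-5, betas=(0.9, 0.999), eps=1e-8, fused=True
)

# Cosine decay with the warmup and step counts from the card.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=139, num_training_steps=13966
)
```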

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 8.9472 | 0.0143 | 200 | 8.9381 |
| 6.7664 | 0.0286 | 400 | 6.7485 |
| 6.6429 | 0.0430 | 600 | 6.6299 |
| 6.5725 | 0.0573 | 800 | 6.5598 |
| 6.4746 | 0.0716 | 1000 | 6.4666 |
| 6.345 | 0.0859 | 1200 | 6.3290 |
| 6.1452 | 0.1002 | 1400 | 6.1231 |
| 5.9711 | 0.1146 | 1600 | 5.9283 |
| 5.8076 | 0.1289 | 1800 | 5.7896 |
| 5.718 | 0.1432 | 2000 | 5.6944 |
| 5.6422 | 0.1575 | 2200 | 5.6219 |
| 5.5956 | 0.1718 | 2400 | 5.5653 |
| 5.5424 | 0.1862 | 2600 | 5.5163 |
| 5.4527 | 0.2005 | 2800 | 5.4252 |
| 4.7472 | 0.2148 | 3000 | 4.6523 |
| 4.5528 | 0.2291 | 3200 | 4.4846 |
| 4.503 | 0.2434 | 3400 | 4.3817 |
| 4.427 | 0.2578 | 3600 | 4.3165 |
| 4.4322 | 0.2721 | 3800 | 4.2725 |
| 4.3265 | 0.2864 | 4000 | 4.2409 |
| 4.3255 | 0.3007 | 4200 | 4.2157 |
| 4.322 | 0.3150 | 4400 | 4.1930 |
| 4.1982 | 0.3294 | 4600 | 4.1759 |
| 4.2197 | 0.3437 | 4800 | 4.1609 |
| 4.2109 | 0.3580 | 5000 | 4.1478 |
| 4.1553 | 0.3723 | 5200 | 4.1329 |
| 4.169 | 0.3866 | 5400 | 4.1215 |
| 4.2068 | 0.4010 | 5600 | 4.1093 |
| 4.182 | 0.4153 | 5800 | 4.0969 |
| 4.2148 | 0.4296 | 6000 | 4.0841 |
| 4.0511 | 0.4439 | 6200 | 4.0716 |
| 4.0997 | 0.4582 | 6400 | 4.0592 |
| 4.0322 | 0.4726 | 6600 | 4.0488 |
| 3.9972 | 0.4869 | 6800 | 4.0372 |
| 4.0335 | 0.5012 | 7000 | 4.0258 |
| 4.0742 | 0.5155 | 7200 | 4.0168 |
| 4.003 | 0.5298 | 7400 | 4.0082 |
| 4.0007 | 0.5442 | 7600 | 3.9992 |
| 4.1114 | 0.5585 | 7800 | 3.9898 |
| 3.8742 | 0.5728 | 8000 | 3.9831 |
| 4.0346 | 0.5871 | 8200 | 3.9765 |
| 3.8871 | 0.6014 | 8400 | 3.9686 |
| 3.9689 | 0.6158 | 8600 | 3.9626 |
| 4.0003 | 0.6301 | 8800 | 3.9580 |
| 4.0529 | 0.6444 | 9000 | 3.9496 |
| 3.9973 | 0.6587 | 9200 | 3.9456 |
| 4.0418 | 0.6730 | 9400 | 3.9409 |
| 4.0237 | 0.6874 | 9600 | 3.9355 |
| 3.9256 | 0.7017 | 9800 | 3.9299 |
| 3.8549 | 0.7160 | 10000 | 3.9249 |
| 3.9872 | 0.7303 | 10200 | 3.9215 |
| 3.9918 | 0.7446 | 10400 | 3.9180 |
| 4.0075 | 0.7590 | 10600 | 3.9137 |
| 3.9235 | 0.7733 | 10800 | 3.9107 |
| 3.9416 | 0.7876 | 11000 | 3.9069 |
| 3.9939 | 0.8019 | 11200 | 3.9053 |
| 4.0625 | 0.8162 | 11400 | 3.9030 |
| 3.9773 | 0.8306 | 11600 | 3.9010 |
| 3.8279 | 0.8449 | 11800 | 3.8990 |
| 3.8631 | 0.8592 | 12000 | 3.8970 |
| 3.8593 | 0.8735 | 12200 | 3.8953 |
| 3.9531 | 0.8878 | 12400 | 3.8938 |
| 3.8922 | 0.9022 | 12600 | 3.8927 |
| 3.9151 | 0.9165 | 12800 | 3.8917 |
| 3.9119 | 0.9308 | 13000 | 3.8910 |
| 3.9261 | 0.9451 | 13200 | 3.8905 |
| 3.9169 | 0.9594 | 13400 | 3.8903 |
| 3.8439 | 0.9738 | 13600 | 3.8900 |
| 3.8795 | 0.9881 | 13800 | 3.8900 |
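
Assuming the reported loss is the mean per-token cross-entropy in nats, as is standard for causal-LM training with `Trainer`, the final validation loss implies a perplexity of about exp(3.8900) ≈ 48.9:

```python
import math

final_val_loss = 3.8900
print(math.exp(final_val_loss))  # ~48.91, the implied validation perplexity
```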

### Framework versions

- Transformers 4.56.1
- Pytorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0
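
To approximate this environment, the PyPI-installable packages can be pinned as below (a hypothetical `requirements.txt`); the PyTorch version string ends in `nv25.05`, which suggests an NVIDIA NGC container build rather than a PyPI release, so it is best matched by using that container:

```text
transformers==4.56.1
datasets==4.0.0
tokenizers==0.22.0
```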