---
tags:
- generated_from_trainer
datasets:
- HuggingFaceTB/smol-smoltalk
- trillionlabs/multisystem-curated
- allenai/tulu-3-sft-personas-instruction-following
- lemon-mint/smol-koreantalk
- lemon-mint/Korean-FineTome-100k
- heegyu/open-korean-instructions-v20231020
- coastral/korean-writing-style-instruct
- devngho/korean-instruction-mix
model-index:
- name: tiny-ko-187m-sft-250718
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.12.0.dev0`
```yaml
base_model: minpeter/tiny-ko-187m-base-250718

# ...
datasets:
  - path: HuggingFaceTB/smol-smoltalk
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: trillionlabs/multisystem-curated
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: allenai/tulu-3-sft-personas-instruction-following
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: lemon-mint/smol-koreantalk
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: lemon-mint/Korean-FineTome-100k
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

  - path: heegyu/open-korean-instructions-v20231020
    type: chat_template
    split: train
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    roles:
      user: ["human", "user"]
      assistant: ["gpt", "assistant", "bot"]
      system: ["system", "input"]

  - path: coastral/korean-writing-style-instruct
    type: chat_template
    split: train
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value

  - path: devngho/korean-instruction-mix
    type: chat_template
    split: train
    field_messages: messages
    message_property_mappings:
      role: from
      content: value

dataset_prepared_path: last_run_prepared
val_set_size: 0.001

# ...
overrides_of_model_config:
  max_position_embeddings: 65536

gradient_accumulation_steps: 8
micro_batch_size: 16
num_epochs: 1
optimizer: muon
lr_scheduler: cosine

# ...
fsdp_config:
  # ...

special_tokens:
  eos_token: '<|im_end|>'

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.lm_eval.LMEvalPlugin

lm_eval_tasks:
  - gsm8k
  - hellaswag
  - arc_easy
  - arc_challenge
  - piqa
  - winogrande
  - openbookqa
  - wsc
  - boolq

liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

```
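
The `message_property_mappings` and `roles` entries above describe how ShareGPT-style records (`from`/`value` keys with speaker names like `human`, `gpt`, or `bot`) are normalized into the `role`/`content` messages a chat template expects. As a rough illustration only — this is not Axolotl's actual implementation, and the function name and record shape are made up for the sketch — the mapping amounts to:

```python
# Sketch of the role/content normalization described by the config above.
# Not Axolotl's code; purely illustrative.
ROLE_ALIASES = {
    "user": ["human", "user"],
    "assistant": ["gpt", "assistant", "bot"],
    "system": ["system", "input"],
}
# Invert the alias table: each source role name maps to a canonical role.
ALIAS_TO_ROLE = {a: role for role, aliases in ROLE_ALIASES.items() for a in aliases}

def normalize(record: dict) -> list[dict]:
    """Turn {"conversations": [{"from": ..., "value": ...}]} into
    chat-template messages [{"role": ..., "content": ...}]."""
    return [
        {"role": ALIAS_TO_ROLE[m["from"]], "content": m["value"]}
        for m in record["conversations"]
    ]
```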

</details><br>

# tiny-ko-187m-sft-250718

This model is a fine-tuned version of [minpeter/tiny-ko-187m-base-250718](https://huggingface.co/minpeter/tiny-ko-187m-base-250718) on the HuggingFaceTB/smol-smoltalk, trillionlabs/multisystem-curated, allenai/tulu-3-sft-personas-instruction-following, lemon-mint/smol-koreantalk, lemon-mint/Korean-FineTome-100k, heegyu/open-korean-instructions-v20231020, coastral/korean-writing-style-instruct, and devngho/korean-instruction-mix datasets.
It achieves the following results on the evaluation set:
- Loss: 1.6990
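
Since the chat template and the `<|im_end|>` EOS token are set in the config above, the model should load with the standard `transformers` chat APIs. A minimal, untested sketch (the Korean prompt and generation settings are illustrative, not tuned):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/tiny-ko-187m-sft-250718"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "안녕하세요, 자기소개를 해주세요."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```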

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- training_steps: 3470
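
As a cross-check, the effective batch sizes follow from the per-device settings: total_train_batch_size = 16 (micro-batch) × 8 (gradient accumulation) × 4 (devices) = 512, and total_eval_batch_size = 16 × 4 = 64.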

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0      | 0    | 2.1799          |
| 1.8649        | 0.0576 | 200  | 1.8603          |
| 1.8031        | 0.1153 | 400  | 1.8033          |
| 1.7128        | 0.1729 | 600  | 1.7709          |
| 1.7758        | 0.2306 | 800  | 1.7492          |
| 1.7084        | 0.2882 | 1000 | 1.7339          |
| 1.7258        | 0.3458 | 1200 | 1.7225          |
| 1.6972        | 0.4035 | 1400 | 1.7149          |
| 1.73          | 0.4611 | 1600 | 1.7091          |
| 1.7166        | 0.5188 | 1800 | 1.7051          |
| 1.688         | 0.5764 | 2000 | 1.7025          |
| 1.737         | 0.6341 | 2200 | 1.7010          |
| 1.7322        | 0.6917 | 2400 | 1.6998          |
| 1.7133        | 0.7493 | 2600 | 1.6994          |
| 1.6953        | 0.8070 | 2800 | 1.6992          |
| 1.7233        | 0.8646 | 3000 | 1.6990          |
| 1.733         | 0.9223 | 3200 | 1.6990          |
| 1.7017        | 0.9799 | 3400 | 1.6990          |

### Framework versions

- Transformers 4.53.2
- Pytorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.21.2