Zaynes committed on
Commit 23df332 · verified · 1 Parent(s): 9f39ebd

Upload folder using huggingface_hub

model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9c7da428ba619846e7cc3592a11c91c37746d2151a55f01f294fb8e5b860b969
3
  size 988097824
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:09c851117fac482c71cf54466a6c7f4d8c68bfadc3b913986477652224411dc9
3
  size 988097824
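
The diff above only swaps the Git LFS pointer for model.safetensors: the `oid sha256:` line now points at a new blob while the size stays at 988097824 bytes. A minimal sketch for checking a locally downloaded copy against the new pointer (the local file path is a placeholder, not part of the commit):

```python
# Sketch only: verify a downloaded model.safetensors against the LFS pointer above.
import hashlib

EXPECTED_OID = "09c851117fac482c71cf54466a6c7f4d8c68bfadc3b913986477652224411dc9"
EXPECTED_SIZE = 988097824  # bytes, from the pointer's `size` line

def verify_lfs_object(path: str) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == EXPECTED_SIZE and digest.hexdigest() == EXPECTED_OID

if __name__ == "__main__":
    print(verify_lfs_object("model.safetensors"))  # placeholder path
```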
training_artifacts/README.md CHANGED
@@ -12,5 +12,5 @@ This directory contains the training configuration and logs for this model.
12
  ## Job Information
13
 
14
  - Job Name: lf_torch_test__interactive
15
- - Timestamp: 2025-10-23 00:38:09 UTC
16
  - Execution Mode: Local
 
12
  ## Job Information
13
 
14
  - Job Name: lf_torch_test__interactive
15
+ - Timestamp: 2025-10-23 00:42:45 UTC
16
  - Execution Mode: Local
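
The log diff below covers the full pipeline run, ending with an upload of the merged model to TAUR-dev/testing_llamafactory_helper_quick_test__interactive. The log shells out to the now-deprecated `huggingface-cli upload`; a minimal sketch of the equivalent call through the huggingface_hub Python API (paths and repo ID copied from the log, everything else an assumption):

```python
# Sketch only: the upload step as a huggingface_hub API call instead of the
# deprecated `huggingface-cli upload` invocation recorded in the log.
from huggingface_hub import HfApi

api = HfApi()  # reads the token from the cached login or the HF_TOKEN env var
api.upload_folder(
    folder_path="/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged",
    repo_id="TAUR-dev/testing_llamafactory_helper_quick_test__interactive",
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```

The hub client drops files whose content has not changed before creating the commit, which is what the "Removing 13 file(s) from commit that have not changed." line later in the log reports.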
training_artifacts/logs/pipeline_cleaned.txt CHANGED
@@ -11061,7 +11061,18 @@ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default
11061
  warnings.warn(
11062
  /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
11063
  warnings.warn(
11064
- ===========
11065
  STAGE 1: Training Model
11066
  Start Time: Wed Oct 22 08:37:00 PM EDT 2025
11067
  ========================================
@@ -11283,72 +11294,90 @@ gl064:2627273:2627302 [1] NCCL INFO Using network IB
11283
  gl064:2627272:2627301 [0] NCCL INFO Using network IB
11284
  gl064:2627272:2627301 [0] NCCL INFO ncclCommInitRankConfig comm 0x15210000 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init START
11285
  gl064:2627273:2627302 [1] NCCL INFO ncclCommInitRankConfig comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init START
11286
- gl064:2627273:2627302 [1] NCCL INFO RAS client listening socket at ::1<28028>
11287
- gl064:2627272:2627301 [0] NCCL INFO RAS client listening socket at ::1<28028>
11288
- gl064:2627272:2627301 [0] NCCL INFO Bootstrap timings total 4.455940 (create 0.000026, send 0.000090, recv 0.000397, ring 0.000373, delay 0.000000)
11289
- gl064:2627273:2627302 [1] NCCL INFO Bootstrap timings total 4.456660 (create 0.000023, send 0.000087, recv 4.452725, ring 0.003056, delay 0.000000)
11290
- gl064:2627272:2627301 [0] NCCL INFO Setting affinity for GPU 0 to 0-15
11291
- gl064:2627273:2627302 [1] NCCL INFO Setting affinity for GPU 1 to 0-15
11292
- gl064:2627272:2627301 [0] NCCL INFO comm 0x15210000 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0
11293
- gl064:2627273:2627302 [1] NCCL INFO comm 0x138c8d70 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0
11294
- gl064:2627272:2627301 [0] NCCL INFO Channel 00/02 : 0 1 2 3
11295
- gl064:2627272:2627301 [0] NCCL INFO Channel 01/02 : 0 1 2 3
11296
- gl064:2627273:2627302 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
11297
- gl064:2627272:2627301 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2
11298
- gl064:2627273:2627302 [1] NCCL INFO P2P Chunksize set to 131072
11299
- gl064:2627272:2627301 [0] NCCL INFO P2P Chunksize set to 131072
11300
- gl064:2627273:2627302 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
11301
- gl064:2627273:2627308 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 8
11302
- gl064:2627273:2627307 [1] NCCL INFO [Proxy Service] Device 1 CPU core 5
11303
- gl064:2627272:2627301 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
11304
- gl064:2627272:2627301 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0
11305
- gl064:2627273:2627302 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
11306
- gl064:2627273:2627302 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
11307
- gl064:2627272:2627310 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 15
11308
- gl064:2627272:2627309 [0] NCCL INFO [Proxy Service] Device 0 CPU core 14
11309
- gl064:2627272:2627301 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
11310
- gl064:2627272:2627301 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
11311
- gl064:2627272:2627301 [0] NCCL INFO CC Off, workFifoBytes 1048576
11312
- gl064:2627273:2627302 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
11313
- gl064:2627273:2627302 [1] NCCL INFO ncclCommInitRankConfig comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init COMPLETE
11314
- gl064:2627272:2627301 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
11315
- gl064:2627273:2627302 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 4 total 4.59 (kernels 0.08, alloc 0.02, bootstrap 4.46, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
11316
- gl064:2627272:2627301 [0] NCCL INFO ncclCommInitRankConfig comm 0x15210000 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init COMPLETE
11317
- gl064:2627272:2627301 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 4 total 4.60 (kernels 0.09, alloc 0.02, bootstrap 4.46, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
11318
- gl064:2627272:2627312 [0] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [receive] via NET/IB/0
11319
- gl064:2627272:2627312 [0] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [receive] via NET/IB/0
11320
- gl064:2627272:2627313 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 11
11321
- gl064:2627273:2627311 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [send] via NET/IB/0
11322
- gl064:2627273:2627311 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/IB/0
11323
- gl064:2627273:2627314 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 6
11324
- gl064:2627272:2627312 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
11325
- gl064:2627272:2627312 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
11326
- gl064:2627272:2627312 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
11327
- gl064:2627273:2627311 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
11328
- [INFO|trainer.py:2519] 2025-10-22 20:37:17,497 >> ***** Running training *****
11329
- [INFO|trainer.py:2520] 2025-10-22 20:37:17,497 >> Num examples = 3,598
11330
- [INFO|trainer.py:2521] 2025-10-22 20:37:17,497 >> Num Epochs = 1
11331
- [INFO|trainer.py:2522] 2025-10-22 20:37:17,498 >> Instantaneous batch size per device = 1
11332
- [INFO|trainer.py:2525] 2025-10-22 20:37:17,498 >> Total train batch size (w. parallel, distributed & accumulation) = 4
11333
- [INFO|trainer.py:2526] 2025-10-22 20:37:17,498 >> Gradient Accumulation steps = 1
11334
- [INFO|trainer.py:2527] 2025-10-22 20:37:17,498 >> Total optimization steps = 100
11335
- [INFO|trainer.py:2528] 2025-10-22 20:37:17,499 >> Number of trainable parameters = 4,399,104
11336
- [INFO|integration_utils.py:867] 2025-10-22 20:37:17,501 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
11337
- wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
11338
- wandb: Tracking run with wandb version 0.22.2
11339
- wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_203717-18s2z8v7
11340
- wandb: Run `wandb offline` to turn off syncing.
11341
- wandb: Syncing run interactive_test
11342
- wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory
11343
- wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/18s2z8v7
11344
- 0%| | 0/100 [00:00<?, ?it/s] 1%| | 1/100 [00:00<00:57, 1.71it/s] 2%| | 2/100 [00:00<00:40, 2.43it/s] 3%| | 3/100 [00:01<00:40, 2.39it/s] 4%| | 4/100 [00:01<00:34, 2.81it/s] 5%| | 5/100 [00:02<00:41, 2.28it/s] 6%| | 6/100 [00:02<00:35, 2.68it/s] 7%| | 7/100 [00:02<00:31, 2.93it/s] 8%| | 8/100 [00:02<00:29, 3.09it/s] 9%| | 9/100 [00:03<00:28, 3.25it/s] 10%| | 10/100 [00:03<00:29, 3.02it/s] {'loss': 1.286, 'grad_norm': 0.36361026763916016, 'learning_rate': 4.55e-05, 'epoch': 0.01}
11345
- 10%| | 10/100 [00:03<00:29, 3.02it/s] 11%| | 11/100 [00:04<00:35, 2.49it/s] 12%| | 12/100 [00:04<00:34, 2.58it/s] 13%| | 13/100 [00:04<00:30, 2.84it/s] 14%| | 14/100 [00:05<00:29, 2.92it/s] 15%| | 15/100 [00:05<00:37, 2.26it/s] 16%| | 16/100 [00:06<00:32, 2.60it/s] 17%| | 17/100 [00:06<00:30, 2.69it/s] 18%| | 18/100 [00:06<00:30, 2.71it/s] 19%| | 19/100 [00:07<00:29, 2.70it/s] 20%| | 20/100 [00:07<00:27, 2.86it/s] {'loss': 1.1751, 'grad_norm': 0.38957393169403076, 'learning_rate': 4.05e-05, 'epoch': 0.02}
11346
- 20%| | 20/100 [00:07<00:27, 2.86it/s] 21%| | 21/100 [00:07<00:32, 2.40it/s] 22%| | 22/100 [00:08<00:30, 2.53it/s] 23%| | 23/100 [00:08<00:29, 2.57it/s] 24%| | 24/100 [00:08<00:27, 2.81it/s] 25%| | 25/100 [00:09<00:27, 2.75it/s] 26%| | 26/100 [00:09<00:24, 2.97it/s] 27%| | 27/100 [00:09<00:23, 3.14it/s] 28%| | 28/100 [00:10<00:22, 3.16it/s] 29%| | 29/100 [00:10<00:21, 3.23it/s] 30%| | 30/100 [00:10<00:21, 3.26it/s] {'loss': 1.1373, 'grad_norm': 0.42558616399765015, 'learning_rate': 3.55e-05, 'epoch': 0.03}
11347
- 30%| | 30/100 [00:10<00:21, 3.26it/s] 31%| | 31/100 [00:11<00:22, 3.04it/s] 32%| | 32/100 [00:11<00:21, 3.12it/s] 33%| | 33/100 [00:11<00:20, 3.25it/s] 34%| | 34/100 [00:12<00:20, 3.15it/s] 35%| | 35/100 [00:12<00:21, 3.08it/s] 36%| | 36/100 [00:12<00:20, 3.18it/s] 37%| | 37/100 [00:12<00:17, 3.52it/s] 38%| | 38/100 [00:13<00:17, 3.55it/s] 39%| | 39/100 [00:13<00:15, 3.81it/s] 40%| | 40/100 [00:13<00:18, 3.22it/s] {'loss': 1.0636, 'grad_norm': 0.4293089807033539, 'learning_rate': 3.05e-05, 'epoch': 0.04}
11348
- 40%| | 40/100 [00:13<00:18, 3.22it/s] 41%| | 41/100 [00:14<00:17, 3.30it/s] 42%| | 42/100 [00:14<00:17, 3.40it/s] 43%| | 43/100 [00:14<00:17, 3.19it/s] 44%| | 44/100 [00:15<00:16, 3.32it/s] 45%| | 45/100 [00:15<00:16, 3.34it/s] 46%| | 46/100 [00:15<00:17, 3.15it/s] 47%| | 47/100 [00:15<00:15, 3.39it/s] 48%| | 48/100 [00:16<00:15, 3.36it/s] 49%| | 49/100 [00:16<00:15, 3.28it/s] 50%| | 50/100 [00:16<00:15, 3.25it/s] {'loss': 1.0329, 'grad_norm': 0.4313737154006958, 'learning_rate': 2.5500000000000003e-05, 'epoch': 0.06}
11349
- 50%| | 50/100 [00:16<00:15, 3.25it/s][INFO|trainer.py:4309] 2025-10-22 20:37:35,492 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50
11350
- [INFO|configuration_utils.py:765] 2025-10-22 20:37:35,643 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
11351
- [INFO|configuration_utils.py:839] 2025-10-22 20:37:35,644 >> Model config Qwen2Config {
11352
  "architectures": [
11353
  "Qwen2ForCausalLM"
11354
  ],
@@ -11404,14 +11433,119 @@ wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/18s2z8v7
11404
  "vocab_size": 151936
11405
  }
11406
 
11407
- [INFO|tokenization_utils_base.py:2421] 2025-10-22 20:37:35,783 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/chat_template.jinja
11408
- [INFO|tokenization_utils_base.py:2590] 2025-10-22 20:37:35,789 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/tokenizer_config.json
11409
- [INFO|tokenization_utils_base.py:2599] 2025-10-22 20:37:35,809 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/special_tokens_map.json
11410
- 51%| | 51/100 [00:18<00:27, 1.81it/s] 52%| | 52/100 [00:18<00:23, 2.07it/s] 53%| | 53/100 [00:18<00:22, 2.13it/s] 54%| | 54/100 [00:19<00:20, 2.26it/s] 55%| | 55/100 [00:19<00:18, 2.41it/s] 56%| | 56/100 [00:19<00:16, 2.62it/s] 57%| | 57/100 [00:20<00:15, 2.79it/s] 58%| | 58/100 [00:20<00:14, 2.83it/s] 59%| | 59/100 [00:20<00:13, 2.94it/s] 60%| | 60/100 [00:21<00:13, 3.04it/s] {'loss': 0.9981, 'grad_norm': 0.4507186710834503, 'learning_rate': 2.05e-05, 'epoch': 0.07}
11411
- 60%| | 60/100 [00:21<00:13, 3.04it/s] 61%| | 61/100 [00:21<00:12, 3.09it/s] 62%| | 62/100 [00:21<00:12, 3.16it/s] 63%| | 63/100 [00:21<00:11, 3.30it/s] 64%| | 64/100 [00:22<00:10, 3.45it/s] 65%| | 65/100 [00:22<00:10, 3.39it/s] 66%| | 66/100 [00:22<00:10, 3.17it/s] 67%| | 67/100 [00:23<00:09, 3.35it/s] 68%| | 68/100 [00:23<00:09, 3.37it/s] 69%| | 69/100 [00:23<00:09, 3.22it/s] 70%| | 70/100 [00:24<00:09, 3.20it/s] {'loss': 0.9991, 'grad_norm': 0.4351355731487274, 'learning_rate': 1.55e-05, 'epoch': 0.08}
11412
- 70%| | 70/100 [00:24<00:09, 3.20it/s] 71%| | 71/100 [00:24<00:08, 3.35it/s] 72%| | 72/100 [00:24<00:07, 3.56it/s] 73%| | 73/100 [00:24<00:07, 3.80it/s] 74%| | 74/100 [00:25<00:09, 2.79it/s] 75%| | 75/100 [00:25<00:08, 3.01it/s] 76%| | 76/100 [00:25<00:07, 3.33it/s] 77%| | 77/100 [00:26<00:07, 3.20it/s] 78%| | 78/100 [00:26<00:06, 3.53it/s] 79%| | 79/100 [00:26<00:05, 3.52it/s] 80%| | 80/100 [00:26<00:05, 3.77it/s] {'loss': 0.9537, 'grad_norm': 0.4680567979812622, 'learning_rate': 1.05e-05, 'epoch': 0.09}
11413
- 80%| | 80/100 [00:26<00:05, 3.77it/s] 81%| | 81/100 [00:27<00:05, 3.77it/s] 82%| | 82/100 [00:27<00:04, 3.73it/s] 83%| | 83/100 [00:27<00:05, 3.32it/s] 84%| | 84/100 [00:28<00:04, 3.37it/s] 85%| | 85/100 [00:28<00:04, 3.41it/s] 86%| | 86/100 [00:28<00:04, 3.44it/s] 87%| | 87/100 [00:29<00:03, 3.47it/s] 88%| | 88/100 [00:29<00:03, 3.57it/s] 89%| | 89/100 [00:29<00:03, 3.65it/s] 90%| | 90/100 [00:29<00:02, 3.47it/s] {'loss': 0.9677, 'grad_norm': 0.46988463401794434, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.1}
11414
- 90%| | 90/100 [00:29<00:02, 3.47it/s] 91%| | 91/100 [00:30<00:02, 3.41it/s] 92%|| 92/100 [00:30<00:02, 2.70it/s] 93%|| 93/100 [00:31<00:02, 2.78it/s] 94%|| 94/100 [00:31<00:02, 2.83it/s] 95%|| 95/100 [00:31<00:01, 2.99it/s] 96%|| 96/100 [00:31<00:01, 3.13it/s] 97%|| 97/100 [00:32<00:00, 3.27it/s] 98%|| 98/100 [00:32<00:00, 3.13it/s] 99%|| 99/100 [00:32<00:00, 3.00it/s]100%|| 100/100 [00:33<00:00, 2.92it/s] {'loss': 0.9472, 'grad_norm': 0.45911866426467896, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.11}
11415
  100%|| 100/100 [00:33<00:00, 2.92it/s][INFO|trainer.py:4309] 2025-10-22 20:37:51,912 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
11416
  [INFO|configuration_utils.py:765] 2025-10-22 20:37:52,016 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
11417
  [INFO|configuration_utils.py:839] 2025-10-22 20:37:52,017 >> Model config Qwen2Config {
@@ -11578,7 +11712,7 @@ Checkpoint details:
11578
  Training step: 100
11579
  Updating merge config to point to checkpoint...
11580
  Successfully updated merge config
11581
- 2025
11582
  ========================================
11583
 
11584
  ========================================
@@ -11772,3 +11906,937 @@ Preparing Training Artifacts
11772
  ========================================
11773
  Copying configuration files...
11774
  Copying and cleaning training logs...
11061
  warnings.warn(
11062
  /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
11063
  warnings.warn(
11064
+ Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml
11065
+ Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
11066
+ Dataset Info:
11067
+ Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
11068
+ Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
11069
+ HF Repo ID: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
11070
+
11071
+
11072
+ Found pre-tokenized dataset at: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12
11073
+ Training will load from cached tokenized data (fast startup)
11074
+
11075
+ ========================================
11076
  STAGE 1: Training Model
11077
  Start Time: Wed Oct 22 08:37:00 PM EDT 2025
11078
  ========================================
 
11294
  gl064:2627272:2627301 [0] NCCL INFO Using network IB
11295
  gl064:2627272:2627301 [0] NCCL INFO ncclCommInitRankConfig comm 0x15210000 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init START
11296
  gl064:2627273:2627302 [1] NCCL INFO ncclCommInitRankConfig comm 0x138c8d70 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init START
11297
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
11298
+ import pkg_resources
11299
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
11300
+ import pkg_resources
11301
+ [INFO|2025-10-22 20:37:15] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.
11302
+ [INFO|2025-10-22 20:37:15] llamafactory.hparams.parser:423 >> Process rank: 2, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16
11303
+ [INFO|2025-10-22 20:37:15] llamafactory.hparams.parser:423 >> Process rank: 3, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16
11304
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
11305
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
11306
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
11307
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file added_tokens.json from cache at None
11308
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file special_tokens_map.json from cache at None
11309
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
11310
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,296 >> loading file chat_template.jinja from cache at None
11311
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:37:15,467 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
11312
+ [INFO|configuration_utils.py:765] 2025-10-22 20:37:15,655 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
11313
+ [INFO|configuration_utils.py:839] 2025-10-22 20:37:15,657 >> Model config Qwen2Config {
11314
+ "architectures": [
11315
+ "Qwen2ForCausalLM"
11316
+ ],
11317
+ "attention_dropout": 0.0,
11318
+ "bos_token_id": 151643,
11319
+ "dtype": "bfloat16",
11320
+ "eos_token_id": 151643,
11321
+ "hidden_act": "silu",
11322
+ "hidden_size": 896,
11323
+ "initializer_range": 0.02,
11324
+ "intermediate_size": 4864,
11325
+ "layer_types": [
11326
+ "full_attention",
11327
+ "full_attention",
11328
+ "full_attention",
11329
+ "full_attention",
11330
+ "full_attention",
11331
+ "full_attention",
11332
+ "full_attention",
11333
+ "full_attention",
11334
+ "full_attention",
11335
+ "full_attention",
11336
+ "full_attention",
11337
+ "full_attention",
11338
+ "full_attention",
11339
+ "full_attention",
11340
+ "full_attention",
11341
+ "full_attention",
11342
+ "full_attention",
11343
+ "full_attention",
11344
+ "full_attention",
11345
+ "full_attention",
11346
+ "full_attention",
11347
+ "full_attention",
11348
+ "full_attention",
11349
+ "full_attention"
11350
+ ],
11351
+ "max_position_embeddings": 32768,
11352
+ "max_window_layers": 24,
11353
+ "model_type": "qwen2",
11354
+ "num_attention_heads": 14,
11355
+ "num_hidden_layers": 24,
11356
+ "num_key_value_heads": 2,
11357
+ "rms_norm_eps": 1e-06,
11358
+ "rope_scaling": null,
11359
+ "rope_theta": 1000000.0,
11360
+ "sliding_window": null,
11361
+ "tie_word_embeddings": true,
11362
+ "transformers_version": "4.57.1",
11363
+ "use_cache": true,
11364
+ "use_mrope": false,
11365
+ "use_sliding_window": false,
11366
+ "vocab_size": 151936
11367
+ }
11368
+
11369
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,715 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
11370
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
11371
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
11372
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file added_tokens.json from cache at None
11373
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file special_tokens_map.json from cache at None
11374
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
11375
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:37:15,716 >> loading file chat_template.jinja from cache at None
11376
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:37:15,882 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
11377
+ [WARNING|2025-10-22 20:37:15] llamafactory.data.loader:148 >> Loading dataset from disk will ignore other data arguments.
11378
+ [INFO|2025-10-22 20:37:15] llamafactory.data.loader:143 >> Loaded tokenized dataset from /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12.
11379
+ [INFO|configuration_utils.py:765] 2025-10-22 20:37:15,984 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
11380
+ [INFO|configuration_utils.py:839] 2025-10-22 20:37:15,984 >> Model config Qwen2Config {
11381
  "architectures": [
11382
  "Qwen2ForCausalLM"
11383
  ],
 
11433
  "vocab_size": 151936
11434
  }
11435
 
11436
+ [INFO|2025-10-22 20:37:15] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
11437
+ `torch_dtype` is deprecated! Use `dtype` instead!
11438
+ [WARNING|logging.py:328] 2025-10-22 20:37:16,316 >> `torch_dtype` is deprecated! Use `dtype` instead!
11439
+ [INFO|modeling_utils.py:1172] 2025-10-22 20:37:16,317 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
11440
+ [INFO|modeling_utils.py:2341] 2025-10-22 20:37:16,318 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
11441
+ [INFO|configuration_utils.py:986] 2025-10-22 20:37:16,319 >> Generate config GenerationConfig {
11442
+ "bos_token_id": 151643,
11443
+ "eos_token_id": 151643,
11444
+ "use_cache": false
11445
+ }
11446
+
11447
+ [INFO|configuration_utils.py:941] 2025-10-22 20:37:16,605 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json
11448
+ [INFO|configuration_utils.py:986] 2025-10-22 20:37:16,605 >> Generate config GenerationConfig {
11449
+ "bos_token_id": 151643,
11450
+ "eos_token_id": 151643,
11451
+ "max_new_tokens": 2048
11452
+ }
11453
+
11454
+ [INFO|dynamic_module_utils.py:423] 2025-10-22 20:37:16,637 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B.
11455
+ [INFO|2025-10-22 20:37:16] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
11456
+ [INFO|2025-10-22 20:37:16] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
11457
+ [INFO|2025-10-22 20:37:16] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
11458
+ [INFO|2025-10-22 20:37:16] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
11459
+ [INFO|2025-10-22 20:37:16] llamafactory.model.model_utils.misc:143 >> Found linear modules: gate_proj,v_proj,down_proj,o_proj,q_proj,k_proj,up_proj
11460
+ [INFO|2025-10-22 20:37:17] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826
11461
+ The model is already on multiple devices. Skipping the move to device specified in `args`.
11462
+ [WARNING|trainer.py:906] 2025-10-22 20:37:17,117 >> The model is already on multiple devices. Skipping the move to device specified in `args`.
11463
+ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
11464
+ [INFO|trainer.py:699] 2025-10-22 20:37:17,119 >> max_steps is given, it will override any value given in num_train_epochs
11465
+ [INFO|trainer.py:749] 2025-10-22 20:37:17,119 >> Using auto half precision backend
11466
+ [WARNING|trainer.py:982] 2025-10-22 20:37:17,120 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
11467
+ gl065:3840251:3840251 [1] NCCL INFO cudaDriverVersion 13000
11468
+ gl065:3840251:3840251 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
11469
+ gl065:3840251:3840251 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.1<0>
11470
+ gl065:3840251:3840251 [1] NCCL INFO NCCL version 2.27.5+cuda12.9
11471
+ gl065:3840251:3840251 [1] NCCL INFO Comm config Blocking set to 1
11472
+ gl065:3840250:3840250 [0] NCCL INFO cudaDriverVersion 13000
11473
+ gl065:3840250:3840250 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
11474
+ gl065:3840250:3840250 [0] NCCL INFO Bootstrap: Using ibs3:10.0.5.1<0>
11475
+ gl065:3840250:3840250 [0] NCCL INFO NCCL version 2.27.5+cuda12.9
11476
+ gl065:3840250:3840250 [0] NCCL INFO Comm config Blocking set to 1
11477
+ gl065:3840251:3840375 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
11478
+ gl065:3840251:3840375 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
11479
+ gl065:3840251:3840375 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
11480
+ gl065:3840251:3840375 [1] NCCL INFO NCCL_IB_HCA set to mlx5
11481
+ gl065:3840250:3840376 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
11482
+ gl065:3840250:3840376 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
11483
+ gl065:3840250:3840376 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
11484
+ gl065:3840250:3840376 [0] NCCL INFO NCCL_IB_HCA set to mlx5
11485
+ gl065:3840250:3840376 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.1<0>
11486
+ gl065:3840250:3840376 [0] NCCL INFO Initialized NET plugin IB
11487
+ gl065:3840251:3840375 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.1<0>
11488
+ gl065:3840251:3840375 [1] NCCL INFO Initialized NET plugin IB
11489
+ gl065:3840250:3840376 [0] NCCL INFO Assigned NET plugin IB to comm
11490
+ gl065:3840251:3840375 [1] NCCL INFO Assigned NET plugin IB to comm
11491
+ gl065:3840250:3840376 [0] NCCL INFO Using network IB
11492
+ gl065:3840251:3840375 [1] NCCL INFO Using network IB
11493
+ gl065:3840250:3840376 [0] NCCL INFO ncclCommInitRankConfig comm 0x15430230 rank 2 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init START
11494
+ gl065:3840251:3840375 [1] NCCL INFO ncclCommInitRankConfig comm 0x133fe6a0 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init START
11495
+ gl065:3840250:3840376 [0] NCCL INFO RAS client listening socket at ::1<28028>
11496
+ gl065:3840251:3840375 [1] NCCL INFO RAS client listening socket at ::1<28028>
11497
+ gl065:3840251:3840375 [1] NCCL INFO Bootstrap timings total 0.004547 (create 0.000080, send 0.000522, recv 0.001800, ring 0.001144, delay 0.000000)
11498
+ gl065:3840250:3840376 [0] NCCL INFO Bootstrap timings total 0.004873 (create 0.000033, send 0.000581, recv 0.000935, ring 0.002906, delay 0.000000)
11499
+ gl065:3840250:3840376 [0] NCCL INFO Setting affinity for GPU 0 to 0-15
11500
+ gl065:3840251:3840375 [1] NCCL INFO Setting affinity for GPU 1 to 0-15
11501
+ gl065:3840251:3840375 [1] NCCL INFO comm 0x133fe6a0 rank 3 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0
11502
+ gl065:3840250:3840376 [0] NCCL INFO comm 0x15430230 rank 2 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0
11503
+ gl065:3840251:3840375 [1] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
11504
+ gl065:3840250:3840376 [0] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 3/0/-1->2->-1
11505
+ gl065:3840251:3840375 [1] NCCL INFO P2P Chunksize set to 131072
11506
+ gl065:3840250:3840376 [0] NCCL INFO P2P Chunksize set to 131072
11507
+ gl065:3840251:3840375 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
11508
+ gl065:3840250:3840376 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
11509
+ gl065:3840251:3840381 [1] NCCL INFO [Proxy Service] Device 1 CPU core 8
11510
+ gl065:3840251:3840383 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 11
11511
+ gl065:3840250:3840382 [0] NCCL INFO [Proxy Service] Device 0 CPU core 9
11512
+ gl065:3840250:3840384 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 12
11513
+ gl065:3840250:3840376 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
11514
+ gl065:3840250:3840376 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
11515
+ gl065:3840251:3840375 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
11516
+ gl065:3840251:3840375 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
11517
+ gl065:3840250:3840376 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
11518
+ gl065:3840251:3840375 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
11519
+ gl065:3840250:3840376 [0] NCCL INFO ncclCommInitRankConfig comm 0x15430230 rank 2 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xb71ac44899f1b45 - Init COMPLETE
11520
+ gl065:3840251:3840375 [1] NCCL INFO ncclCommInitRankConfig comm 0x133fe6a0 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xb71ac44899f1b45 - Init COMPLETE
11521
+ gl065:3840250:3840376 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 2 nranks 4 total 0.13 (kernels 0.09, alloc 0.01, bootstrap 0.00, allgathers 0.00, topo 0.03, graphs 0.00, connections 0.00, rest 0.00)
11522
+ gl065:3840251:3840375 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 3 nranks 4 total 0.13 (kernels 0.09, alloc 0.01, bootstrap 0.00, allgathers 0.00, topo 0.03, graphs 0.00, connections 0.00, rest 0.00)
11523
+ gl065:3840250:3840385 [0] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [receive] via NET/IB/0
11524
+ gl065:3840250:3840387 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 10
11525
+ gl065:3840250:3840385 [0] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [receive] via NET/IB/0
11526
+ gl065:3840250:3840385 [0] NCCL INFO Channel 00 : 2[0] -> 3[1] via SHM/direct/direct
11527
+ gl065:3840250:3840385 [0] NCCL INFO Channel 01 : 2[0] -> 3[1] via SHM/direct/direct
11528
+ gl065:3840251:3840386 [1] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [send] via NET/IB/0
11529
+ gl065:3840251:3840386 [1] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [send] via NET/IB/0
11530
+ gl065:3840251:3840388 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 1
11531
+ gl065:3840251:3840386 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
11532
+ gl065:3840250:3840385 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
11533
+ [INFO|trainer.py:2519] 2025-10-22 20:37:17,479 >> ***** Running training *****
11534
+ [INFO|trainer.py:2520] 2025-10-22 20:37:17,479 >> Num examples = 3,598
11535
+ [INFO|trainer.py:2521] 2025-10-22 20:37:17,479 >> Num Epochs = 1
11536
+ [INFO|trainer.py:2522] 2025-10-22 20:37:17,479 >> Instantaneous batch size per device = 1
11537
+ [INFO|trainer.py:2525] 2025-10-22 20:37:17,479 >> Total train batch size (w. parallel, distributed & accumulation) = 4
11538
+ [INFO|trainer.py:2526] 2025-10-22 20:37:17,479 >> Gradient Accumulation steps = 1
11539
+ [INFO|trainer.py:2527] 2025-10-22 20:37:17,479 >> Total optimization steps = 100
11540
+ [INFO|trainer.py:2528] 2025-10-22 20:37:17,481 >> Number of trainable parameters = 4,399,104
11541
+ [INFO|trainer.py:2810] 2025-10-22 20:37:51,948 >>
11542
+
11543
+ Training completed. Do not forget to share your model on huggingface.co/models =)
11544
+
11545
+
11546
+ gl065:3840250:3840250 [0] NCCL INFO comm 0x15430230 rank 2 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE
11547
+ gl065:3840251:3840251 [1] NCCL INFO comm 0x133fe6a0 rank 3 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE
11548
+ s] 91%| | 91/100 [00:30<00:02, 3.41it/s] 92%|| 92/100 [00:30<00:02, 2.70it/s] 93%|| 93/100 [00:31<00:02, 2.78it/s] 94%|| 94/100 [00:31<00:02, 2.83it/s] 95%|| 95/100 [00:31<00:01, 2.99it/s] 96%|| 96/100 [00:31<00:01, 3.13it/s] 97%|| 97/100 [00:32<00:00, 3.27it/s] 98%|| 98/100 [00:32<00:00, 3.13it/s] 99%|| 99/100 [00:32<00:00, 3.00it/s]100%|| 100/100 [00:33<00:00, 2.92it/s] {'loss': 0.9472, 'grad_norm': 0.45911866426467896, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.11}
11549
  100%|| 100/100 [00:33<00:00, 2.92it/s][INFO|trainer.py:4309] 2025-10-22 20:37:51,912 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
11550
  [INFO|configuration_utils.py:765] 2025-10-22 20:37:52,016 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
11551
  [INFO|configuration_utils.py:839] 2025-10-22 20:37:52,017 >> Model config Qwen2Config {
 
11712
  Training step: 100
11713
  Updating merge config to point to checkpoint...
11714
  Successfully updated merge config
11715
+ d Time: Wed Oct 22 08:37:55 PM EDT 2025
11716
  ========================================
11717
 
11718
  ========================================
 
11906
  ========================================
11907
  Copying configuration files...
11908
  Copying and cleaning training logs...
11909
+ Training artifacts prepared in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/training_artifacts
11910
+ Contents:
11911
+ Log files:
11912
+
11913
+ ========================================
11914
+ STAGE 3: Uploading to HuggingFace Hub
11915
+ Repository: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
11916
+ Start Time: Wed Oct 22 08:38:09 PM EDT 2025
11917
+ ========================================
11918
+ Uploading contents of: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
11919
+ Directory structure:
11920
+
11921
+ Executing: huggingface-cli upload TAUR-dev/testing_llamafactory_helper_quick_test__interactive /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged .
11922
+ Start hashing 17 files.
11923
+ Finished hashing 17 files.
11924
+ Warning: 'huggingface-cli upload' is deprecated. Use 'hf upload' instead.
11925
+ Processing Files (0 / 0) : | | 0.00B / 0.00B
11926
+ New Data Upload : | | 0.00B / 0.00B [A
11927
+
11928
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11929
+
11930
+
11931
+ .../merged/model.safetensors: 9%| | 92.2MB / 988MB [A[A[A
11932
+
11933
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11934
+
11935
+
11936
+ .../merged/model.safetensors: 9%| | 92.2MB / 988MB [A[A[AProcessing Files (1 / 2) : 10%| | 104MB / 1.00GB, ???B/s
11937
+
11938
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11939
+
11940
+
11941
+ .../merged/model.safetensors: 22%| | 218MB / 988MB [A[A[AProcessing Files (1 / 2) : 23%| | 229MB / 1.00GB, 629MB/s
11942
+
11943
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11944
+
11945
+
11946
+ .../merged/model.safetensors: 28%| | 272MB / 988MB [A[A[AProcessing Files (1 / 2) : 28%| | 284MB / 1.00GB, 449MB/s
11947
+
11948
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11949
+
11950
+
11951
+ .../merged/model.safetensors: 28%| | 272MB / 988MB [A[A[A
11952
+
11953
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11954
+
11955
+
11956
+ .../merged/model.safetensors: 34%| | 336MB / 988MB [A[A[AProcessing Files (1 / 2) : 35%| | 348MB / 1.00GB, 305MB/s
11957
+ New Data Upload : 48%| | 64.2MB / 134MB, 80.2MB/s [A
11958
+
11959
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11960
+
11961
+
11962
+ .../merged/model.safetensors: 39%| | 382MB / 988MB [A[A[AProcessing Files (1 / 2) : 39%| | 394MB / 1.00GB, 290MB/s
11963
+ New Data Upload : 55%| | 110MB / 201MB, 110MB/s [A
11964
+
11965
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11966
+
11967
+
11968
+ .../merged/model.safetensors: 45%| | 442MB / 988MB [A[A[AProcessing Files (1 / 2) : 45%| | 454MB / 1.00GB, 292MB/s
11969
+ New Data Upload : 63%| | 170MB / 268MB, 142MB/s [A
11970
+
11971
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11972
+
11973
+
11974
+ .../merged/model.safetensors: 50%| | 490MB / 988MB [A[A[AProcessing Files (1 / 2) : 50%| | 501MB / 1.00GB, 284MB/s
11975
+ New Data Upload : 81%| | 218MB / 268MB, 155MB/s [A
11976
+
11977
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11978
+
11979
+
11980
+ .../merged/model.safetensors: 55%| | 542MB / 988MB [A[A[AProcessing Files (1 / 2) : 55%| | 553MB / 1.00GB, 281MB/s
11981
+ New Data Upload : 80%| | 270MB / 335MB, 169MB/s [A
11982
+
11983
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11984
+
11985
+
11986
+ .../merged/model.safetensors: 61%| | 603MB / 988MB [A[A[AProcessing Files (1 / 2) : 62%| | 615MB / 1.00GB, 284MB/s
11987
+ New Data Upload : 82%| | 331MB / 402MB, 184MB/s [A
11988
+
11989
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11990
+
11991
+
11992
+ .../merged/model.safetensors: 68%| | 673MB / 988MB [A[A[AProcessing Files (1 / 2) : 69%| | 685MB / 1.00GB, 290MB/s
11993
+ New Data Upload : 85%| | 401MB / 470MB, 200MB/s [A
11994
+
11995
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
11996
+
11997
+
11998
+ .../merged/model.safetensors: 72%| | 714MB / 988MB [A[A[AProcessing Files (1 / 2) : 73%| | 726MB / 1.00GB, 283MB/s
11999
+ New Data Upload : 82%| | 442MB / 536MB, 201MB/s [A
12000
+
12001
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12002
+
12003
+
12004
+ .../merged/model.safetensors: 81%| | 796MB / 988MB [A[A[AProcessing Files (1 / 2) : 81%| | 808MB / 1.00GB, 293MB/s
12005
+ New Data Upload : 87%| | 524MB / 604MB, 218MB/s [A
12006
+
12007
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12008
+
12009
+
12010
+ .../merged/model.safetensors: 86%| | 849MB / 988MB [A[A[AProcessing Files (1 / 2) : 86%| | 860MB / 1.00GB, 291MB/s
12011
+ New Data Upload : 96%|| 577MB / 604MB, 222MB/s [A
12012
+
12013
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12014
+
12015
+
12016
+ .../merged/model.safetensors: 91%|| 902MB / 988MB [A[A[AProcessing Files (1 / 2) : 91%|| 914MB / 1.00GB, 289MB/s
12017
+ New Data Upload : 94%|| 630MB / 671MB, 225MB/s [A
12018
+
12019
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12020
+
12021
+
12022
+ .../merged/model.safetensors: 99%|| 983MB / 988MB [A[A[AProcessing Files (1 / 2) : 99%|| 994MB / 1.00GB, 297MB/s
12023
+ New Data Upload : 99%|| 710MB / 716MB, 237MB/s [A
12024
+
12025
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12026
+
12027
+
12028
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[AProcessing Files (1 / 2) : 100%|| 999MB / 1.00GB, 280MB/s
12029
+ New Data Upload : 100%|| 715MB / 716MB, 224MB/s [A
12030
+
12031
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12032
+
12033
+
12034
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[A
12035
+
12036
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12037
+
12038
+
12039
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[AProcessing Files (1 / 2) : 100%|| 999MB / 1.00GB, 249MB/s
12040
+ New Data Upload : 100%|| 716MB / 716MB, 199MB/s [A
12041
+
12042
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12043
+
12044
+
12045
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[AProcessing Files (1 / 2) : 100%|| 999MB / 1.00GB, 236MB/s
12046
+ New Data Upload : 100%|| 716MB / 716MB, 188MB/s [A
12047
+
12048
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12049
+
12050
+
12051
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[AProcessing Files (2 / 2) : 100%|| 1.00GB / 1.00GB, 224MB/s
12052
+ New Data Upload : 100%|| 716MB / 716MB, 179MB/s [A
12053
+
12054
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12055
+
12056
+
12057
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[A
12058
+
12059
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12060
+
12061
+
12062
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[A
12063
+
12064
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB [A[A
12065
+
12066
+
12067
+ .../merged/model.safetensors: 100%|| 988MB / 988MB [A[A[AProcessing Files (2 / 2) : 100%|| 1.00GB / 1.00GB, 204MB/s
12068
+ New Data Upload : 100%|| 716MB / 716MB, 163MB/s
12069
+ ...ive/merged/tokenizer.json: 100%|| 11.4MB / 11.4MB
12070
+ .../merged/model.safetensors: 100%|| 988MB / 988MB
12071
+ Removing 13 file(s) from commit that have not changed.
12072
+ https://huggingface.co/TAUR-dev/testing_llamafactory_helper_quick_test__interactive/tree/main/.
12073
+
12074
+ ========================================
12075
+ Upload completed successfully
12076
+ Model and training artifacts uploaded to: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
12077
+ End Time: Wed Oct 22 08:38:17 PM EDT 2025
12078
+ ========================================
12079
+
12080
+ ========================================
12081
+ STAGE 4: Cleanup
12082
+ ========================================
12083
+ Keeping checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
12084
+ Keeping merged model in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
12085
+
12086
+ ========================================
12087
+ PIPELINE COMPLETED SUCCESSFULLY
12088
+ End Time: Wed Oct 22 08:38:18 PM EDT 2025
12089
+ ========================================
12090
+
12091
+ ========================================
12092
+ Cleaning up LlamaFactory processes
12093
+ ========================================
12094
+ Cleaned up processes on gl064.hpc.nyu.edu
12095
+ Cleaning up processes on worker node: gl065
12096
+ Process cleanup complete
12097
+ ========================================
12098
+ Job Name: lf_torch_test__interactive
12099
+ Hostname: gl064.hpc.nyu.edu
12100
+ Number of nodes: 2
12101
+ GPUs per node: 2
12102
+ Start Time: Wed Oct 22 08:41:31 PM EDT 2025
12103
+ Log file: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/logs/pipeline.log
12104
+ ========================================
12105
+ Sourcing secrets from: /scratch/zrs2020/LlamaFactoryHelper/secrets.env
12106
+
12107
+ ========================================
12108
+ Configuration Paths
12109
+ ========================================
12110
+ Train Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml
12111
+ Merge Config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
12112
+ Dataset Info:
12113
+ Output Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
12114
+ Export Dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
12115
+ HF Repo ID: TAUR-dev/testing_llamafactory_helper_quick_test__interactive
12116
+
12117
+
12118
+ ========================================
12119
+ Multi-Node Coordination
12120
+ ========================================
12121
+ This is the master node - coordinating worker nodes...
12122
+ Master node: gl064
12123
+ Master port: 29500
12124
+ World size: 2
12125
+
12126
+ Launching on worker node 1: gl065
12127
+ All worker nodes launched successfully
12128
+ Master node (this node) will now join training as rank 0
12129
+
12130
+
12131
+ Found pre-tokenized dataset at: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12
12132
+ Training will load from cached tokenized data (fast startup)
12133
+
12134
+ ========================================
12135
+ STAGE 1: Training Model
12136
+ Start Time: Wed Oct 22 08:41:34 PM EDT 2025
12137
+ ========================================
12138
+ Multi-node training detected
12139
+ Nodes: 2, GPUs per node: 2
12140
+ Master address: gl064
12141
+ Master port: 29500
12142
+ Node rank: 0
12143
+ World size: 2
12144
+ CUDA_VISIBLE_DEVICES: 0,1
12145
+ LLaMA-Factory path: /scratch/zrs2020/LlamaFactoryHelper/LLaMA-Factory
12146
+ Training config: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/train_config.yaml
12147
+
12148
+ Starting distributed training with torch.distributed.run...
12149
+
12150
+ *****************************************
12151
+ Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
12152
+ *****************************************
12153
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
12154
+ warnings.warn(
12155
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
12156
+ warnings.warn(
12157
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
12158
+ import pkg_resources
12159
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
12160
+ import pkg_resources
12161
+ [INFO|2025-10-22 20:41:51] llamafactory.hparams.parser:423 >> Process rank: 1, world size: 4, device: cuda:1, distributed training: True, compute dtype: torch.float16
12162
+ [INFO|2025-10-22 20:41:51] llamafactory.hparams.parser:143 >> Set `ddp_find_unused_parameters` to False in DDP training since LoRA is enabled.
12163
+ [INFO|2025-10-22 20:41:51] llamafactory.hparams.parser:423 >> Process rank: 0, world size: 4, device: cuda:0, distributed training: True, compute dtype: torch.float16
12164
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
12165
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
12166
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
12167
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file added_tokens.json from cache at None
12168
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file special_tokens_map.json from cache at None
12169
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
12170
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:51,968 >> loading file chat_template.jinja from cache at None
12171
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:41:52,139 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12172
+ [INFO|configuration_utils.py:765] 2025-10-22 20:41:52,336 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12173
+ [INFO|configuration_utils.py:839] 2025-10-22 20:41:52,337 >> Model config Qwen2Config {
12174
+ "architectures": [
12175
+ "Qwen2ForCausalLM"
12176
+ ],
12177
+ "attention_dropout": 0.0,
12178
+ "bos_token_id": 151643,
12179
+ "dtype": "bfloat16",
12180
+ "eos_token_id": 151643,
12181
+ "hidden_act": "silu",
12182
+ "hidden_size": 896,
12183
+ "initializer_range": 0.02,
12184
+ "intermediate_size": 4864,
12185
+ "layer_types": [
12186
+ "full_attention",
12187
+ "full_attention",
12188
+ "full_attention",
12189
+ "full_attention",
12190
+ "full_attention",
12191
+ "full_attention",
12192
+ "full_attention",
12193
+ "full_attention",
12194
+ "full_attention",
12195
+ "full_attention",
12196
+ "full_attention",
12197
+ "full_attention",
12198
+ "full_attention",
12199
+ "full_attention",
12200
+ "full_attention",
12201
+ "full_attention",
12202
+ "full_attention",
12203
+ "full_attention",
12204
+ "full_attention",
12205
+ "full_attention",
12206
+ "full_attention",
12207
+ "full_attention",
12208
+ "full_attention",
12209
+ "full_attention"
12210
+ ],
12211
+ "max_position_embeddings": 32768,
12212
+ "max_window_layers": 24,
12213
+ "model_type": "qwen2",
12214
+ "num_attention_heads": 14,
12215
+ "num_hidden_layers": 24,
12216
+ "num_key_value_heads": 2,
12217
+ "rms_norm_eps": 1e-06,
12218
+ "rope_scaling": null,
12219
+ "rope_theta": 1000000.0,
12220
+ "sliding_window": null,
12221
+ "tie_word_embeddings": true,
12222
+ "transformers_version": "4.57.1",
12223
+ "use_cache": true,
12224
+ "use_mrope": false,
12225
+ "use_sliding_window": false,
12226
+ "vocab_size": 151936
12227
+ }
12228
+
12229
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
12230
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
12231
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
12232
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file added_tokens.json from cache at None
12233
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file special_tokens_map.json from cache at None
12234
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
12235
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:41:52,405 >> loading file chat_template.jinja from cache at None
12236
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:41:52,571 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12237
+ [WARNING|2025-10-22 20:41:52] llamafactory.data.loader:148 >> Loading dataset from disk will ignore other data arguments.
12238
+ [INFO|2025-10-22 20:41:52] llamafactory.data.loader:143 >> Loaded tokenized dataset from /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/tokenized/my_custom_sft12.
12239
+ [INFO|configuration_utils.py:765] 2025-10-22 20:41:52,629 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12240
+ [INFO|configuration_utils.py:839] 2025-10-22 20:41:52,629 >> Model config Qwen2Config {
12241
+ "architectures": [
12242
+ "Qwen2ForCausalLM"
12243
+ ],
12244
+ "attention_dropout": 0.0,
12245
+ "bos_token_id": 151643,
12246
+ "dtype": "bfloat16",
12247
+ "eos_token_id": 151643,
12248
+ "hidden_act": "silu",
12249
+ "hidden_size": 896,
12250
+ "initializer_range": 0.02,
12251
+ "intermediate_size": 4864,
12252
+ "layer_types": [
12253
+ "full_attention",
12254
+ "full_attention",
12255
+ "full_attention",
12256
+ "full_attention",
12257
+ "full_attention",
12258
+ "full_attention",
12259
+ "full_attention",
12260
+ "full_attention",
12261
+ "full_attention",
12262
+ "full_attention",
12263
+ "full_attention",
12264
+ "full_attention",
12265
+ "full_attention",
12266
+ "full_attention",
12267
+ "full_attention",
12268
+ "full_attention",
12269
+ "full_attention",
12270
+ "full_attention",
12271
+ "full_attention",
12272
+ "full_attention",
12273
+ "full_attention",
12274
+ "full_attention",
12275
+ "full_attention",
12276
+ "full_attention"
12277
+ ],
12278
+ "max_position_embeddings": 32768,
12279
+ "max_window_layers": 24,
12280
+ "model_type": "qwen2",
12281
+ "num_attention_heads": 14,
12282
+ "num_hidden_layers": 24,
12283
+ "num_key_value_heads": 2,
12284
+ "rms_norm_eps": 1e-06,
12285
+ "rope_scaling": null,
12286
+ "rope_theta": 1000000.0,
12287
+ "sliding_window": null,
12288
+ "tie_word_embeddings": true,
12289
+ "transformers_version": "4.57.1",
12290
+ "use_cache": true,
12291
+ "use_mrope": false,
12292
+ "use_sliding_window": false,
12293
+ "vocab_size": 151936
12294
+ }
12295
+
12296
+ [INFO|2025-10-22 20:41:52] llamafactory.model.model_utils.kv_cache:143 >> KV cache is disabled during training.
12297
+ [WARNING|logging.py:328] 2025-10-22 20:41:52,961 >> `torch_dtype` is deprecated! Use `dtype` instead!
12298
+ `torch_dtype` is deprecated! Use `dtype` instead!
12299
+ [INFO|modeling_utils.py:1172] 2025-10-22 20:41:52,962 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
12300
+ [INFO|modeling_utils.py:2341] 2025-10-22 20:41:52,963 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
12301
+ [INFO|configuration_utils.py:986] 2025-10-22 20:41:52,964 >> Generate config GenerationConfig {
12302
+ "bos_token_id": 151643,
12303
+ "eos_token_id": 151643,
12304
+ "use_cache": false
12305
+ }
12306
+
12307
+ [INFO|configuration_utils.py:941] 2025-10-22 20:41:53,264 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json
12308
+ [INFO|configuration_utils.py:986] 2025-10-22 20:41:53,264 >> Generate config GenerationConfig {
12309
+ "bos_token_id": 151643,
12310
+ "eos_token_id": 151643,
12311
+ "max_new_tokens": 2048
12312
+ }
12313
+
12314
+ [INFO|dynamic_module_utils.py:423] 2025-10-22 20:41:53,294 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B.
12315
+ [INFO|2025-10-22 20:41:53] llamafactory.model.model_utils.checkpointing:143 >> Gradient checkpointing enabled.
12316
+ [INFO|2025-10-22 20:41:53] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
12317
+ [INFO|2025-10-22 20:41:53] llamafactory.model.adapter:143 >> Upcasting trainable params to float32.
12318
+ [INFO|2025-10-22 20:41:53] llamafactory.model.adapter:143 >> Fine-tuning method: LoRA
12319
+ [INFO|2025-10-22 20:41:53] llamafactory.model.model_utils.misc:143 >> Found linear modules: v_proj,gate_proj,k_proj,down_proj,o_proj,up_proj,q_proj
12320
+ [INFO|2025-10-22 20:41:53] llamafactory.model.loader:143 >> trainable params: 4,399,104 || all params: 498,431,872 || trainable%: 0.8826
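Note: the LoRA lines above report adapters on all seven linear projections (q/k/v/o plus gate/up/down) with 4,399,104 trainable parameters out of 498,431,872 (0.8826%). That count is consistent with rank r = 8 across the 24 layers, although the log does not state the rank or alpha. A minimal, hypothetical peft-based sketch of an equivalent setup (LLaMA-Factory wires this up internally):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    lora_cfg = LoraConfig(
        r=8,             # assumed; reproduces the 4,399,104 trainable-param count
        lora_alpha=16,   # assumed; not reported in the log
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # comparable to the loader line above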
12321
+ [WARNING|trainer.py:906] 2025-10-22 20:41:53,535 >> The model is already on multiple devices. Skipping the move to device specified in `args`.
12322
+ [INFO|trainer.py:699] 2025-10-22 20:41:53,537 >> max_steps is given, it will override any value given in num_train_epochs
12323
+ [INFO|trainer.py:749] 2025-10-22 20:41:53,538 >> Using auto half precision backend
12324
+ [WARNING|2025-10-22 20:41:53] llamafactory.train.callbacks:154 >> Previous trainer log in this folder will be deleted.
12325
+ [WARNING|trainer.py:982] 2025-10-22 20:41:53,541 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
12326
+ The model is already on multiple devices. Skipping the move to device specified in `args`.
12327
+ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
12328
+ gl064:2628226:2628226 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
12329
+ gl064:2628226:2628226 [0] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0>
12330
+ gl064:2628226:2628226 [0] NCCL INFO cudaDriverVersion 13000
12331
+ gl064:2628226:2628226 [0] NCCL INFO NCCL version 2.27.5+cuda12.9
12332
+ gl064:2628226:2628226 [0] NCCL INFO Comm config Blocking set to 1
12333
+ gl064:2628227:2628227 [1] NCCL INFO cudaDriverVersion 13000
12334
+ gl064:2628227:2628227 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
12335
+ gl064:2628227:2628227 [1] NCCL INFO Bootstrap: Using ibs3:10.0.5.0<0>
12336
+ gl064:2628227:2628227 [1] NCCL INFO NCCL version 2.27.5+cuda12.9
12337
+ gl064:2628227:2628227 [1] NCCL INFO Comm config Blocking set to 1
12338
+ gl064:2628226:2628285 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
12339
+ gl064:2628226:2628285 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
12340
+ gl064:2628226:2628285 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
12341
+ gl064:2628226:2628285 [0] NCCL INFO NCCL_IB_HCA set to mlx5
12342
+ gl064:2628226:2628285 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0>
12343
+ gl064:2628226:2628285 [0] NCCL INFO Initialized NET plugin IB
12344
+ gl064:2628226:2628285 [0] NCCL INFO Assigned NET plugin IB to comm
12345
+ gl064:2628226:2628285 [0] NCCL INFO Using network IB
12346
+ gl064:2628226:2628285 [0] NCCL INFO ncclCommInitRankConfig comm 0x14bded40 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xe09b1d45be9f2d6a - Init START
12347
+ gl064:2628227:2628286 [1] NCCL INFO NET/Plugin: Could not find: libnccl-net.so.
12348
+ gl064:2628227:2628286 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
12349
+ gl064:2628227:2628286 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ibs
12350
+ gl064:2628227:2628286 [1] NCCL INFO NCCL_IB_HCA set to mlx5
12351
+ gl064:2628227:2628286 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ibs3:10.0.5.0<0>
12352
+ gl064:2628227:2628286 [1] NCCL INFO Initialized NET plugin IB
12353
+ gl064:2628227:2628286 [1] NCCL INFO Assigned NET plugin IB to comm
12354
+ gl064:2628227:2628286 [1] NCCL INFO Using network IB
12355
+ gl064:2628227:2628286 [1] NCCL INFO ncclCommInitRankConfig comm 0x13dac3f0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xe09b1d45be9f2d6a - Init START
12356
+ gl064:2628227:2628286 [1] NCCL INFO RAS client listening socket at ::1<28028>
12357
+ gl064:2628226:2628285 [0] NCCL INFO RAS client listening socket at ::1<28028>
12358
+ gl064:2628226:2628285 [0] NCCL INFO Bootstrap timings total 1.801443 (create 0.000027, send 0.000089, recv 0.023424, ring 0.000382, delay 0.000000)
12359
+ gl064:2628227:2628286 [1] NCCL INFO Bootstrap timings total 1.778394 (create 0.000024, send 0.000070, recv 1.775558, ring 0.002096, delay 0.000000)
12360
+ gl064:2628226:2628285 [0] NCCL INFO Setting affinity for GPU 0 to 0-15
12361
+ gl064:2628227:2628286 [1] NCCL INFO Setting affinity for GPU 1 to 0-15
12362
+ gl064:2628227:2628286 [1] NCCL INFO comm 0x13dac3f0 rank 1 nRanks 4 nNodes 2 localRanks 2 localRank 1 MNNVL 0
12363
+ gl064:2628226:2628285 [0] NCCL INFO comm 0x14bded40 rank 0 nRanks 4 nNodes 2 localRanks 2 localRank 0 MNNVL 0
12364
+ gl064:2628227:2628286 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
12365
+ gl064:2628227:2628286 [1] NCCL INFO P2P Chunksize set to 131072
12366
+ gl064:2628226:2628285 [0] NCCL INFO Channel 00/02 : 0 1 2 3
12367
+ gl064:2628226:2628285 [0] NCCL INFO Channel 01/02 : 0 1 2 3
12368
+ gl064:2628226:2628285 [0] NCCL INFO Trees [0] 1/2/-1->0->-1 [1] 1/-1/-1->0->2
12369
+ gl064:2628226:2628285 [0] NCCL INFO P2P Chunksize set to 131072
12370
+ gl064:2628227:2628286 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
12371
+ gl064:2628226:2628285 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
12372
+ gl064:2628226:2628285 [0] NCCL INFO Check P2P Type isAllDirectP2p 0 directMode 0
12373
+ gl064:2628226:2628293 [0] NCCL INFO [Proxy Service] Device 0 CPU core 7
12374
+ gl064:2628226:2628294 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 12
12375
+ gl064:2628227:2628291 [1] NCCL INFO [Proxy Service] Device 1 CPU core 1
12376
+ gl064:2628227:2628292 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 2
12377
+ gl064:2628226:2628285 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
12378
+ gl064:2628226:2628285 [0] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
12379
+ gl064:2628227:2628286 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
12380
+ gl064:2628227:2628286 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer
12381
+ gl064:2628226:2628285 [0] NCCL INFO CC Off, workFifoBytes 1048576
12382
+ gl064:2628227:2628286 [1] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
12383
+ gl064:2628227:2628286 [1] NCCL INFO ncclCommInitRankConfig comm 0x13dac3f0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 59000 commId 0xe09b1d45be9f2d6a - Init COMPLETE
12384
+ gl064:2628227:2628286 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 4 total 1.90 (kernels 0.08, alloc 0.01, bootstrap 1.78, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
12385
+ gl064:2628226:2628285 [0] NCCL INFO TUNER/Plugin: Could not find: libnccl-tuner.so. Using internal tuner plugin.
12386
+ gl064:2628226:2628285 [0] NCCL INFO ncclCommInitRankConfig comm 0x14bded40 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 47000 commId 0xe09b1d45be9f2d6a - Init COMPLETE
12387
+ gl064:2628226:2628285 [0] NCCL INFO Init timings - ncclCommInitRankConfig: rank 0 nranks 4 total 1.93 (kernels 0.09, alloc 0.01, bootstrap 1.80, allgathers 0.01, topo 0.02, graphs 0.00, connections 0.00, rest 0.00)
12388
+ gl064:2628226:2628295 [0] NCCL INFO Channel 00/0 : 3[1] -> 0[0] [receive] via NET/IB/0
12389
+ gl064:2628226:2628295 [0] NCCL INFO Channel 01/0 : 3[1] -> 0[0] [receive] via NET/IB/0
12390
+ gl064:2628226:2628297 [0] NCCL INFO [Proxy Progress] Device 0 CPU core 3
12391
+ gl064:2628226:2628295 [0] NCCL INFO Channel 00 : 0[0] -> 1[1] via SHM/direct/direct
12392
+ gl064:2628226:2628295 [0] NCCL INFO Channel 01 : 0[0] -> 1[1] via SHM/direct/direct
12393
+ gl064:2628227:2628296 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[0] [send] via NET/IB/0
12394
+ gl064:2628227:2628296 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[0] [send] via NET/IB/0
12395
+ gl064:2628227:2628298 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 4
12396
+ gl064:2628227:2628296 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
12397
+ gl064:2628226:2628295 [0] NCCL INFO Connected all rings, use ring PXN 0 GDR 0
12398
+ [INFO|trainer.py:2519] 2025-10-22 20:41:55,692 >> ***** Running training *****
12399
+ [INFO|trainer.py:2520] 2025-10-22 20:41:55,693 >> Num examples = 3,598
12400
+ [INFO|trainer.py:2521] 2025-10-22 20:41:55,693 >> Num Epochs = 1
12401
+ [INFO|trainer.py:2522] 2025-10-22 20:41:55,693 >> Instantaneous batch size per device = 1
12402
+ [INFO|trainer.py:2525] 2025-10-22 20:41:55,693 >> Total train batch size (w. parallel, distributed & accumulation) = 4
12403
+ [INFO|trainer.py:2526] 2025-10-22 20:41:55,693 >> Gradient Accumulation steps = 1
12404
+ [INFO|trainer.py:2527] 2025-10-22 20:41:55,693 >> Total optimization steps = 100
12405
+ [INFO|trainer.py:2528] 2025-10-22 20:41:55,694 >> Number of trainable parameters = 4,399,104
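Note: the run header above is internally consistent. With 4 ranks (nranks 4 in the NCCL init), a per-device batch of 1 and no gradient accumulation, each optimization step consumes 4 samples, so 100 steps cover about 400 of the 3,598 examples, i.e. the 0.11 epochs reported later. A quick check:

    per_device_batch, world_size, grad_accum = 1, 4, 1
    total_batch = per_device_batch * world_size * grad_accum   # 4, as logged
    steps, num_examples = 100, 3598
    print(steps * total_batch / num_examples)                  # ~0.111 -> epoch 0.11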
12406
+ [INFO|integration_utils.py:867] 2025-10-22 20:41:55,714 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
12407
+ wandb: Currently logged in as: zsprague (ut_nlp_deduce) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
12408
+ wandb: Tracking run with wandb version 0.22.2
12409
+ wandb: Run data is saved locally in /scratch/zrs2020/LlamaFactoryHelper/wandb/run-20251022_204155-v2077oxb
12410
+ wandb: Run `wandb offline` to turn off syncing.
12411
+ wandb: Syncing run interactive_test
12412
+ wandb: View project at https://wandb.ai/ut_nlp_deduce/llamafactory
12413
+ wandb: View run at https://wandb.ai/ut_nlp_deduce/llamafactory/runs/v2077oxb
12414
+ 0%| | 0/100 [00:00<?, ?it/s] 1%| | 1/100 [00:00<00:56, 1.77it/s] 2%| | 2/100 [00:00<00:39, 2.47it/s] 3%| | 3/100 [00:01<00:39, 2.47it/s] 4%| | 4/100 [00:01<00:33, 2.88it/s] 5%| | 5/100 [00:02<00:40, 2.33it/s] 6%| | 6/100 [00:02<00:34, 2.73it/s] 7%| | 7/100 [00:02<00:31, 2.98it/s] 8%| | 8/100 [00:02<00:29, 3.14it/s] 9%| | 9/100 [00:03<00:27, 3.31it/s] 10%| | 10/100 [00:03<00:29, 3.06it/s] {'loss': 1.286, 'grad_norm': 0.3636094033718109, 'learning_rate': 4.55e-05, 'epoch': 0.01}
12415
+ 10%| | 10/100 [00:03<00:29, 3.06it/s] 11%| | 11/100 [00:04<00:35, 2.52it/s] 12%| | 12/100 [00:04<00:33, 2.61it/s] 13%| | 13/100 [00:04<00:30, 2.88it/s]
12421
+ 14%| | 14/100 [00:05<00:29, 2.95it/s] 15%| | 15/100 [00:05<00:29, 2.87it/s] 16%| | 16/100 [00:05<00:26, 3.15it/s] 17%| | 17/100 [00:05<00:26, 3.08it/s] 18%| | 18/100 [00:06<00:27, 2.98it/s] 19%| | 19/100 [00:06<00:27, 2.90it/s] 20%| | 20/100 [00:07<00:26, 3.02it/s] {'loss': 1.1751, 'grad_norm': 0.3897131383419037, 'learning_rate': 4.05e-05, 'epoch': 0.02}
12422
+ 20%| | 20/100 [00:07<00:26, 3.02it/s] 21%| | 21/100 [00:07<00:31, 2.50it/s] 22%| | 22/100 [00:07<00:29, 2.61it/s] 23%| | 23/100 [00:08<00:29, 2.64it/s] 24%| | 24/100 [00:08<00:26, 2.88it/s] 25%| | 25/100 [00:09<00:31, 2.35it/s] 26%| | 26/100 [00:09<00:28, 2.64it/s] 27%| | 27/100 [00:09<00:25, 2.88it/s] 28%| | 28/100 [00:10<00:24, 2.98it/s] 29%| | 29/100 [00:10<00:22, 3.11it/s] 30%| | 30/100 [00:10<00:21, 3.19it/s] {'loss': 1.1373, 'grad_norm': 0.42557743191719055, 'learning_rate': 3.55e-05, 'epoch': 0.03}
12423
+ 30%| | 30/100 [00:10<00:21, 3.19it/s] 31%| | 31/100 [00:10<00:22, 3.01it/s] 32%| | 32/100 [00:11<00:21, 3.11it/s] 33%| | 33/100 [00:11<00:20, 3.25it/s] 34%| | 34/100 [00:12<00:25, 2.62it/s] 35%| | 35/100 [00:12<00:23, 2.71it/s] 36%| | 36/100 [00:12<00:22, 2.90it/s] 37%| | 37/100 [00:12<00:19, 3.30it/s] 38%| | 38/100 [00:13<00:18, 3.40it/s] 39%| | 39/100 [00:13<00:16, 3.70it/s] 40%| | 40/100 [00:13<00:18, 3.17it/s] {'loss': 1.0636, 'grad_norm': 0.42947664856910706, 'learning_rate': 3.05e-05, 'epoch': 0.04}
12424
+ 40%| | 40/100 [00:13<00:18, 3.17it/s] 41%| | 41/100 [00:14<00:17, 3.28it/s] 42%| | 42/100 [00:14<00:17, 3.40it/s] 43%| | 43/100 [00:14<00:17, 3.20it/s] 44%| | 44/100 [00:15<00:16, 3.35it/s] 45%| | 45/100 [00:15<00:16, 3.36it/s] 46%| | 46/100 [00:15<00:16, 3.18it/s] 47%| | 47/100 [00:15<00:15, 3.43it/s] 48%| | 48/100 [00:16<00:15, 3.40it/s] 49%| | 49/100 [00:16<00:15, 3.31it/s] 50%| | 50/100 [00:16<00:15, 3.28it/s] {'loss': 1.0329, 'grad_norm': 0.43117761611938477, 'learning_rate': 2.5500000000000003e-05, 'epoch': 0.06}
12425
+ 50%| | 50/100 [00:16<00:15, 3.28it/s][INFO|trainer.py:4309] 2025-10-22 20:42:13,492 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50
12426
+ [INFO|configuration_utils.py:765] 2025-10-22 20:42:13,619 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12427
+ [INFO|configuration_utils.py:839] 2025-10-22 20:42:13,619 >> Model config Qwen2Config {
12428
+ "architectures": [
12429
+ "Qwen2ForCausalLM"
12430
+ ],
12431
+ "attention_dropout": 0.0,
12432
+ "bos_token_id": 151643,
12433
+ "dtype": "bfloat16",
12434
+ "eos_token_id": 151643,
12435
+ "hidden_act": "silu",
12436
+ "hidden_size": 896,
12437
+ "initializer_range": 0.02,
12438
+ "intermediate_size": 4864,
12439
+ "layer_types": [
12440
+ "full_attention",
12441
+ "full_attention",
12442
+ "full_attention",
12443
+ "full_attention",
12444
+ "full_attention",
12445
+ "full_attention",
12446
+ "full_attention",
12447
+ "full_attention",
12448
+ "full_attention",
12449
+ "full_attention",
12450
+ "full_attention",
12451
+ "full_attention",
12452
+ "full_attention",
12453
+ "full_attention",
12454
+ "full_attention",
12455
+ "full_attention",
12456
+ "full_attention",
12457
+ "full_attention",
12458
+ "full_attention",
12459
+ "full_attention",
12460
+ "full_attention",
12461
+ "full_attention",
12462
+ "full_attention",
12463
+ "full_attention"
12464
+ ],
12465
+ "max_position_embeddings": 32768,
12466
+ "max_window_layers": 24,
12467
+ "model_type": "qwen2",
12468
+ "num_attention_heads": 14,
12469
+ "num_hidden_layers": 24,
12470
+ "num_key_value_heads": 2,
12471
+ "rms_norm_eps": 1e-06,
12472
+ "rope_scaling": null,
12473
+ "rope_theta": 1000000.0,
12474
+ "sliding_window": null,
12475
+ "tie_word_embeddings": true,
12476
+ "transformers_version": "4.57.1",
12477
+ "use_cache": true,
12478
+ "use_mrope": false,
12479
+ "use_sliding_window": false,
12480
+ "vocab_size": 151936
12481
+ }
12482
+
12483
+ [INFO|tokenization_utils_base.py:2421] 2025-10-22 20:42:13,785 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/chat_template.jinja
12484
+ [INFO|tokenization_utils_base.py:2590] 2025-10-22 20:42:13,791 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/tokenizer_config.json
12485
+ [INFO|tokenization_utils_base.py:2599] 2025-10-22 20:42:13,810 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-50/special_tokens_map.json
12486
+ 51%| | 51/100 [00:17<00:26, 1.84it/s] 52%| | 52/100 [00:18<00:22, 2.09it/s] 53%| | 53/100 [00:18<00:21, 2.16it/s] 54%| | 54/100 [00:19<00:20, 2.28it/s] 55%| | 55/100 [00:19<00:18, 2.43it/s] 56%| | 56/100 [00:19<00:16, 2.65it/s] 57%| | 57/100 [00:20<00:15, 2.82it/s] 58%| | 58/100 [00:20<00:14, 2.86it/s] 59%| | 59/100 [00:20<00:13, 2.97it/s] 60%| | 60/100 [00:20<00:13, 3.07it/s] {'loss': 0.9981, 'grad_norm': 0.45059695839881897, 'learning_rate': 2.05e-05, 'epoch': 0.07}
12487
+ 60%| | 60/100 [00:20<00:13, 3.07it/s] 61%| | 61/100 [00:21<00:12, 3.13it/s] 62%| | 62/100 [00:21<00:11, 3.20it/s] 63%| | 63/100 [00:21<00:11, 3.33it/s] 64%| | 64/100 [00:22<00:10, 3.48it/s] 65%| | 65/100 [00:22<00:10, 3.42it/s] 66%| | 66/100 [00:22<00:10, 3.20it/s] 67%| | 67/100 [00:23<00:09, 3.37it/s] 68%| | 68/100 [00:23<00:09, 3.40it/s] 69%| | 69/100 [00:23<00:09, 3.25it/s] 70%| | 70/100 [00:23<00:09, 3.23it/s] {'loss': 0.9991, 'grad_norm': 0.43518301844596863, 'learning_rate': 1.55e-05, 'epoch': 0.08}
12488
+ 70%| | 70/100 [00:23<00:09, 3.23it/s] 71%| | 71/100 [00:24<00:08, 3.37it/s] 72%| | 72/100 [00:24<00:07, 3.60it/s] 73%| | 73/100 [00:24<00:07, 3.83it/s] 74%| | 74/100 [00:25<00:09, 2.81it/s] 75%| | 75/100 [00:25<00:08, 3.04it/s] 76%| | 76/100 [00:25<00:07, 3.36it/s] 77%| | 77/100 [00:26<00:07, 3.23it/s] 78%| | 78/100 [00:26<00:06, 3.56it/s] 79%| | 79/100 [00:26<00:05, 3.56it/s] 80%| | 80/100 [00:26<00:05, 3.81it/s] {'loss': 0.9537, 'grad_norm': 0.46800264716148376, 'learning_rate': 1.05e-05, 'epoch': 0.09}
12489
+ 80%| | 80/100 [00:26<00:05, 3.81it/s] 81%| | 81/100 [00:27<00:04, 3.81it/s] 82%| | 82/100 [00:27<00:04, 3.77it/s] 83%| | 83/100 [00:27<00:05, 3.35it/s] 84%| | 84/100 [00:28<00:04, 3.41it/s] 85%| | 85/100 [00:28<00:04, 3.45it/s] 86%| | 86/100 [00:28<00:04, 3.48it/s] 87%| | 87/100 [00:28<00:03, 3.51it/s] 88%| | 88/100 [00:29<00:03, 3.61it/s] 89%| | 89/100 [00:29<00:02, 3.69it/s] 90%| | 90/100 [00:29<00:02, 3.51it/s] {'loss': 0.9677, 'grad_norm': 0.4698624014854431, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.1}
12490
+ 90%| | 90/100 [00:29<00:02, 3.51it/s] 91%| | 91/100 [00:29<00:02, 3.46it/s] 92%|| 92/100 [00:30<00:02, 2.73it/s] 93%|| 93/100 [00:30<00:02, 2.80it/s] 94%|| 94/100 [00:31<00:02, 2.85it/s] 95%|| 95/100 [00:31<00:01, 3.02it/s] 96%|| 96/100 [00:31<00:01, 3.16it/s] 97%|| 97/100 [00:32<00:00, 3.29it/s] 98%|| 98/100 [00:32<00:00, 3.16it/s] 99%|| 99/100 [00:32<00:00, 3.02it/s]100%|| 100/100 [00:33<00:00, 2.94it/s] {'loss': 0.9472, 'grad_norm': 0.45893919467926025, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.11}
12491
+ 100%|| 100/100 [00:33<00:00, 2.94it/s][INFO|trainer.py:4309] 2025-10-22 20:42:29,757 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12492
+ [INFO|configuration_utils.py:765] 2025-10-22 20:42:29,909 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12493
+ [INFO|configuration_utils.py:839] 2025-10-22 20:42:29,910 >> Model config Qwen2Config {
12494
+ "architectures": [
12495
+ "Qwen2ForCausalLM"
12496
+ ],
12497
+ "attention_dropout": 0.0,
12498
+ "bos_token_id": 151643,
12499
+ "dtype": "bfloat16",
12500
+ "eos_token_id": 151643,
12501
+ "hidden_act": "silu",
12502
+ "hidden_size": 896,
12503
+ "initializer_range": 0.02,
12504
+ "intermediate_size": 4864,
12505
+ "layer_types": [
12506
+ "full_attention",
12507
+ "full_attention",
12508
+ "full_attention",
12509
+ "full_attention",
12510
+ "full_attention",
12511
+ "full_attention",
12512
+ "full_attention",
12513
+ "full_attention",
12514
+ "full_attention",
12515
+ "full_attention",
12516
+ "full_attention",
12517
+ "full_attention",
12518
+ "full_attention",
12519
+ "full_attention",
12520
+ "full_attention",
12521
+ "full_attention",
12522
+ "full_attention",
12523
+ "full_attention",
12524
+ "full_attention",
12525
+ "full_attention",
12526
+ "full_attention",
12527
+ "full_attention",
12528
+ "full_attention",
12529
+ "full_attention"
12530
+ ],
12531
+ "max_position_embeddings": 32768,
12532
+ "max_window_layers": 24,
12533
+ "model_type": "qwen2",
12534
+ "num_attention_heads": 14,
12535
+ "num_hidden_layers": 24,
12536
+ "num_key_value_heads": 2,
12537
+ "rms_norm_eps": 1e-06,
12538
+ "rope_scaling": null,
12539
+ "rope_theta": 1000000.0,
12540
+ "sliding_window": null,
12541
+ "tie_word_embeddings": true,
12542
+ "transformers_version": "4.57.1",
12543
+ "use_cache": true,
12544
+ "use_mrope": false,
12545
+ "use_sliding_window": false,
12546
+ "vocab_size": 151936
12547
+ }
12548
+
12549
+ [INFO|tokenization_utils_base.py:2421] 2025-10-22 20:42:30,068 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/chat_template.jinja
12550
+ [INFO|tokenization_utils_base.py:2590] 2025-10-22 20:42:30,073 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/tokenizer_config.json
12551
+ [INFO|tokenization_utils_base.py:2599] 2025-10-22 20:42:30,092 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100/special_tokens_map.json
12552
+ [INFO|trainer.py:2810] 2025-10-22 20:42:30,561 >>
12553
+
12554
+ Training completed. Do not forget to share your model on huggingface.co/models =)
12555
+
12556
+
12557
+ {'train_runtime': 34.8678, 'train_samples_per_second': 11.472, 'train_steps_per_second': 2.868, 'train_loss': 1.0560622215270996, 'epoch': 0.11}
12558
+ 100%|| 100/100 [00:33<00:00, 2.94it/s]100%|| 100/100 [00:33<00:00, 2.95it/s]
12559
+ [INFO|trainer.py:4309] 2025-10-22 20:42:30,573 >> Saving model checkpoint to /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
12560
+ [INFO|configuration_utils.py:765] 2025-10-22 20:42:30,670 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12561
+ [INFO|configuration_utils.py:839] 2025-10-22 20:42:30,670 >> Model config Qwen2Config {
12562
+ "architectures": [
12563
+ "Qwen2ForCausalLM"
12564
+ ],
12565
+ "attention_dropout": 0.0,
12566
+ "bos_token_id": 151643,
12567
+ "dtype": "bfloat16",
12568
+ "eos_token_id": 151643,
12569
+ "hidden_act": "silu",
12570
+ "hidden_size": 896,
12571
+ "initializer_range": 0.02,
12572
+ "intermediate_size": 4864,
12573
+ "layer_types": [
12574
+ "full_attention",
12575
+ "full_attention",
12576
+ "full_attention",
12577
+ "full_attention",
12578
+ "full_attention",
12579
+ "full_attention",
12580
+ "full_attention",
12581
+ "full_attention",
12582
+ "full_attention",
12583
+ "full_attention",
12584
+ "full_attention",
12585
+ "full_attention",
12586
+ "full_attention",
12587
+ "full_attention",
12588
+ "full_attention",
12589
+ "full_attention",
12590
+ "full_attention",
12591
+ "full_attention",
12592
+ "full_attention",
12593
+ "full_attention",
12594
+ "full_attention",
12595
+ "full_attention",
12596
+ "full_attention",
12597
+ "full_attention"
12598
+ ],
12599
+ "max_position_embeddings": 32768,
12600
+ "max_window_layers": 24,
12601
+ "model_type": "qwen2",
12602
+ "num_attention_heads": 14,
12603
+ "num_hidden_layers": 24,
12604
+ "num_key_value_heads": 2,
12605
+ "rms_norm_eps": 1e-06,
12606
+ "rope_scaling": null,
12607
+ "rope_theta": 1000000.0,
12608
+ "sliding_window": null,
12609
+ "tie_word_embeddings": true,
12610
+ "transformers_version": "4.57.1",
12611
+ "use_cache": true,
12612
+ "use_mrope": false,
12613
+ "use_sliding_window": false,
12614
+ "vocab_size": 151936
12615
+ }
12616
+
12617
+ [INFO|tokenization_utils_base.py:2421] 2025-10-22 20:42:30,782 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/chat_template.jinja
12618
+ [INFO|tokenization_utils_base.py:2590] 2025-10-22 20:42:30,787 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/tokenizer_config.json
12619
+ [INFO|tokenization_utils_base.py:2599] 2025-10-22 20:42:30,792 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/special_tokens_map.json
12620
+ ***** train metrics *****
12621
+ epoch = 0.1111
12622
+ total_flos = 2407106GF
12623
+ train_loss = 1.0561
12624
+ train_runtime = 0:00:34.86
12625
+ train_samples_per_second = 11.472
12626
+ train_steps_per_second = 2.868
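Note: the metrics block matches the run header: 100 steps in 34.87 s gives the reported 2.868 steps/s, and at 4 samples per step, 11.47 samples/s. A quick check:

    train_runtime = 34.8678            # seconds
    steps, total_batch = 100, 4
    print(steps / train_runtime)                 # ~2.868 train_steps_per_second
    print(steps * total_batch / train_runtime)   # ~11.47 train_samples_per_second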
12627
+ [INFO|modelcard.py:456] 2025-10-22 20:42:30,948 >> Dropping the following result as it does not have all the necessary fields:
12628
+ {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
12629
+ gl064:2628227:2628227 [1] NCCL INFO comm 0x13dac3f0 rank 1 nranks 4 cudaDev 1 busId 59000 - Destroy COMPLETE
12630
+ gl064:2628226:2628226 [0] NCCL INFO comm 0x14bded40 rank 0 nranks 4 cudaDev 0 busId 47000 - Destroy COMPLETE
12631
+ wandb:
12632
+ wandb: View run interactive_test at:
12633
+ wandb: Find logs at: wandb/run-20251022_204155-v2077oxb/logs
12634
+
12635
+ ========================================
12636
+ Training completed successfully
12637
+ End Time: Wed Oct 22 08:42:32 PM EDT 2025
12638
+ ========================================
12639
+
12640
+ ========================================
12641
+ STAGE 2: Merging/Exporting Model
12642
+ Start Time: Wed Oct 22 08:42:32 PM EDT 2025
12643
+ ========================================
12644
+ Looking for checkpoints in: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints
12645
+ Analyzing checkpoints to find the one from current training run...
12646
+ - checkpoint-100: trainer_state.json modified at Wed Oct 22 08:42:30 PM EDT 2025
12647
+ - checkpoint-50: trainer_state.json modified at Wed Oct 22 08:42:14 PM EDT 2025
12648
+
12649
+ Selected checkpoint: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12650
+ This checkpoint has the most recently updated trainer_state.json
12651
+ Checkpoint details:
12652
+ Path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12653
+ Last modified: 2025-10-22 16:54:17.414188691 -0400
12654
+ Training step: 100
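Note: the selection rule described above (take the checkpoint whose trainer_state.json changed most recently) can be expressed in a few lines; a minimal sketch, not the helper script's actual code, using the paths from this run:

    import glob
    import os

    ckpt_root = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints"
    candidates = glob.glob(os.path.join(ckpt_root, "checkpoint-*"))
    latest = max(candidates,
                 key=lambda d: os.path.getmtime(os.path.join(d, "trainer_state.json")))
    print(latest)  # checkpoint-100 for this run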
12655
+ Updating merge config to point to checkpoint...
12656
+ Successfully updated merge config
12657
+ Updated merge config to use: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12658
+
12659
+ Merge config contents:
12660
+ model_name_or_path: Qwen/Qwen2.5-0.5B
12661
+ finetuning_type: lora
12662
+ trust_remote_code: true
12663
+ adapter_name_or_path: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12664
+ template: default
12665
+ export_dir: /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged
12666
+
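Note: the "Updating merge config" step amounts to rewriting adapter_name_or_path in the YAML shown above; a hypothetical PyYAML sketch (the helper may edit the file differently):

    import yaml

    cfg_path = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml"
    latest_checkpoint = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100"

    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)
    cfg["adapter_name_or_path"] = latest_checkpoint
    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)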
12667
+ Executing command: llamafactory-cli export /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/configs/merge_config.yaml
12668
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
12669
+ warnings.warn(
12670
+ /scratch/zrs2020/miniconda/miniconda3/envs/llamafactory/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
12671
+ import pkg_resources
12672
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,849 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
12673
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,849 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
12674
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,850 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
12675
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,850 >> loading file added_tokens.json from cache at None
12676
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,850 >> loading file special_tokens_map.json from cache at None
12677
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,850 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
12678
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:40,850 >> loading file chat_template.jinja from cache at None
12679
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:42:41,020 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12680
+ [INFO|configuration_utils.py:765] 2025-10-22 20:42:41,199 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12681
+ [INFO|configuration_utils.py:839] 2025-10-22 20:42:41,201 >> Model config Qwen2Config {
12682
+ "architectures": [
12683
+ "Qwen2ForCausalLM"
12684
+ ],
12685
+ "attention_dropout": 0.0,
12686
+ "bos_token_id": 151643,
12687
+ "dtype": "bfloat16",
12688
+ "eos_token_id": 151643,
12689
+ "hidden_act": "silu",
12690
+ "hidden_size": 896,
12691
+ "initializer_range": 0.02,
12692
+ "intermediate_size": 4864,
12693
+ "layer_types": [
12694
+ "full_attention",
12695
+ "full_attention",
12696
+ "full_attention",
12697
+ "full_attention",
12698
+ "full_attention",
12699
+ "full_attention",
12700
+ "full_attention",
12701
+ "full_attention",
12702
+ "full_attention",
12703
+ "full_attention",
12704
+ "full_attention",
12705
+ "full_attention",
12706
+ "full_attention",
12707
+ "full_attention",
12708
+ "full_attention",
12709
+ "full_attention",
12710
+ "full_attention",
12711
+ "full_attention",
12712
+ "full_attention",
12713
+ "full_attention",
12714
+ "full_attention",
12715
+ "full_attention",
12716
+ "full_attention",
12717
+ "full_attention"
12718
+ ],
12719
+ "max_position_embeddings": 32768,
12720
+ "max_window_layers": 24,
12721
+ "model_type": "qwen2",
12722
+ "num_attention_heads": 14,
12723
+ "num_hidden_layers": 24,
12724
+ "num_key_value_heads": 2,
12725
+ "rms_norm_eps": 1e-06,
12726
+ "rope_scaling": null,
12727
+ "rope_theta": 1000000.0,
12728
+ "sliding_window": null,
12729
+ "tie_word_embeddings": true,
12730
+ "transformers_version": "4.57.1",
12731
+ "use_cache": true,
12732
+ "use_mrope": false,
12733
+ "use_sliding_window": false,
12734
+ "vocab_size": 151936
12735
+ }
12736
+
12737
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file vocab.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/vocab.json
12738
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file merges.txt from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/merges.txt
12739
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file tokenizer.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer.json
12740
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file added_tokens.json from cache at None
12741
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file special_tokens_map.json from cache at None
12742
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file tokenizer_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/tokenizer_config.json
12743
+ [INFO|tokenization_utils_base.py:2095] 2025-10-22 20:42:41,261 >> loading file chat_template.jinja from cache at None
12744
+ [INFO|tokenization_utils_base.py:2364] 2025-10-22 20:42:41,425 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12745
+ [INFO|configuration_utils.py:765] 2025-10-22 20:42:41,469 >> loading configuration file config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/config.json
12746
+ [INFO|configuration_utils.py:839] 2025-10-22 20:42:41,470 >> Model config Qwen2Config {
12747
+ "architectures": [
12748
+ "Qwen2ForCausalLM"
12749
+ ],
12750
+ "attention_dropout": 0.0,
12751
+ "bos_token_id": 151643,
12752
+ "dtype": "bfloat16",
12753
+ "eos_token_id": 151643,
12754
+ "hidden_act": "silu",
12755
+ "hidden_size": 896,
12756
+ "initializer_range": 0.02,
12757
+ "intermediate_size": 4864,
12758
+ "layer_types": [
12759
+ "full_attention",
12760
+ "full_attention",
12761
+ "full_attention",
12762
+ "full_attention",
12763
+ "full_attention",
12764
+ "full_attention",
12765
+ "full_attention",
12766
+ "full_attention",
12767
+ "full_attention",
12768
+ "full_attention",
12769
+ "full_attention",
12770
+ "full_attention",
12771
+ "full_attention",
12772
+ "full_attention",
12773
+ "full_attention",
12774
+ "full_attention",
12775
+ "full_attention",
12776
+ "full_attention",
12777
+ "full_attention",
12778
+ "full_attention",
12779
+ "full_attention",
12780
+ "full_attention",
12781
+ "full_attention",
12782
+ "full_attention"
12783
+ ],
12784
+ "max_position_embeddings": 32768,
12785
+ "max_window_layers": 24,
12786
+ "model_type": "qwen2",
12787
+ "num_attention_heads": 14,
12788
+ "num_hidden_layers": 24,
12789
+ "num_key_value_heads": 2,
12790
+ "rms_norm_eps": 1e-06,
12791
+ "rope_scaling": null,
12792
+ "rope_theta": 1000000.0,
12793
+ "sliding_window": null,
12794
+ "tie_word_embeddings": true,
12795
+ "transformers_version": "4.57.1",
12796
+ "use_cache": true,
12797
+ "use_mrope": false,
12798
+ "use_sliding_window": false,
12799
+ "vocab_size": 151936
12800
+ }
12801
+
12802
+ [WARNING|logging.py:328] 2025-10-22 20:42:41,470 >> `torch_dtype` is deprecated! Use `dtype` instead!
12803
+ [INFO|2025-10-22 20:42:41] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation.
12804
+ [WARNING|logging.py:328] 2025-10-22 20:42:41,792 >> `torch_dtype` is deprecated! Use `dtype` instead!
12805
+ [INFO|modeling_utils.py:1172] 2025-10-22 20:42:41,792 >> loading weights file model.safetensors from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/model.safetensors
12806
+ [INFO|modeling_utils.py:2341] 2025-10-22 20:42:41,793 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
12807
+ [INFO|configuration_utils.py:986] 2025-10-22 20:42:41,794 >> Generate config GenerationConfig {
12808
+ "bos_token_id": 151643,
12809
+ "eos_token_id": 151643
12810
+ }
12811
+
12812
+ [INFO|configuration_utils.py:941] 2025-10-22 20:42:41,880 >> loading configuration file generation_config.json from cache at /scratch/zrs2020/.cache/hf_cache/home/hub/models--Qwen--Qwen2.5-0.5B/snapshots/060db6499f32faf8b98477b0a26969ef7d8b9987/generation_config.json
12813
+ [INFO|configuration_utils.py:986] 2025-10-22 20:42:41,880 >> Generate config GenerationConfig {
12814
+ "bos_token_id": 151643,
12815
+ "eos_token_id": 151643,
12816
+ "max_new_tokens": 2048
12817
+ }
12818
+
12819
+ [INFO|dynamic_module_utils.py:423] 2025-10-22 20:42:41,910 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen2.5-0.5B.
12820
+ [INFO|2025-10-22 20:42:41] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
12821
+ [INFO|2025-10-22 20:42:42] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
12822
+ [INFO|2025-10-22 20:42:42] llamafactory.model.adapter:143 >> Loaded adapter(s): /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100
12823
+ [INFO|2025-10-22 20:42:42] llamafactory.model.loader:143 >> all params: 494,032,768
12824
+ [INFO|2025-10-22 20:42:42] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16.
12825
+ [INFO|configuration_utils.py:491] 2025-10-22 20:42:42,694 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/config.json
12826
+ [INFO|configuration_utils.py:757] 2025-10-22 20:42:42,697 >> Configuration saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/generation_config.json
12827
+ [INFO|modeling_utils.py:4181] 2025-10-22 20:42:44,339 >> Model weights saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/model.safetensors
12828
+ [INFO|tokenization_utils_base.py:2421] 2025-10-22 20:42:44,344 >> chat template saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/chat_template.jinja
12829
+ [INFO|tokenization_utils_base.py:2590] 2025-10-22 20:42:44,349 >> tokenizer config file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/tokenizer_config.json
12830
+ [INFO|tokenization_utils_base.py:2599] 2025-10-22 20:42:44,354 >> Special tokens file saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/special_tokens_map.json
12831
+ [INFO|2025-10-22 20:42:44] llamafactory.train.tuner:143 >> Ollama modelfile saved in /scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged/Modelfile
12832
+
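Note: the export above folds the checkpoint-100 LoRA adapter into the base weights, which is why the merged model reports 494,032,768 parameters, exactly the training-time 498,431,872 minus the 4,399,104 adapter parameters. llamafactory-cli handles this end to end; a minimal peft-based sketch of the same merge, assuming the paths from this run:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    adapter_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/checkpoints/checkpoint-100"
    out_dir = "/scratch/zrs2020/LlamaFactoryHelper/experiments/lf_torch_test__interactive/merged"

    # Load the base model in bfloat16, apply the LoRA adapter, and merge it in place.
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained(out_dir)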
12833
+ ========================================
12834
+ Merge/Export completed successfully
12835
+ End Time: Wed Oct 22 08:42:45 PM EDT 2025
12836
+ ========================================
12837
+
12838
+ ========================================
12839
+ Preparing Training Artifacts
12840
+ ========================================
12841
+ Copying configuration files...
12842
+ Copying and cleaning training logs...