Upload folder using huggingface_hub

Files changed (15)

.gitattributes +1 -0
README.md +202 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0
added_tokens.json +3 -0
chat_template.jinja +47 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +33 -0
tokenizer.json +3 -0
tokenizer.model +3 -0
tokenizer_config.json +0 -0
trainer_state.json +874 -0
training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense).*?(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj).*?)|(?:\\bmodel\\.layers\\.[\\d]{1,}\\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)\\.(?:(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)))",
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
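As far as we know, PEFT treats a string `target_modules` as a regular expression that must match a module's entire name (it uses `re.fullmatch`), rather than as a list of suffixes. The sketch below checks that pattern against a few module names. The names are hypothetical examples in the style of Gemma 3 checkpoints, not names read from this adapter, and `is_lora_target` is an illustrative helper, not a PEFT function.

```python
import re

# target_modules from adapter_config.json, with the JSON escaping (\\ -> \) removed
TARGET_MODULES = (
    r"(?:.*?(?:language|text).*?(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r".*?(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj).*?)"
    r"|(?:\bmodel\.layers\.[\d]{1,}\.(?:self_attn|attention|attn|mlp|feed_forward|ffn|dense)"
    r"\.(?:(?:q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)))"
)


def is_lora_target(module_name: str) -> bool:
    """Return True if the whole module name matches the adapter's target regex."""
    return re.fullmatch(TARGET_MODULES, module_name) is not None


# Hypothetical module names (text decoder, multimodal language tower, vision tower, norms, head)
for name in [
    "model.layers.0.self_attn.q_proj",
    "model.layers.25.mlp.down_proj",
    "model.language_model.layers.0.self_attn.q_proj",
    "model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.q_norm",
    "lm_head",
]:
    print(f"{name}: {is_lora_target(name)}")
```

The first alternative covers language towers nested inside multimodal wrappers; the second covers a plain `model.layers.N.<block>.<proj>` layout. Neither alternative matches a vision-tower name, norm layers, or the output head, so only the seven attention and MLP projections per decoder layer receive LoRA weights.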

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ed152d3dd7781f28bfdccd2d94935b025e5c21e2fde24911f6308eefc1af24b3
+size 26139264
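The 26,139,264-byte size of this object can be sanity-checked against the LoRA settings. Each targeted linear layer gets an `A` matrix (r × in) and a `B` matrix (out × r), so it adds r · (in + out) parameters. The sketch assumes the public Gemma 3 1B text configuration (hidden size 1152, intermediate size 6912, 26 layers, 4 query heads and 1 KV head of dimension 256); those dimensions are not part of this commit. `Gemma3TextDims` and `lora_param_count` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gemma3TextDims:
    # Assumed Gemma 3 1B text-model dimensions (not stored in this commit)
    hidden_size: int = 1152
    intermediate_size: int = 6912
    num_layers: int = 26
    num_heads: int = 4
    num_kv_heads: int = 1
    head_dim: int = 256


def lora_param_count(dims: Gemma3TextDims, r: int) -> int:
    """Count LoRA A/B parameters for q, k, v, o, gate, up and down in every decoder layer."""
    h, i = dims.hidden_size, dims.intermediate_size
    q_out = dims.num_heads * dims.head_dim
    kv_out = dims.num_kv_heads * dims.head_dim
    # (in_features, out_features) of each targeted linear layer
    shapes = [
        (h, q_out),   # q_proj
        (h, kv_out),  # k_proj
        (h, kv_out),  # v_proj
        (q_out, h),   # o_proj
        (h, i),       # gate_proj
        (h, i),       # up_proj
        (i, h),       # down_proj
    ]
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes)
    return per_layer * dims.num_layers


r, lora_alpha = 8, 16  # from adapter_config.json
params = lora_param_count(Gemma3TextDims(), r)
print(params)              # 6522880
print(params * 4)          # 26091520 bytes if stored as float32
print(lora_alpha / r)      # 2.0, the LoRA scaling factor (use_rslora is false)
```

Under these assumptions the float32 weights account for 26,091,520 bytes, leaving 47,744 bytes that would have to be the safetensors JSON header. A bfloat16 adapter of the same shape would be roughly half the observed size.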

added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "<image_soft_token>": 262144
+}

chat_template.jinja ADDED

	@@ -0,0 +1,47 @@

+{{ bos_token }}
+{%- if messages[0]['role'] == 'system' -%}
+    {%- if messages[0]['content'] is string -%}
+        {%- set first_user_prefix = messages[0]['content'] + '
+
+' -%}
+    {%- else -%}
+        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
+
+' -%}
+    {%- endif -%}
+    {%- set loop_messages = messages[1:] -%}
+{%- else -%}
+    {%- set first_user_prefix = "" -%}
+    {%- set loop_messages = messages -%}
+{%- endif -%}
+{%- for message in loop_messages -%}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
+        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
+    {%- endif -%}
+    {%- if (message['role'] == 'assistant') -%}
+        {%- set role = "model" -%}
+    {%- else -%}
+        {%- set role = message['role'] -%}
+    {%- endif -%}
+    {{ '<start_of_turn>' + role + '
+' + (first_user_prefix if loop.first else "") }}
+    {%- if message['content'] is string -%}
+        {{ message['content'] | trim }}
+    {%- elif message['content'] is iterable -%}
+        {%- for item in message['content'] -%}
+            {%- if item['type'] == 'image' -%}
+                {{ '<start_of_image>' }}
+            {%- elif item['type'] == 'text' -%}
+                {{ item['text'] | trim }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- else -%}
+        {{ raise_exception("Invalid content type") }}
+    {%- endif -%}
+    {{ '<end_of_turn>
+' }}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    {{'<start_of_turn>model
+'}}
+{%- endif -%}
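To see the exact prompt string this template produces, here is a hand translation of `chat_template.jinja` into plain Python. It is a sketch, not the tokenizer's `apply_chat_template`, and `render_gemma_prompt` is a hypothetical helper. It assumes the system prompt is joined to the first user message with a blank line (`"\n\n"`), as in the upstream Gemma 3 template. Note that assistant turns are rendered with the role `model`, and every turn closes with `<end_of_turn>`, which `special_tokens_map.json` declares as the `eos_token`.

```python
def _content_text(content) -> str:
    """Mirror the template's handling of a message's content."""
    if isinstance(content, str):
        return content.strip()  # Jinja's `trim` filter
    if isinstance(content, list):
        parts = []
        for item in content:
            if item["type"] == "image":
                parts.append("<start_of_image>")
            elif item["type"] == "text":
                parts.append(item["text"].strip())
        return "".join(parts)
    raise TypeError("Invalid content type")


def render_gemma_prompt(messages, bos_token="<bos>", add_generation_prompt=True):
    """Hypothetical helper: hand translation of chat_template.jinja into Python."""
    first_user_prefix = ""
    loop_messages = messages
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        text = system if isinstance(system, str) else system[0]["text"]
        first_user_prefix = text + "\n\n"  # assumed separator (upstream Gemma 3 value)
        loop_messages = messages[1:]

    out = [bos_token]
    for idx, message in enumerate(loop_messages):
        # Turns must alternate, starting with the user
        if (message["role"] == "user") != (idx % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        role = "model" if message["role"] == "assistant" else message["role"]
        prefix = first_user_prefix if idx == 0 else ""
        out.append(f"<start_of_turn>{role}\n{prefix}")
        out.append(_content_text(message["content"]))
        out.append("<end_of_turn>\n")
    if add_generation_prompt:
        out.append("<start_of_turn>model\n")
    return "".join(out)


prompt = render_gemma_prompt([
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "  What is LoRA?  "},
])
print(repr(prompt))
# '<bos><start_of_turn>user\nAnswer briefly.\n\nWhat is LoRA?<end_of_turn>\n<start_of_turn>model\n'
```

Gemma's format has no separate system role, which is why the template folds the system text into the first user turn instead of emitting a `<start_of_turn>system` block.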

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:96d22f3c80d9e38825d7dbca198bc2c65beed83d6b76407f4ce29ebf6a533811
+size 14270428

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab959a5a8d89442943331d78fa542b23139cc536e903d1ac539619bc780cb97f
+size 14244

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d5c251c3681725d951564543cc04ab162aa7831157e59b36b6eb1acd8288645e
+size 1064

special_tokens_map.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<end_of_turn>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
+size 33384568
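Because of the rule added to `.gitattributes`, `tokenizer.json` (like the other binary files in this commit) is stored in Git as a Git LFS pointer: a short text file with `version`, `oid sha256:<hex digest>` and `size <bytes>` lines, while the real 33 MB file lives in LFS storage. The sketch below parses such a pointer and checks a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS v1 pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Check a materialized file against the pointer's size and sha256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
"""
ptr = parse_lfs_pointer(pointer_text)
print(ptr["size"])  # 33384568
```

A usage pattern would be `matches_pointer(Path("tokenizer.json"), ptr)` after downloading the file, which catches a clone where the pointer was never replaced by the real content.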

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074

tokenizer_config.json ADDED

The diff for this file is too large to render.

trainer_state.json ADDED

	@@ -0,0 +1,874 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.7365578197888534,
+  "eval_steps": 500,
+  "global_step": 3000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.006137981831573778,
+      "grad_norm": 37.7730598449707,
+      "learning_rate": 5.889570552147239e-06,
+      "loss": 3.4685,
+      "step": 25
+    },
+    {
+      "epoch": 0.012275963663147557,
+      "grad_norm": 11.39767074584961,
+      "learning_rate": 1.2024539877300614e-05,
+      "loss": 1.7111,
+      "step": 50
+    },
+    {
+      "epoch": 0.018413945494721334,
+      "grad_norm": 4.119814872741699,
+      "learning_rate": 1.815950920245399e-05,
+      "loss": 0.6616,
+      "step": 75
+    },
+    {
+      "epoch": 0.024551927326295114,
+      "grad_norm": 4.619405269622803,
+      "learning_rate": 2.4294478527607366e-05,
+      "loss": 0.433,
+      "step": 100
+    },
+    {
+      "epoch": 0.030689909157868893,
+      "grad_norm": 4.100978374481201,
+      "learning_rate": 3.042944785276074e-05,
+      "loss": 0.3938,
+      "step": 125
+    },
+    {
+      "epoch": 0.03682789098944267,
+      "grad_norm": 4.166531562805176,
+      "learning_rate": 3.656441717791411e-05,
+      "loss": 0.3766,
+      "step": 150
+    },
+    {
+      "epoch": 0.04296587282101645,
+      "grad_norm": 4.5262837409973145,
+      "learning_rate": 4.2699386503067485e-05,
+      "loss": 0.3133,
+      "step": 175
+    },
+    {
+      "epoch": 0.04910385465259023,
+      "grad_norm": 2.890160322189331,
+      "learning_rate": 4.8834355828220865e-05,
+      "loss": 0.3159,
+      "step": 200
+    },
+    {
+      "epoch": 0.05524183648416401,
+      "grad_norm": 5.569750785827637,
+      "learning_rate": 5.496932515337424e-05,
+      "loss": 0.2948,
+      "step": 225
+    },
+    {
+      "epoch": 0.061379818315737786,
+      "grad_norm": 6.213336944580078,
+      "learning_rate": 6.110429447852761e-05,
+      "loss": 0.3576,
+      "step": 250
+    },
+    {
+      "epoch": 0.06751780014731157,
+      "grad_norm": 3.2000479698181152,
+      "learning_rate": 6.723926380368099e-05,
+      "loss": 0.3153,
+      "step": 275
+    },
+    {
+      "epoch": 0.07365578197888534,
+      "grad_norm": 2.484917640686035,
+      "learning_rate": 7.337423312883436e-05,
+      "loss": 0.3057,
+      "step": 300
+    },
+    {
+      "epoch": 0.07979376381045912,
+      "grad_norm": 2.4827752113342285,
+      "learning_rate": 7.950920245398772e-05,
+      "loss": 0.2956,
+      "step": 325
+    },
+    {
+      "epoch": 0.0859317456420329,
+      "grad_norm": 2.170391082763672,
+      "learning_rate": 8.564417177914112e-05,
+      "loss": 0.3053,
+      "step": 350
+    },
+    {
+      "epoch": 0.09206972747360667,
+      "grad_norm": 2.9628348350524902,
+      "learning_rate": 9.177914110429448e-05,
+      "loss": 0.2941,
+      "step": 375
+    },
+    {
+      "epoch": 0.09820770930518045,
+      "grad_norm": 4.929935932159424,
+      "learning_rate": 9.791411042944786e-05,
+      "loss": 0.3107,
+      "step": 400
+    },
+    {
+      "epoch": 0.10434569113675424,
+      "grad_norm": 2.3750085830688477,
+      "learning_rate": 0.00010404907975460123,
+      "loss": 0.3004,
+      "step": 425
+    },
+    {
+      "epoch": 0.11048367296832802,
+      "grad_norm": 2.280918836593628,
+      "learning_rate": 0.00011018404907975461,
+      "loss": 0.2912,
+      "step": 450
+    },
+    {
+      "epoch": 0.11662165479990179,
+      "grad_norm": 2.4553568363189697,
+      "learning_rate": 0.00011631901840490798,
+      "loss": 0.2549,
+      "step": 475
+    },
+    {
+      "epoch": 0.12275963663147557,
+      "grad_norm": 2.673969030380249,
+      "learning_rate": 0.00012245398773006136,
+      "loss": 0.2997,
+      "step": 500
+    },
+    {
+      "epoch": 0.12889761846304934,
+      "grad_norm": 2.394958019256592,
+      "learning_rate": 0.0001285889570552147,
+      "loss": 0.2725,
+      "step": 525
+    },
+    {
+      "epoch": 0.13503560029462314,
+      "grad_norm": 2.0582079887390137,
+      "learning_rate": 0.0001347239263803681,
+      "loss": 0.288,
+      "step": 550
+    },
+    {
+      "epoch": 0.1411735821261969,
+      "grad_norm": 2.5277466773986816,
+      "learning_rate": 0.0001408588957055215,
+      "loss": 0.2633,
+      "step": 575
+    },
+    {
+      "epoch": 0.14731156395777067,
+      "grad_norm": 2.573945999145508,
+      "learning_rate": 0.00014699386503067485,
+      "loss": 0.2741,
+      "step": 600
+    },
+    {
+      "epoch": 0.15344954578934447,
+      "grad_norm": 1.9574755430221558,
+      "learning_rate": 0.00015312883435582823,
+      "loss": 0.2955,
+      "step": 625
+    },
+    {
+      "epoch": 0.15958752762091824,
+      "grad_norm": 1.681303858757019,
+      "learning_rate": 0.00015926380368098158,
+      "loss": 0.2811,
+      "step": 650
+    },
+    {
+      "epoch": 0.165725509452492,
+      "grad_norm": 1.8627731800079346,
+      "learning_rate": 0.00016539877300613496,
+      "loss": 0.2678,
+      "step": 675
+    },
+    {
+      "epoch": 0.1718634912840658,
+      "grad_norm": 1.7298288345336914,
+      "learning_rate": 0.00017153374233128837,
+      "loss": 0.2661,
+      "step": 700
+    },
+    {
+      "epoch": 0.17800147311563957,
+      "grad_norm": 1.6428427696228027,
+      "learning_rate": 0.00017766871165644172,
+      "loss": 0.2672,
+      "step": 725
+    },
+    {
+      "epoch": 0.18413945494721334,
+      "grad_norm": 2.5652105808258057,
+      "learning_rate": 0.0001838036809815951,
+      "loss": 0.2487,
+      "step": 750
+    },
+    {
+      "epoch": 0.19027743677878714,
+      "grad_norm": 1.0266259908676147,
+      "learning_rate": 0.00018993865030674846,
+      "loss": 0.2733,
+      "step": 775
+    },
+    {
+      "epoch": 0.1964154186103609,
+      "grad_norm": 1.1475216150283813,
+      "learning_rate": 0.00019607361963190186,
+      "loss": 0.27,
+      "step": 800
+    },
+    {
+      "epoch": 0.2025534004419347,
+      "grad_norm": 2.0874133110046387,
+      "learning_rate": 0.00019975446733051425,
+      "loss": 0.2557,
+      "step": 825
+    },
+    {
+      "epoch": 0.20869138227350847,
+      "grad_norm": 2.0207202434539795,
+      "learning_rate": 0.00019907243213749832,
+      "loss": 0.2523,
+      "step": 850
+    },
+    {
+      "epoch": 0.21482936410508224,
+      "grad_norm": 2.3469483852386475,
+      "learning_rate": 0.00019839039694448233,
+      "loss": 0.2889,
+      "step": 875
+    },
+    {
+      "epoch": 0.22096734593665604,
+      "grad_norm": 2.777117967605591,
+      "learning_rate": 0.0001977083617514664,
+      "loss": 0.2521,
+      "step": 900
+    },
+    {
+      "epoch": 0.2271053277682298,
+      "grad_norm": 1.5443978309631348,
+      "learning_rate": 0.00019702632655845044,
+      "loss": 0.2491,
+      "step": 925
+    },
+    {
+      "epoch": 0.23324330959980358,
+      "grad_norm": 2.6930274963378906,
+      "learning_rate": 0.00019634429136543445,
+      "loss": 0.2696,
+      "step": 950
+    },
+    {
+      "epoch": 0.23938129143137737,
+      "grad_norm": 2.0129122734069824,
+      "learning_rate": 0.00019566225617241852,
+      "loss": 0.2643,
+      "step": 975
+    },
+    {
+      "epoch": 0.24551927326295114,
+      "grad_norm": 2.7098405361175537,
+      "learning_rate": 0.00019498022097940254,
+      "loss": 0.2603,
+      "step": 1000
+    },
+    {
+      "epoch": 0.2516572550945249,
+      "grad_norm": 1.4850163459777832,
+      "learning_rate": 0.00019429818578638658,
+      "loss": 0.2584,
+      "step": 1025
+    },
+    {
+      "epoch": 0.2577952369260987,
+      "grad_norm": 1.2259314060211182,
+      "learning_rate": 0.00019361615059337064,
+      "loss": 0.2809,
+      "step": 1050
+    },
+    {
+      "epoch": 0.26393321875767245,
+      "grad_norm": 2.409186601638794,
+      "learning_rate": 0.00019293411540035466,
+      "loss": 0.2458,
+      "step": 1075
+    },
+    {
+      "epoch": 0.2700712005892463,
+      "grad_norm": 1.6780492067337036,
+      "learning_rate": 0.00019225208020733872,
+      "loss": 0.2585,
+      "step": 1100
+    },
+    {
+      "epoch": 0.27620918242082004,
+      "grad_norm": 1.5900087356567383,
+      "learning_rate": 0.00019157004501432274,
+      "loss": 0.2234,
+      "step": 1125
+    },
+    {
+      "epoch": 0.2823471642523938,
+      "grad_norm": 6.657894611358643,
+      "learning_rate": 0.00019088800982130678,
+      "loss": 0.2456,
+      "step": 1150
+    },
+    {
+      "epoch": 0.2884851460839676,
+      "grad_norm": 2.0989744663238525,
+      "learning_rate": 0.00019020597462829085,
+      "loss": 0.2206,
+      "step": 1175
+    },
+    {
+      "epoch": 0.29462312791554135,
+      "grad_norm": 2.613159656524658,
+      "learning_rate": 0.00018952393943527486,
+      "loss": 0.2701,
+      "step": 1200
+    },
+    {
+      "epoch": 0.3007611097471152,
+      "grad_norm": 1.9629878997802734,
+      "learning_rate": 0.00018884190424225893,
+      "loss": 0.2271,
+      "step": 1225
+    },
+    {
+      "epoch": 0.30689909157868894,
+      "grad_norm": 2.393691062927246,
+      "learning_rate": 0.00018815986904924294,
+      "loss": 0.2234,
+      "step": 1250
+    },
+    {
+      "epoch": 0.3130370734102627,
+      "grad_norm": 1.4522358179092407,
+      "learning_rate": 0.00018747783385622698,
+      "loss": 0.2608,
+      "step": 1275
+    },
+    {
+      "epoch": 0.3191750552418365,
+      "grad_norm": 1.3521190881729126,
+      "learning_rate": 0.00018679579866321105,
+      "loss": 0.2411,
+      "step": 1300
+    },
+    {
+      "epoch": 0.32531303707341025,
+      "grad_norm": 1.919198989868164,
+      "learning_rate": 0.00018611376347019506,
+      "loss": 0.2389,
+      "step": 1325
+    },
+    {
+      "epoch": 0.331451018904984,
+      "grad_norm": 2.17889666557312,
+      "learning_rate": 0.00018543172827717913,
+      "loss": 0.2374,
+      "step": 1350
+    },
+    {
+      "epoch": 0.33758900073655784,
+      "grad_norm": 1.776234745979309,
+      "learning_rate": 0.00018474969308416314,
+      "loss": 0.2491,
+      "step": 1375
+    },
+    {
+      "epoch": 0.3437269825681316,
+      "grad_norm": 1.802179217338562,
+      "learning_rate": 0.00018406765789114718,
+      "loss": 0.2578,
+      "step": 1400
+    },
+    {
+      "epoch": 0.3498649643997054,
+      "grad_norm": 1.9841917753219604,
+      "learning_rate": 0.00018338562269813125,
+      "loss": 0.2595,
+      "step": 1425
+    },
+    {
+      "epoch": 0.35600294623127915,
+      "grad_norm": 1.5063601732254028,
+      "learning_rate": 0.00018270358750511527,
+      "loss": 0.227,
+      "step": 1450
+    },
+    {
+      "epoch": 0.3621409280628529,
+      "grad_norm": 1.7783691883087158,
+      "learning_rate": 0.00018202155231209933,
+      "loss": 0.2455,
+      "step": 1475
+    },
+    {
+      "epoch": 0.3682789098944267,
+      "grad_norm": 1.8014737367630005,
+      "learning_rate": 0.00018133951711908335,
+      "loss": 0.2316,
+      "step": 1500
+    },
+    {
+      "epoch": 0.3744168917260005,
+      "grad_norm": 2.4618427753448486,
+      "learning_rate": 0.0001806574819260674,
+      "loss": 0.2578,
+      "step": 1525
+    },
+    {
+      "epoch": 0.3805548735575743,
+      "grad_norm": 1.3078199625015259,
+      "learning_rate": 0.00017997544673305145,
+      "loss": 0.2703,
+      "step": 1550
+    },
+    {
+      "epoch": 0.38669285538914805,
+      "grad_norm": 1.1195226907730103,
+      "learning_rate": 0.00017929341154003547,
+      "loss": 0.2337,
+      "step": 1575
+    },
+    {
+      "epoch": 0.3928308372207218,
+      "grad_norm": 1.3115506172180176,
+      "learning_rate": 0.0001786113763470195,
+      "loss": 0.2339,
+      "step": 1600
+    },
+    {
+      "epoch": 0.3989688190522956,
+      "grad_norm": 2.1288275718688965,
+      "learning_rate": 0.00017792934115400355,
+      "loss": 0.2433,
+      "step": 1625
+    },
+    {
+      "epoch": 0.4051068008838694,
+      "grad_norm": 1.5910248756408691,
+      "learning_rate": 0.0001772473059609876,
+      "loss": 0.2206,
+      "step": 1650
+    },
+    {
+      "epoch": 0.4112447827154432,
+      "grad_norm": 2.3456833362579346,
+      "learning_rate": 0.00017656527076797166,
+      "loss": 0.2334,
+      "step": 1675
+    },
+    {
+      "epoch": 0.41738276454701695,
+      "grad_norm": 1.4813215732574463,
+      "learning_rate": 0.00017588323557495567,
+      "loss": 0.2393,
+      "step": 1700
+    },
+    {
+      "epoch": 0.4235207463785907,
+      "grad_norm": 1.7243605852127075,
+      "learning_rate": 0.0001752012003819397,
+      "loss": 0.2579,
+      "step": 1725
+    },
+    {
+      "epoch": 0.4296587282101645,
+      "grad_norm": 2.249532461166382,
+      "learning_rate": 0.00017451916518892375,
+      "loss": 0.2157,
+      "step": 1750
+    },
+    {
+      "epoch": 0.43579671004173826,
+      "grad_norm": 1.5098496675491333,
+      "learning_rate": 0.0001738371299959078,
+      "loss": 0.245,
+      "step": 1775
+    },
+    {
+      "epoch": 0.4419346918733121,
+      "grad_norm": 2.0736098289489746,
+      "learning_rate": 0.00017315509480289186,
+      "loss": 0.2191,
+      "step": 1800
+    },
+    {
+      "epoch": 0.44807267370488585,
+      "grad_norm": 1.8657655715942383,
+      "learning_rate": 0.00017247305960987587,
+      "loss": 0.2226,
+      "step": 1825
+    },
+    {
+      "epoch": 0.4542106555364596,
+      "grad_norm": 1.1383737325668335,
+      "learning_rate": 0.00017179102441685991,
+      "loss": 0.2265,
+      "step": 1850
+    },
+    {
+      "epoch": 0.4603486373680334,
+      "grad_norm": 1.7321696281433105,
+      "learning_rate": 0.00017110898922384395,
+      "loss": 0.2281,
+      "step": 1875
+    },
+    {
+      "epoch": 0.46648661919960716,
+      "grad_norm": 1.8833884000778198,
+      "learning_rate": 0.000170426954030828,
+      "loss": 0.2051,
+      "step": 1900
+    },
+    {
+      "epoch": 0.4726246010311809,
+      "grad_norm": 1.4870628118515015,
+      "learning_rate": 0.00016974491883781206,
+      "loss": 0.2369,
+      "step": 1925
+    },
+    {
+      "epoch": 0.47876258286275475,
+      "grad_norm": 1.1184924840927124,
+      "learning_rate": 0.00016906288364479608,
+      "loss": 0.239,
+      "step": 1950
+    },
+    {
+      "epoch": 0.4849005646943285,
+      "grad_norm": 1.0997380018234253,
+      "learning_rate": 0.00016838084845178012,
+      "loss": 0.2548,
+      "step": 1975
+    },
+    {
+      "epoch": 0.4910385465259023,
+      "grad_norm": 1.814329743385315,
+      "learning_rate": 0.00016769881325876416,
+      "loss": 0.225,
+      "step": 2000
+    },
+    {
+      "epoch": 0.49717652835747606,
+      "grad_norm": 1.2444835901260376,
+      "learning_rate": 0.0001670167780657482,
+      "loss": 0.2186,
+      "step": 2025
+    },
+    {
+      "epoch": 0.5033145101890498,
+      "grad_norm": 1.6863384246826172,
+      "learning_rate": 0.00016633474287273224,
+      "loss": 0.2492,
+      "step": 2050
+    },
+    {
+      "epoch": 0.5094524920206236,
+      "grad_norm": 2.017697811126709,
+      "learning_rate": 0.00016565270767971628,
+      "loss": 0.2165,
+      "step": 2075
+    },
+    {
+      "epoch": 0.5155904738521974,
+      "grad_norm": 2.237959146499634,
+      "learning_rate": 0.00016497067248670032,
+      "loss": 0.2335,
+      "step": 2100
+    },
+    {
+      "epoch": 0.5217284556837711,
+      "grad_norm": 2.2787888050079346,
+      "learning_rate": 0.00016428863729368436,
+      "loss": 0.2341,
+      "step": 2125
+    },
+    {
+      "epoch": 0.5278664375153449,
+      "grad_norm": 2.273451328277588,
+      "learning_rate": 0.0001636066021006684,
+      "loss": 0.2011,
+      "step": 2150
+    },
+    {
+      "epoch": 0.5340044193469188,
+      "grad_norm": 1.48465895652771,
+      "learning_rate": 0.00016292456690765244,
+      "loss": 0.2562,
+      "step": 2175
+    },
+    {
+      "epoch": 0.5401424011784925,
+      "grad_norm": 1.7666058540344238,
+      "learning_rate": 0.00016224253171463648,
+      "loss": 0.2267,
+      "step": 2200
+    },
+    {
+      "epoch": 0.5462803830100663,
+      "grad_norm": 1.6537222862243652,
+      "learning_rate": 0.00016156049652162052,
+      "loss": 0.2277,
+      "step": 2225
+    },
+    {
+      "epoch": 0.5524183648416401,
+      "grad_norm": 2.039132595062256,
+      "learning_rate": 0.00016087846132860456,
+      "loss": 0.211,
+      "step": 2250
+    },
+    {
+      "epoch": 0.5585563466732139,
+      "grad_norm": 1.686170220375061,
+      "learning_rate": 0.0001601964261355886,
+      "loss": 0.2158,
+      "step": 2275
+    },
+    {
+      "epoch": 0.5646943285047876,
+      "grad_norm": 1.7695778608322144,
+      "learning_rate": 0.00015951439094257264,
+      "loss": 0.2198,
+      "step": 2300
+    },
+    {
+      "epoch": 0.5708323103363614,
+      "grad_norm": 1.7025463581085205,
+      "learning_rate": 0.00015883235574955668,
+      "loss": 0.2085,
+      "step": 2325
+    },
+    {
+      "epoch": 0.5769702921679352,
+      "grad_norm": 1.00222909450531,
+      "learning_rate": 0.00015815032055654073,
+      "loss": 0.2127,
+      "step": 2350
+    },
+    {
+      "epoch": 0.5831082739995089,
+      "grad_norm": 1.2521406412124634,
+      "learning_rate": 0.00015746828536352477,
+      "loss": 0.2309,
+      "step": 2375
+    },
+    {
+      "epoch": 0.5892462558310827,
+      "grad_norm": 1.8515346050262451,
+      "learning_rate": 0.0001567862501705088,
+      "loss": 0.199,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5953842376626565,
+      "grad_norm": 1.2680790424346924,
+      "learning_rate": 0.00015610421497749285,
+      "loss": 0.2297,
+      "step": 2425
+    },
+    {
+      "epoch": 0.6015222194942303,
+      "grad_norm": 1.5662161111831665,
+      "learning_rate": 0.0001554221797844769,
+      "loss": 0.2596,
+      "step": 2450
+    },
+    {
+      "epoch": 0.6076602013258041,
+      "grad_norm": 1.6885185241699219,
+      "learning_rate": 0.00015474014459146093,
+      "loss": 0.2478,
+      "step": 2475
+    },
+    {
+      "epoch": 0.6137981831573779,
+      "grad_norm": 1.6437608003616333,
+      "learning_rate": 0.00015405810939844497,
+      "loss": 0.2358,
+      "step": 2500
+    },
+    {
+      "epoch": 0.6199361649889517,
+      "grad_norm": 1.654118299484253,
+      "learning_rate": 0.000153376074205429,
+      "loss": 0.2331,
+      "step": 2525
+    },
+    {
+      "epoch": 0.6260741468205254,
+      "grad_norm": 1.7273892164230347,
+      "learning_rate": 0.00015269403901241305,
+      "loss": 0.199,
+      "step": 2550
+    },
+    {
+      "epoch": 0.6322121286520992,
+      "grad_norm": 1.2874716520309448,
+      "learning_rate": 0.0001520120038193971,
+      "loss": 0.2511,
+      "step": 2575
+    },
+    {
+      "epoch": 0.638350110483673,
+      "grad_norm": 1.972301721572876,
+      "learning_rate": 0.00015132996862638113,
+      "loss": 0.2291,
+      "step": 2600
+    },
+    {
+      "epoch": 0.6444880923152467,
+      "grad_norm": 1.420000433921814,
+      "learning_rate": 0.00015064793343336517,
+      "loss": 0.2225,
+      "step": 2625
+    },
+    {
+      "epoch": 0.6506260741468205,
+      "grad_norm": 2.5418105125427246,
+      "learning_rate": 0.0001499658982403492,
+      "loss": 0.2231,
+      "step": 2650
+    },
+    {
+      "epoch": 0.6567640559783943,
+      "grad_norm": 1.986648440361023,
+      "learning_rate": 0.00014928386304733323,
+      "loss": 0.2165,
+      "step": 2675
+    },
+    {
+      "epoch": 0.662902037809968,
+      "grad_norm": 1.2217724323272705,
+      "learning_rate": 0.0001486018278543173,
+      "loss": 0.2202,
+      "step": 2700
+    },
+    {
+      "epoch": 0.6690400196415419,
+      "grad_norm": 1.5692222118377686,
+      "learning_rate": 0.00014791979266130133,
+      "loss": 0.2019,
+      "step": 2725
+    },
+    {
+      "epoch": 0.6751780014731157,
+      "grad_norm": 1.9380391836166382,
+      "learning_rate": 0.00014723775746828537,
+      "loss": 0.2195,
+      "step": 2750
+    },
+    {
+      "epoch": 0.6813159833046895,
+      "grad_norm": 1.8485900163650513,
+      "learning_rate": 0.00014655572227526941,
+      "loss": 0.214,
+      "step": 2775
+    },
+    {
+      "epoch": 0.6874539651362632,
+      "grad_norm": 2.325063943862915,
+      "learning_rate": 0.00014587368708225343,
+      "loss": 0.2219,
+      "step": 2800
+    },
+    {
+      "epoch": 0.693591946967837,
+      "grad_norm": 1.4283169507980347,
+      "learning_rate": 0.0001451916518892375,
+      "loss": 0.2241,
+      "step": 2825
+    },
+    {
+      "epoch": 0.6997299287994108,
+      "grad_norm": 1.6000885963439941,
+      "learning_rate": 0.00014450961669622154,
+      "loss": 0.2231,
+      "step": 2850
+    },
+    {
+      "epoch": 0.7058679106309845,
+      "grad_norm": 1.5898454189300537,
+      "learning_rate": 0.00014382758150320558,
+      "loss": 0.2203,
+      "step": 2875
+    },
+    {
+      "epoch": 0.7120058924625583,
+      "grad_norm": 2.032430648803711,
+      "learning_rate": 0.00014314554631018962,
+      "loss": 0.2216,
+      "step": 2900
+    },
+    {
+      "epoch": 0.7181438742941321,
+      "grad_norm": 1.7925983667373657,
+      "learning_rate": 0.00014246351111717363,
+      "loss": 0.2245,
+      "step": 2925
+    },
+    {
+      "epoch": 0.7242818561257058,
+      "grad_norm": 2.068000316619873,
+      "learning_rate": 0.0001417814759241577,
+      "loss": 0.2221,
+      "step": 2950
+    },
+    {
+      "epoch": 0.7304198379572796,
+      "grad_norm": 1.4491972923278809,
+      "learning_rate": 0.00014109944073114174,
+      "loss": 0.2137,
+      "step": 2975
+    },
+    {
+      "epoch": 0.7365578197888534,
+      "grad_norm": 1.2980254888534546,
+      "learning_rate": 0.00014041740553812578,
+      "loss": 0.2082,
+      "step": 3000
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 8146,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.7696405948746138e+17,
+  "train_batch_size": 32,
+  "trial_name": null,
+  "trial_params": null
+}
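The counters in `trainer_state.json` are enough to recover the shape of the run with simple arithmetic. The values below are copied verbatim from the file; the only derived quantity is the number of optimizer steps per epoch, obtained by dividing `global_step` by the fractional `epoch`.

```python
# Values copied from trainer_state.json in this commit
state = {
    "epoch": 0.7365578197888534,
    "global_step": 3000,
    "max_steps": 8146,
    "num_train_epochs": 2,
    "save_steps": 1000,
}

steps_per_epoch = round(state["global_step"] / state["epoch"])
progress = state["global_step"] / state["max_steps"]

print(steps_per_epoch)                              # 4073
print(steps_per_epoch * state["num_train_epochs"])  # 8146, equal to max_steps
print(f"{progress:.1%}")                            # 36.8%
print(state["global_step"] // state["save_steps"])  # 3, the saves at steps 1000, 2000 and 3000
```

So this checkpoint sits about three quarters of the way through the first of two epochs, with training loss logged at 0.2082 at step 3000 (down from 3.4685 at step 25). No evaluation was recorded: `best_metric` is null and `log_history` contains no eval entries.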

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea5d298f200c14881f05f571df1a23a6f14ae3e3464df47181f658553cdd1108
+size 5624
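`training_args.bin` is the `TrainingArguments` object serialized by the Trainer with `torch.save`, so the learning-rate settings are not readable in the diff. The `learning_rate` values in `log_history`, however, fit a linear warmup followed by linear decay to zero at `max_steps` (the shape of the Trainer's `"linear"` scheduler). The sketch assumes a peak rate of 2e-4 and `warmup_ratio=0.1` (815 warmup steps after rounding up); both are inferred from the logged numbers, not read from the file. In this log the rate reported at step s equals the schedule evaluated at s - 1, so the check applies that one-step offset; treat the offset as an observation about this log, not documented Trainer behavior. `linear_warmup_decay` is a hypothetical name.

```python
import math

PEAK_LR = 2e-4                             # assumption: inferred from the log, not from training_args.bin
MAX_STEPS = 8146                           # "max_steps" in trainer_state.json
WARMUP_STEPS = math.ceil(0.1 * MAX_STEPS)  # assumption: warmup_ratio = 0.1 -> 815 steps


def linear_warmup_decay(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / max(1, WARMUP_STEPS))
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / max(1, MAX_STEPS - WARMUP_STEPS))


# (global_step, learning_rate) pairs copied from log_history
logged = {
    25: 5.889570552147239e-06,
    800: 0.00019607361963190186,
    825: 0.00019975446733051425,
    3000: 0.00014041740553812578,
}

for step, lr in logged.items():
    predicted = linear_warmup_decay(step - 1)  # one-step offset observed in this log
    print(step, lr, math.isclose(lr, predicted, rel_tol=1e-9))
```

If the inference holds, the remaining 5,146 steps would take the rate from about 1.4e-4 at step 3000 down to zero at step 8146.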