Jackrong committed
Commit 7e21bfc · verified · 1 Parent(s): 74e15db

Update README.md

Files changed (1)
  1. README.md +66 -8
README.md CHANGED
@@ -1,17 +1,75 @@
  ---
- base_model: meta-llama-3.1-8b-Instruct
  tags:
- - text-generation-inference
- - transformers
  - unsloth
  - llama
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** Jackrong
- - **License:** apache-2.0
- - **Finetuned from model :** meta-llama-3.1-8b-Instruct
  ---
+ base_model: meta-llama/Llama-3.1-8B-Instruct
+ library_name: transformers
+ model_name: GPT-5-Distill-llama3.1-8B-Instruct
  tags:
  - unsloth
+ - llama-3
  - llama
+ - text-generation
+ - distillation
+ - gpt-5
+ license: llama3.1
  language:
  - en
+ - zh
  ---

+ # GPT-5-Distill-llama3.1-8B-Instruct
+
+ ![Unsloth](https://img.shields.io/badge/Unsloth-Fine--Tuning-blue?style=flat&logo=unsloth)
+ ![Llama-3](https://img.shields.io/badge/Model-Llama--3.1-green?style=flat)
+ ![Distillation](https://img.shields.io/badge/Technique-Knowledge%20Distillation-orange?style=flat)
+
+ ## Model Summary
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/PNNVeEd1bKdL3F7oXCj5M.png" width="800" />
+
+ **GPT-5-Distill-llama3.1-8B-Instruct** is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), designed to distill the capabilities of high-performance models (labeled as GPT-5 in the source datasets) into a more efficient 8B-parameter footprint.
+
+ This model was trained with **Unsloth** on a curated mix of approximately **164,000 high-quality instruction-response pairs**, emphasizing complex reasoning and limited to responses whose quality label is `flaw == "normal"`.
+
+ * **Base Model:** `meta-llama/Llama-3.1-8B-Instruct`
+ * **Architecture:** Llama 3.1 (8B parameters)
+ * **Language:** English (Primary)
+ * **Context Window:** 32,768 tokens
+ * **Fine-tuning Framework:** [Unsloth](https://github.com/unslothai/unsloth) (QLoRA)
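+
+ The following is a minimal inference sketch for the checkpoint described above, using the standard Hugging Face Transformers API. The repository id is an assumption based on the model name and uploader and may need to be adjusted.
+
+ ```python
+ # Minimal inference sketch (illustrative, not part of the original card).
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "Jackrong/GPT-5-Distill-llama3.1-8B-Instruct"  # assumed repo id
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # bf16 keeps the 8B model on a single modern GPU
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "Explain knowledge distillation in two sentences."},
+ ]
+
+ # apply_chat_template inserts the Llama-3 special tokens and the assistant header
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ output = model.generate(
+     input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
+ )
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```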
+
+ ## ✨ Key Advantages of GPT-5 Distillation
+
+ This model represents a shift towards **"Super-Knowledge Distillation"**, where a smaller, efficient student model learns from a significantly more capable teacher.
+
+ * **🚀 Frontier-Level Reasoning**: By training on dataset samples attributed to GPT-5, the model acquires complex reasoning patterns, nuance, and problem-solving strategies that are typically absent in standard datasets or smaller models.
+ * **⚡ Efficient Intelligence**: Users get high-fidelity, coherent, and detailed responses on consumer hardware (e.g., a single GPU) without the latency, privacy concerns, or cost of querying large proprietary APIs.
+ * **💎 High-Purity Signal**: Strict filtering for `flaw == "normal"` ensures the model is fine-tuned only on the highest-confidence, error-free responses. This minimizes "hallucination inheritance" and aligns the model with safe, helpful behavior.
+ * **🎯 Enhanced Nuance & Tone**: Unlike standard fine-tunes that often sound robotic, this model mimics the more natural, conversational, and adaptive tone found in frontier models.
+
+ ## 📚 Training Data
+
+ The model was trained on a high-quality blend of two datasets, totaling **163,896 samples**:
+
+ 1. **Chat-GPT-5-Chat-Response (160k samples)**
+    * Filtered to entries labeled `flaw == "normal"` to ensure high-quality, safe, and coherent responses.
+    * This dataset serves as the primary distillation source, aiming to mimic the response patterns of advanced large language models.
+ 2. **ShareGPT-Qwen3-235B-A22B-Instuct-2507 (3.9k samples)**
+    * This dataset consists of approximately **3.9k examples**, averaging about **5 rounds of dialogue** per scenario, and is designed to enhance the model's instruction-following ability and task-completion efficiency.
+
+ All data was formatted using the standard **Llama-3 Chat Template**.
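+
+ As a rough illustration of this preparation step, the sketch below keeps only `flaw == "normal"` entries and renders ShareGPT-style turns with the tokenizer's Llama-3 chat template. The dataset repository id, the `flaw` column, and the `conversations`/`from`/`value` field names are assumptions inferred from the descriptions above, not confirmed identifiers.
+
+ ```python
+ # Data-preparation sketch (illustrative, not the author's actual script).
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
+
+ # Keep only samples whose quality label is "normal" (assumed column name)
+ raw = load_dataset("Jackrong/Chat-GPT-5-Chat-Response", split="train")  # assumed repo id
+ raw = raw.filter(lambda row: row.get("flaw") == "normal")
+
+ ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}
+
+ def to_llama3_text(example):
+     # Convert a ShareGPT-style turn list into Llama-3 chat-template text
+     messages = [
+         {"role": ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
+         for turn in example["conversations"]
+     ]
+     return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
+
+ train_ds = raw.map(to_llama3_text, remove_columns=raw.column_names)
+ print(train_ds[0]["text"][:300])
+ ```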
+
+ ## ⚙️ Training Details
+
+ * **Hardware:** NVIDIA H100
+ * **Sequence Length:** 32,768 tokens (Long Context Support)
+ * **Batch Size:** 4 per device (Effective Batch Size: 32 via Gradient Accumulation)
+ * **Learning Rate:** 2e-5
+ * **Scheduler:** Linear
+ * **Optimizer:** AdamW 8-bit
+ * **LoRA Rank (r):** 32
+ * **LoRA Alpha:** 32
+ * **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
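+
+ A configuration sketch that mirrors these hyperparameters with Unsloth and TRL's `SFTTrainer` is shown below. It is not the author's training script: the epoch count is not stated in the card, `train_ds` refers to a pre-formatted dataset such as the one in the data-preparation sketch above, and argument names may differ slightly across TRL versions.
+
+ ```python
+ # QLoRA training-configuration sketch mirroring the hyperparameters listed above.
+ from unsloth import FastLanguageModel
+ from transformers import TrainingArguments
+ from trl import SFTTrainer
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="meta-llama/Llama-3.1-8B-Instruct",
+     max_seq_length=32768,
+     load_in_4bit=True,  # QLoRA: 4-bit base weights + trainable LoRA adapters
+ )
+
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=32,
+     lora_alpha=32,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     lora_dropout=0.0,
+     use_gradient_checkpointing="unsloth",
+ )
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=train_ds,       # dataset with a formatted "text" column
+     dataset_text_field="text",
+     max_seq_length=32768,
+     args=TrainingArguments(
+         per_device_train_batch_size=4,
+         gradient_accumulation_steps=8,  # 4 x 8 = effective batch size 32
+         learning_rate=2e-5,
+         lr_scheduler_type="linear",
+         optim="adamw_8bit",
+         num_train_epochs=1,             # assumed; not stated in the card
+         bf16=True,
+         logging_steps=10,
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+ ```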
+
+ ## 🛡️ License & Limitations
+
+ * **License:** This model is subject to the **Llama 3.1 Community License**.
+ * **Limitations:** While this model is distilled from high-capability sources, it is still an 8B-parameter model. It may hallucinate facts or struggle with extremely complex reasoning tasks compared to the original teacher models. The "GPT-5" naming refers to the source dataset labels and does not imply access to unreleased OpenAI weights.