walkerrr committed · verified
Commit 5f410b5 · 1 parent: 1682021

Update README.md

Files changed (1): README.md (+18, -0)
README.md CHANGED
@@ -21,6 +21,24 @@ We introduce TianJiangZhuG_3B, an advanced multimodal large language model (MLLM
 
  Key Enhancements:
 
+ 1. Meticulous Construction of High-Quality Chain-of-Thought (CoT) Datasets
+
+ Scale and Coverage: We have systematically built thousands of high-quality Chinese and English reasoning samples across multiple domains, such as mathematical applications, logical reasoning, and symbolic operations, ensuring the model's generalization ability in diverse scenarios.
+
+ Data Generation Method: Based on selected image-text question-answer pairs, combined with the "Super Chain-of-Thought Model", we automatically generate Chain-of-Thought annotated data containing detailed reasoning paths. This method effectively enhances the model's step-by-step reasoning and logical coherence.
+
+ 2. Multi-Stage GRPO Training Algorithm
+
+ Progressive Learning Mechanism: We propose a multi-stage GRPO (Group Relative Policy Optimization) training process. Through task design that progresses from shallow to deep and from simple to complex, we guide the model to achieve stepwise capability evolution:
+
+ Primary Stage: Focus on judgment and classification tasks to strengthen the model's understanding of problem structures and basic logic.
+
+ Intermediate Stage: Introduce multiple-choice and matching questions to improve the model's ability to identify key information among distractors.
+
+ Advanced Stage: Expand to open-ended generation tasks to encourage the model to perform free-form deduction and complete logical expression.
+
+ Algorithm Advantages: This training strategy reduces the model's learning difficulty on complex tasks and improves training stability and convergence efficiency, while significantly enhancing the model's adaptability across different task types.
+
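The data-generation recipe in enhancement 1 can be sketched as a small pipeline. This is a minimal illustration, not the authors' actual code: `super_cot_model` is a hypothetical stand-in for the "Super Chain-of-Thought Model" (in practice, a call to a strong reasoning model), and the sample field names are assumptions.

```python
# Sketch: expand selected image-text QA pairs into CoT-annotated samples.
# "super_cot_model" is a hypothetical placeholder for the Super
# Chain-of-Thought Model described above.

def super_cot_model(question: str, answer: str) -> list[str]:
    """Hypothetical generator: returns a detailed reasoning path."""
    return [
        f"Step 1: Restate the question: {question}",
        "Step 2: Identify the relevant facts in the image and text.",
        f"Step 3: Derive the answer: {answer}",
    ]

def annotate_with_cot(qa_pairs: list[dict]) -> list[dict]:
    """Attach a reasoning path to each selected image-text QA pair."""
    dataset = []
    for pair in qa_pairs:
        steps = super_cot_model(pair["question"], pair["answer"])
        dataset.append({
            "image": pair["image"],
            "question": pair["question"],
            "reasoning": steps,          # the CoT annotation
            "answer": pair["answer"],
        })
    return dataset

samples = annotate_with_cot([
    {"image": "img_001.png", "question": "How many apples are shown?", "answer": "3"},
])
```

A real pipeline would also filter generated paths (e.g. by answer consistency) before adding them to the training set.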
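The staged curriculum can be sketched alongside the group-relative advantage that standard GRPO (Group Relative Policy Optimization) computes: several responses to the same prompt are scored, and each reward is normalized within its group. Stage names and task types follow the description above; everything else (function names, placeholder rewards) is illustrative, not the authors' implementation.

```python
# Sketch: shallow-to-deep curriculum over task types, plus the
# group-relative advantage used by GRPO-style training.
from statistics import mean, pstdev

# One (stage, task types) entry per curriculum stage, in training order.
CURRICULUM = [
    ("primary",      ["judgment", "classification"]),
    ("intermediate", ["multiple_choice", "matching"]),
    ("advanced",     ["open_ended_generation"]),
]

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward within its sampling group (zero mean, unit std)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:                      # all responses scored equally
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

def train(stages=CURRICULUM, groups_per_stage=2):
    """Walk the stages in order; each reward group would come from scoring
    several sampled responses to one prompt of the stage's task types."""
    log = []
    for stage_name, task_types in stages:
        for _ in range(groups_per_stage):
            rewards = [1.0, 0.5, 0.0, 0.5]   # placeholder scores
            advs = group_relative_advantages(rewards)
            log.append((stage_name, task_types, advs))
    return log
```

Normalizing within the group rather than against a learned value baseline is what keeps the per-stage reward scales comparable as task difficulty increases.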
 
 
  Evaluation: