Update README.md
README.md (changed)

@@ -21,6 +21,24 @@ We introduce TianJiangZhuG_3B, an advanced multimodal large language model (MLLM)

Key Enhancements:

1. Meticulous Construction of High-Quality Chain-of-Thought (CoT) Datasets

   Scale and Coverage: We have systematically built thousands of high-quality Chinese and English reasoning examples across domains such as mathematical applications, logical reasoning, and symbolic operations, ensuring the model's generalization ability in diverse scenarios.

   Data Generation Method: Starting from selected image-text question-answer pairs and the "Super Chain-of-Thought Model", we automatically generate Chain-of-Thought annotated data containing detailed reasoning paths. This effectively enhances the model's step-by-step reasoning and logical coherence (a hedged sketch of such a pipeline appears after this list).

3. Multi-Stage GRPO Training Algorithm

   Progressive Learning Mechanism: We propose a multi-stage GRPO (Gradient-based Reward Policy Optimization) training process. Through task design that progresses from shallow to deep and from simple to complex, it guides the model through a stepwise evolution of capability (see the curriculum and training-loop sketches after this list):

   Primary Stage: Focus on judgment and classification tasks to strengthen the model's understanding of problem structures and basic logic.

   Intermediate Stage: Introduce multiple-choice and matching questions to improve the model's ability to identify key information among distractors.

   Advanced Stage: Expand to open-ended generation tasks to encourage the model to carry out free-form deduction and complete logical expression.

   Algorithm Advantages: This training strategy reduces the difficulty of learning complex tasks, improves training stability and policy convergence efficiency, and significantly enhances the model's adaptability across different task types.
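
To make the data-generation step above more concrete, here is a minimal sketch of how CoT annotations could be produced from image-text QA pairs. It assumes a hypothetical `query_cot_teacher` call standing in for the "Super Chain-of-Thought Model"; the field names and the answer-consistency filter are illustrative assumptions, not this repository's actual pipeline.

```python
# Minimal sketch of CoT data generation from image-text QA pairs.
# `query_cot_teacher` stands in for the "Super Chain-of-Thought Model";
# field names and the filtering rule are assumptions, not this repo's API.
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAPair:
    image_path: str
    question: str
    answer: str


def query_cot_teacher(image_path: str, question: str) -> dict:
    """Placeholder: call a strong reasoning model and return
    {"reasoning": <step-by-step text>, "answer": <final answer>}."""
    raise NotImplementedError


def build_cot_record(pair: QAPair) -> Optional[dict]:
    out = query_cot_teacher(pair.image_path, pair.question)
    # Keep only traces whose final answer agrees with the reference answer,
    # so the detailed reasoning path stays consistent with the ground truth.
    if out["answer"].strip() != pair.answer.strip():
        return None
    return {
        "image": pair.image_path,
        "question": pair.question,
        "chain_of_thought": out["reasoning"],
        "answer": pair.answer,
    }


def build_dataset(pairs: List[QAPair], out_file: str) -> None:
    """Write one JSON line per accepted CoT-annotated example."""
    with open(out_file, "w", encoding="utf-8") as f:
        for pair in pairs:
            record = build_cot_record(pair)
            if record is not None:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
```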
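
The shallow-to-deep progression can be written down as a small curriculum that visits the three stages in order. The stage names and task types follow the list above; the epoch counts and identifiers are illustrative assumptions, not the project's actual configuration.

```python
# Illustrative three-stage curriculum mirroring the progression above.
# Task types follow the stage descriptions; epoch counts are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumStage:
    name: str
    task_types: tuple
    epochs: int


CURRICULUM = (
    CurriculumStage("primary", ("judgment", "classification"), epochs=1),
    CurriculumStage("intermediate", ("multiple_choice", "matching"), epochs=1),
    CurriculumStage("advanced", ("open_ended_generation",), epochs=2),
)


def iterate_curriculum():
    """Yield (stage, epoch_index) pairs in the order training would visit them."""
    for stage in CURRICULUM:
        for epoch in range(stage.epochs):
            yield stage, epoch
```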
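
Finally, a schematic view of how each stage's GRPO updates could be organized: sample a group of responses per prompt, score them with a stage-specific reward, and convert the rewards into group-relative advantages for the policy update. This follows the group-relative reading of GRPO common in the literature, while the README expands the acronym differently, so treat the structure as an assumption; every callable here (`sample_responses`, `reward_fn`, `update_policy`) is a hypothetical placeholder rather than the repository's training code.

```python
# Schematic per-stage GRPO loop; every callable is a placeholder that
# illustrates the structure, not this repository's training implementation.
import statistics
from typing import Callable, List, Sequence


def group_relative_advantages(rewards: Sequence[float]) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
    return [(r - mean) / std for r in rewards]


def run_stage(
    prompts: Sequence[dict],
    sample_responses: Callable[[dict, int], List[str]],  # policy sampler (placeholder)
    reward_fn: Callable[[dict, str], float],             # stage-specific reward (placeholder)
    update_policy: Callable[[dict, List[str], List[float]], None],  # optimizer step (placeholder)
    group_size: int = 8,
) -> None:
    """One pass over a stage's prompts with group-sampled responses."""
    for prompt in prompts:
        responses = sample_responses(prompt, group_size)
        rewards = [reward_fn(prompt, resp) for resp in responses]
        advantages = group_relative_advantages(rewards)
        update_policy(prompt, responses, advantages)
```

In this picture, the stages from the curriculum sketch above would be visited in order, each calling `run_stage` with its own task mix and reward, so the harder stages start from the policy produced by the simpler ones.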

Evaluation: