nielsr (HF Staff) committed
Commit ce98c63 · verified · 1 Parent(s): 387d5b8

Improve model card for Agentic-R1

This PR significantly improves the model card for `VanishD/Agentic-R1` by:

- Adding the `pipeline_tag: text-generation` to ensure the model appears in relevant searches on the Hub.
- Adding `library_name: transformers` to enable the automated "How to use" code snippet, as the model is compatible with the `transformers` library (evidenced by `config.json` and typical Hugging Face model structure).
- Adding descriptive `tags` such as `qwen2`, `reasoning`, `tool-use`, and `llm` for better discoverability.
- Including a direct link to the paper: [Agentic-R1: Distilled Dual-Strategy Reasoning](https://huggingface.co/papers/2507.05707).
- Adding a link to the GitHub repository: `https://github.com/StigLidu/DualDistill`.
- Replacing the placeholder content with a comprehensive description derived from the paper abstract and the GitHub README, including:
  - The paper abstract.
  - Key Features.
  - Datasets.
  - Performance Results.
  - Detailed Quick Start instructions (Installation, Inference Server, and Evaluation scripts) directly from the GitHub README.
  - Information on Trained Models.
  - Important Notes, License, Acknowledgments, and Contact details.
  - The academic citation.

*Note*: A Python inference code snippet was not added to the "Quick Start" section as it was not explicitly found in the provided GitHub README, in strict adherence to the instructions.
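
For reviewers' reference only (not part of the card changes), the `library_name: transformers` metadata implies usage roughly along these lines. This is a minimal sketch that assumes the checkpoint loads with the standard `AutoModelForCausalLM`/`AutoTokenizer` classes and ships a chat template; the prompt and generation settings are illustrative and not taken from the repository:

```python
# Minimal sketch (assumption): standard transformers causal-LM loading for VanishD/Agentic-R1.
# The chat-template call and generation settings are illustrative, not from the DualDistill repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VanishD/Agentic-R1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```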

Files changed (1)
  1. README.md +140 -4
README.md CHANGED
@@ -1,7 +1,143 @@
  ---
- license: mit
- language:
- - en
  base_model:
  - VanishD/Agentic-R1
- ---
+ language:
+ - en
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - qwen2
+ - reasoning
+ - tool-use
+ - llm
+ ---
+
+ # Agentic-R1: Distilled Dual-Strategy Reasoning
+
+ The model was presented in the paper [Agentic-R1: Distilled Dual-Strategy Reasoning](https://huggingface.co/papers/2507.05707).
+
+ Code: https://github.com/StigLidu/DualDistill
+
+ ## Abstract
+
+ Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning.
+
+ ## Key Features
+
+ - **Efficient Training**: Integrates tool use into long-chain-of-thought (CoT) reasoning using only 4 × A6000 GPUs
+ - **Unified Reasoning**: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/StigLidu/DualDistill/main/fig/overview.png" alt="Overview of DualDistill methodology" width="500">
+ <p><em>Overview of DualDistill methodology</em></p>
+ </div>
+
+ ## Datasets
+
+ | Dataset | Description | Link |
+ |---------|-------------|------|
+ | **Training Set** | Complete training dataset with teacher trajectories | [🤗 HuggingFace](https://huggingface.co/datasets/VanishD/DualDistill) |
+ | **Test Set** | Evaluation benchmarks | `dataset/test/` |
+
+ ## Results
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/StigLidu/DualDistill/main/fig/result.png" alt="Performance comparison of Agentic-R1 models" width="700">
+ </div>
+
+ - **Agentic-R1** demonstrates significant performance gains on **DeepMath-L** and **Combinatorics300**, where both complex reasoning and tool use are crucial for success.
+ - **Agentic-R1-SD** (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.
+
+ ## Quick Start
+
+ ### Installation
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone https://github.com/StigLidu/DualDistill.git
+ cd DualDistill
+ ```
+
+ 2. **Create environment** (optional but recommended):
+ ```bash
+ conda create -n dualdistill python=3.11
+ conda activate dualdistill
+ ```
+
+ 3. **Install dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ### Inference Server and Evaluation
+
+ To run inference and evaluation using the provided scripts:
+
+ 1. **Start inference server**:
+ ```bash
+ bash script/eval_script/start_inference_server.sh [model_path] [display_name] [port]
+ ```
+
+ 2. **Run Evaluation**:
+ ```bash
+ bash script/eval_script/eval_remote_server.sh \
+   [url] [display_name] [data_path] [code_mode] [max_token]
+ ```
+
+ **Example**:
+ ```bash
+ bash script/eval_script/eval_remote_server.sh \
+   "http://localhost:8080/v1" "agentic-r1" "dataset/test/math.json" "true" "4096"
+ ```
+
+ ## Trained Models
+
+ | Model | Description | HuggingFace Link |
+ |-------|-------------|------------------|
+ | **Agentic-R1-7B** | Base model with teacher distillation | [🤗 Download](https://huggingface.co/VanishD/Agentic-R1) |
+ | **Agentic-R1-7B-SD** | Enhanced model with self-distillation | [🤗 Download](https://huggingface.co/VanishD/Agentic-R1-SD) |
+
+ ## ⚠️ Important Notes
+
+ - **Code Execution Safety**: The evaluation scripts execute model-generated code locally. Run them only with models you trust.
+ - **Inference Config**: If you are using a recent version of vLLM and encounter an error about the maximum context length, you may need to modify `model_max_length` in `tokenizer_config.json`.
+ - **Self-Distillation Warning**: The self-distillation step requires sampling many trajectories and can be time-consuming.
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## Acknowledgments
+
+ We thank the following open-source projects for their foundational contributions:
+
+ - [OpenHands](https://github.com/All-Hands-AI/OpenHands) - Agent framework
+ - [DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K) - Mathematical reasoning dataset
+ - [vLLM](https://github.com/vllm-project/vllm) - High-performance inference engine
+
+ ## Contact
+
+ For questions or support, please contact:
+
+ - **Weihua Du**: [weihuad@cs.cmu.edu](mailto:weihuad@cs.cmu.edu)
+
+ ## Citation
+
+ If you find our work useful, please consider citing:
+
+ ```bibtex
+ @article{du2025agentic,
+   title={Agentic-R1: Distilled Dual-Strategy Reasoning},
+   author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
+   journal={arXiv preprint arXiv:2507.05707},
+   year={2025}
+ }
+ ```
+
+ ---
+
+ <div align="center">
+ <p>⭐ Star us on GitHub if this project helped you!</p>
+ </div>
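
For reviewers trying out the Quick Start copied into the card above: the evaluation example targets an OpenAI-compatible endpoint (`http://localhost:8080/v1`), so once the inference server is running, a direct client call should look roughly like the sketch below. It assumes the server exposes the standard OpenAI chat-completions API (as vLLM does) and that `agentic-r1` is the display name passed when starting the server; this snippet is not taken from the repository scripts.

```python
# Sketch under assumptions: OpenAI-compatible server at http://localhost:8080/v1 (e.g. vLLM),
# model registered under the display name "agentic-r1". Not taken from the repo's eval scripts.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentic-r1",
    messages=[{"role": "user", "content": "How many integers between 1 and 1000 are divisible by 3 or 5?"}],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```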