nielsr (HF Staff) committed
Commit 44f2a94 · verified · 1 Parent(s): 72657c9

Improve model card: Add pipeline tag, library name, paper details, and sample usage


This PR significantly enhances the model card for `Agentic-R1` by:

* Adding `pipeline_tag: text-generation` to correctly categorize the model for text generation and reasoning tasks.
* Specifying `library_name: transformers` to enable the automated "How to use" widget, which gives users a ready-made code snippet. This is supported by the model's `Qwen2ForCausalLM` architecture and the `chat_template` in its `tokenizer_config.json`.
* Including the paper title and a direct link to the Hugging Face paper page: [Agentic-R1: Distilled Dual-Strategy Reasoning](https://huggingface.co/papers/2507.05707).
* Adding a link to the official GitHub repository for more details and code.
* Incorporating key features, dataset information, performance highlights, installation instructions, and a practical `transformers` sample usage snippet, making it easier for users to get started.

These changes will improve discoverability and usability for researchers and practitioners on the Hugging Face Hub.

Files changed (1): README.md (+146, -4)
README.md CHANGED
@@ -1,9 +1,151 @@
  ---
- license: mit
  datasets:
  - VanishD/DualDistill
  language:
  - en
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- ---
  ---
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  datasets:
  - VanishD/DualDistill
  language:
  - en
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # Agentic-R1: Distilled Dual-Strategy Reasoning
+
+ This repository hosts the **Agentic-R1** model, an implementation of the paper [**Agentic-R1: Distilled Dual-Strategy Reasoning**](https://huggingface.co/papers/2507.05707).
+
+ **Code**: https://github.com/StigLidu/DualDistill
+
+ ## Abstract
+
+ Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning.
+
+ ## Key Features
+
+ - **Efficient Training**: Integrates tool use into long chain-of-thought (CoT) reasoning using only 4 × A6000 GPUs
+ - **Unified Reasoning**: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
+
+ <div align="center">
+ <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/overview.png" alt="Overview of DualDistill" width="500">
+ <p><em>Overview of DualDistill methodology</em></p>
+ </div>
+
+ ## Datasets
+
+ | Dataset | Description | Link |
+ | :--------------- | :--------------------------------------------------- | :--------------------------------------------------------------------- |
+ | **Training Set** | Complete training dataset with teacher trajectories  | [🤗 HuggingFace](https://huggingface.co/datasets/VanishD/DualDistill)  |
+ | **Test Set**     | Evaluation benchmarks                                 | `dataset/test/`                                                         |
+
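+ The training set can be pulled straight from the Hub with the `datasets` library. The snippet below is a minimal sketch rather than an official loader: the `train` split name and the record fields are assumptions.
+
+ ```python
+ # Minimal sketch: load the DualDistill teacher trajectories from the Hub.
+ # The split name "train" is an assumption; inspect the dataset card for the actual schema.
+ from datasets import load_dataset
+
+ ds = load_dataset("VanishD/DualDistill", split="train")
+ print(ds)       # number of rows and column names
+ print(ds[0])    # inspect a single teacher trajectory
+ ```
+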
+ ## Results
+
+ <div align="center">
+ <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/result.png" alt="Performance comparison of Agentic-R1 models" width="700">
+ </div>
+
+ - **Agentic-R1** demonstrates significant performance gains on **DeepMath-L** and **Combinatorics300**, where both complex reasoning and tool use are crucial for success.
+ - **Agentic-R1-SD** (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.
+
+ ## Quick Start
+
+ ### Installation
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone https://github.com/StigLidu/DualDistill.git
+ cd DualDistill
+ ```
+
+ 2. **Create environment** (optional but recommended):
+ ```bash
+ conda create -n dualdistill python=3.11
+ conda activate dualdistill
+ ```
+
+ 3. **Install dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ### Sample Usage
+
+ Here's how to perform inference with the `Agentic-R1` model using the Hugging Face `transformers` library:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "VanishD/Agentic-R1"  # Or "VanishD/Agentic-R1-SD" for the self-distilled version
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # Use bfloat16 for better performance and memory if supported
+     device_map="auto",
+     trust_remote_code=True,
+ ).eval()  # Set model to evaluation mode
+
+ # Prepare a simple user message
+ messages = [{"role": "user", "content": "What is 123 + 456?"}]
+
+ # Apply the chat template to format the prompt correctly for the model.
+ # `add_generation_prompt=True` appends the assistant token so the model produces a response.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Encode the prompt
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
+
+ # Generate a response
+ output_ids = model.generate(
+     input_ids,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.95,
+     eos_token_id=tokenizer.eos_token_id,
+     pad_token_id=tokenizer.pad_token_id,  # The EOS token is often used as the PAD token for LLMs
+ )
+
+ # Decode and print the generated text, excluding the input prompt
+ generated_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
+ print(f"Generated Text:\n{generated_text}")
+ ```
+
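+ Since the card now declares `pipeline_tag: text-generation` and `library_name: transformers`, the high-level `pipeline` API should also work. The snippet below is an illustrative sketch, not an official example from the repository; the generation parameters are placeholders.
+
+ ```python
+ # Illustrative sketch using the transformers pipeline API (parameters are placeholders).
+ from transformers import pipeline
+
+ generator = pipeline(
+     "text-generation",
+     model="VanishD/Agentic-R1",
+     torch_dtype="auto",
+     device_map="auto",
+ )
+
+ # Recent transformers versions accept chat-style messages directly;
+ # the chat template from tokenizer_config.json is applied automatically.
+ messages = [{"role": "user", "content": "What is 123 + 456?"}]
+ result = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
+
+ # The pipeline returns the full conversation; the last message is the model's reply.
+ print(result[0]["generated_text"][-1]["content"])
+ ```
+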
+ ## ⚠️ Important Notes
+
+ - **Code Execution Safety**: The evaluation scripts execute model-generated code locally. Run them only with models you trust.
+ - **Inference Config**: If you are using a recent version of vLLM and hit an error about the maximum context length, you may need to modify the `model_max_length` in `tokenizer_config.json` (see the sketch below this list).
+ - **Self-Distillation Warning**: The self-distillation step requires sampling many trajectories and can be time-consuming.
+
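+ For the vLLM case mentioned above, here is a minimal sketch; the context length and sampling settings are illustrative assumptions, not values from the repository.
+
+ ```python
+ # Illustrative vLLM sketch; max_model_len is an assumed cap chosen to stay
+ # within a context length the engine will accept.
+ from vllm import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ model_id = "VanishD/Agentic-R1"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ llm = LLM(model=model_id, max_model_len=32768)
+ params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
+
+ # Apply the same chat template as in the transformers example above.
+ prompt = tokenizer.apply_chat_template(
+     [{"role": "user", "content": "What is 123 + 456?"}],
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ outputs = llm.generate([prompt], params)
+ print(outputs[0].outputs[0].text)
+ ```
+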
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](https://github.com/StigLidu/DualDistill/blob/main/LICENSE) file for details.
+
+ ## Acknowledgments
+
+ We thank the following open-source projects for their foundational contributions:
+
+ - [OpenHands](https://github.com/All-Hands-AI/OpenHands) - Agent framework
+ - [DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K) - Mathematical reasoning dataset
+ - [vLLM](https://github.com/vllm-project/vllm) - High-performance inference engine
+
+ ## Contact
+
+ For questions or support, please contact:
+
+ - **Weihua Du**: [weihuad@cs.cmu.edu](mailto:weihuad@cs.cmu.edu)
+
+ ## Citation
+
+ If you find our work useful, please consider citing:
+
+ ```bibtex
+ @article{du2025agentic,
+   title={Agentic-R1: Distilled Dual-Strategy Reasoning},
+   author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
+   journal={arXiv preprint arXiv:2507.05707},
+   year={2025}
+ }
+ ```