---
license: cc-by-nc-4.0
library_name: transformers
tags:
- reinforcement-learning
- llm-routing
- cost-optimization
- tool-calling
- multi-model
pipeline_tag: text-generation
---

# xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-2510.08439-b31b1b.svg)](https://arxiv.org/abs/2510.08439)
[![GitHub](https://img.shields.io/badge/GitHub-SalesforceAIResearch%2FxRouter-blue?logo=github)](https://github.com/SalesforceAIResearch/xRouter)
[![License](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

**[Paper](https://arxiv.org/abs/2510.08439)** · **[GitHub Repository](https://github.com/SalesforceAIResearch/xRouter)** · **[Model](https://huggingface.co/Salesforce/xRouter)**

</div>

Welcome to **xRouter**, Salesforce AI Research's intelligent LLM routing system, trained with reinforcement learning to dynamically select the right model from more than 20 available LLMs while balancing performance and cost.

Modern LLM deployments face a widening cost-performance spectrum: premium models deliver strong reasoning but are expensive, while lightweight models are economical yet brittle on complex tasks. **xRouter** learns end-to-end routing policies that balance quality and cost through explicit cost-aware reward shaping, eliminating the need for hand-engineered routing rules.

## ⭐ Highlights

- **Cost-Aware Optimization**: RL-trained policies cut costs by up to 60% while maintaining quality
- **Adaptive Routing**: Selects models dynamically based on query complexity, sending simple queries to budget models and complex ones to premium models
- **Tool-Calling Architecture**: Learns to invoke 20+ models (GPT-5, o3/o4, DeepSeek R1, Qwen3, Kimi K2, etc.) as tools and to select the best response
- **Multi-Model Orchestration**: Coordinates responses from multiple LLMs for complex reasoning tasks
- **Learned Prompt Engineering**: Automatically generates optimized system prompts for target models

---

## 📊 Model Details

- **Developed by**: Salesforce AI Research
- **Base Model**: Qwen/Qwen2.5-7B-Instruct
- **Model Type**: Instruction-tuned language model with tool-calling capabilities
- **Training Algorithm**: DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) with cost-aware reward shaping
- **Training Data**: Derived from [Reasoning360](https://github.com/LLM360/Reasoning360), covering math, code, reasoning, and STEM tasks
- **License**: CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International)

## 📈 Key Results

- **Substantial cost reductions** (up to 60%) at comparable task completion rates
- Evaluated on **17 diverse benchmarks** spanning math, coding, reasoning, and out-of-distribution (OOD) tasks
- **Adaptive behavior**: Learns when to use premium vs. budget models without explicit rules
- **Multi-turn reasoning**: Effectively coordinates multiple model calls for complex tasks

For detailed results, see our [paper](https://arxiv.org/abs/2510.08439).

## 🛠️ Usage

### Installation

```bash
# Clone the repository
git clone https://github.com/SalesforceAIResearch/xRouter.git
cd xRouter

# Set up environment
conda create -n xrouter python=3.12
conda activate xrouter

pip install uv
uv pip install torch==2.6.0
uv pip install flash-attn==2.7.3 --no-build-isolation
uv pip install -e .[gpu,math,vllm,test]
pip install litellm rich python-dotenv
```

### Configure API Keys

```bash
export OPENAI_API_KEY="your_openai_key"
export TOGETHER_API_KEY="your_together_key"
export GEMINI_API_KEY="your_gemini_key"  # optional
```

### 🚀 Deployment

```bash
# Host the router model
cd evaluation
bash host_router.sh    # Serves on port 8000

# Launch the router API (in another terminal)
bash serve_router.sh   # Serves on port 8800
```

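With both processes running, you can optionally verify that the router API is reachable before sending real traffic. The snippet below is a minimal sketch: it assumes the `serve_router.sh` endpoint is OpenAI-compatible and exposes the standard model-listing route, which may differ in your setup.

```python
import openai

# Point the standard OpenAI client at the locally served router API.
client = openai.OpenAI(base_url="http://localhost:8800/v1", api_key="dummy")

# List the models the endpoint advertises (assumes an OpenAI-compatible /v1/models route).
for model in client.models.list().data:
    print(model.id)
```
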
### 💬 Usage Example

```python
import openai

# Initialize client
client = openai.OpenAI(
    base_url="http://localhost:8800/v1",
    api_key="dummy"
)

# Send request
response = client.chat.completions.create(
    model="router-tool-rl",
    messages=[
        {"role": "user", "content": "Solve: If x^2 + 2x + 1 = 0, what are the values of x?"}
    ],
    max_tokens=1000
)

print(response.choices[0].message.content)

# Access routing metadata
metadata = response.router_metadata
print(f"Model used: {metadata['model_used']}")
print(f"Total cost: ${metadata['total_cost']:.6f}")
```

## 🎓 Training Methodology

xRouter uses **DAPO** (Decoupled Clip and Dynamic Sampling Policy Optimization) with cost-aware reward shaping:

```
reward = quality - λ × normalized_cost
```

**Training Features**:
- Cost-aware rewards penalize expensive routing decisions (illustrated in the sketch after this list)
- Multi-turn credit assignment across conversation turns
- Tool augmentation with 20+ model tools plus response selection
- Curriculum learning from simple to complex tasks

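To make the reward concrete, here is a minimal, illustrative sketch of the quality-minus-scaled-cost trade-off above. The λ value, the cost normalization, and the function name are assumptions chosen for exposition, not the exact implementation used to train xRouter.

```python
def cost_aware_reward(quality: float, episode_cost_usd: float,
                      max_cost_usd: float = 0.10, lam: float = 0.5) -> float:
    """Illustrative reward = quality - λ × normalized_cost.

    `lam`, `max_cost_usd`, and the clipping are assumptions for exposition,
    not the exact values from the xRouter paper.
    """
    normalized_cost = min(episode_cost_usd / max_cost_usd, 1.0)
    return quality - lam * normalized_cost

# A correct answer obtained cheaply outranks the same answer from an expensive call:
print(cost_aware_reward(quality=1.0, episode_cost_usd=0.002))  # 0.99
print(cost_aware_reward(quality=1.0, episode_cost_usd=0.080))  # 0.60
```
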
**Supported Model Tiers**:

| Tier | Models | Best For |
|------|--------|----------|
| **Premium** | GPT-5, GPT-4.1, o3, Qwen3-235B-Instruct, Kimi K2 | Mission-critical tasks |
| **Standard** | GPT-5-Mini, GPT-4.1-Mini, o4-Mini, GPT-OSS-120B | Balanced performance |
| **Budget** | GPT-5-Nano, GPT-4.1-Nano, GPT-4o-Mini, GPT-OSS-20B | High-volume tasks |
| **Specialized** | o3, DeepSeek-R1, Qwen3-235B-Thinking, Qwen3-Coder-480B | Domain-specific tasks |

## 📚 Citation

```bibtex
@article{qian2025xrouter,
  title={xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning},
  author={Qian, Cheng and Liu, Zuxin and Kokane, Shirley and Prabhakar, Akshara and Qiu, Jielin and Chen, Haolin and Liu, Zhiwei and Ji, Heng and Yao, Weiran and Heinecke, Shelby and Savarese, Silvio and Xiong, Caiming and Wang, Huan},
  journal={arXiv preprint arXiv:2510.08439},
  year={2025}
}
```

## 🔗 Resources

- 📄 **Paper**: [arXiv:2510.08439](https://arxiv.org/abs/2510.08439)
- 💻 **Code Repository**: [github.com/SalesforceAIResearch/xRouter](https://github.com/SalesforceAIResearch/xRouter)
- 🤗 **Model Hub**: [Salesforce/xRouter](https://huggingface.co/Salesforce/xRouter)

## 🙏 Acknowledgements

This project builds upon exceptional work from the open-source community:
- **[Reasoning360](https://github.com/LLM360/Reasoning360)**: Foundational RL training framework
- **[VERL](https://github.com/volcengine/verl)**: RL infrastructure for distributed LLM training
- **[SGLang](https://github.com/sgl-project/sglang)**: High-performance LLM serving backend
- **[LiteLLM](https://github.com/BerriAI/litellm)**: Unified API interface for 20+ LLM providers

---

_🏢 Developed by Salesforce AI Research_