Update README.md
README.md
@@ -2,4 +2,106 @@
license: apache-2.0
base_model:
- seeklhy/OmniSQL-32B
---

## Important Links

- [Paper (arXiv)](https://arxiv.org/abs/2509.24403)
- [GitHub repository](https://github.com/antgroup/Agentar-Scale-SQL)
- [BIRD benchmark leaderboard](https://bird-bench.github.io/)
- [Hugging Face collection](https://huggingface.co/collections/antgroup/agentar-scale-sql)
- [ModelScope collection](https://modelscope.cn/collections/Agentar-Scale-SQL-0c368e98f73f41)

## Introduction

We are excited to release **Agentar-Scale-SQL-Generation-32B**, the core **Reasoning SQL Generator** of our state-of-the-art (SOTA) framework, **Agentar-Scale-SQL**. The framework achieved **81.67% execution accuracy** on the BIRD benchmark test set, ranking first on the official leaderboard.

This model is a key component of our "Orchestrated Test-Time Scaling" strategy and has the following features:

- **Base Model:** It is fine-tuned from `OmniSQL-32B` (`seeklhy/OmniSQL-32B`).
- **RL-Enhanced Reasoning:** The model was further trained with an execution-grounded **Reinforcement Learning** framework (GRPO) to enhance its intrinsic reasoning capabilities.
- **Deep Reasoning:** It is engineered to reason step by step and construct complex, high-accuracy SQL queries.

This model is one of the two main generators in the "Diverse Synthesis" step of the `Agentar-Scale-SQL` framework. It works in parallel with an in-context learning (ICL) generator to produce a robust pool of SQL candidates.
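
As a rough illustration of what pooling candidates from two generators can look like, the sketch below merges two candidate lists and drops near-verbatim duplicates. It is not the framework's actual implementation: `normalize_sql`, `build_candidate_pool`, and the sample queries are hypothetical names and data chosen for this example, and the whitespace-based normalization is a simple heuristic.

```python
import re
from typing import Iterable, List


def normalize_sql(sql: str) -> str:
    """Build a comparison key: collapse whitespace, drop a trailing semicolon.

    Heuristic only: whitespace inside string literals is collapsed as well.
    """
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()


def build_candidate_pool(*candidate_lists: Iterable[str]) -> List[str]:
    """Merge candidates from several generators, keeping first occurrences."""
    pool: List[str] = []
    seen = set()
    for candidates in candidate_lists:
        for sql in candidates:
            key = normalize_sql(sql)
            if key and key not in seen:
                seen.add(key)
                pool.append(sql.strip())
    return pool


# Hypothetical outputs of the reasoning generator and the ICL generator.
reasoning_candidates = [
    "SELECT name FROM users WHERE age > 30;",
    "SELECT name\nFROM users\nWHERE age > 30",  # same query, different layout
]
icl_candidates = ["SELECT u.name FROM users AS u WHERE u.age > 30"]

pool = build_candidate_pool(reasoning_candidates, icl_candidates)
```

Only textual duplicates are merged here; semantically equivalent queries with different text stay in the pool as separate candidates.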

## Model Downloads

| **Model** | **Role** |
|-----------------------------------|----------------|
| **Agentar-Scale-SQL-Generation-32B** | **SQL Generator** |
| Agentar-Scale-SQL-Selection-32B | SQL Selector |

## Performance

The metrics below are for the **entire Agentar-Scale-SQL framework**, which uses this Generation model as a key component, not for the model alone. EX is execution accuracy and R-VES is the reward-based valid efficiency score, both measured on the BIRD benchmark.

| Methods | EX Dev (%) | **EX Test (%)** | R-VES (%) |
|:-----------------------------|:---:|:---:|:---------:|
| **Agentar-Scale-SQL (Ours)** | **74.90** | **81.67** | **77.00** |
| AskData + GPT-4o | 76.14 | 80.88 | 76.24 |
| LongData-SQL | 74.32 | 77.53 | 71.89 |
| CHASE-SQL + Gemini | 74.90 | 76.02 | 69.94 |
| JoyDataAgent-SQL | 74.25 | 75.74 | 70.16 |
| TCDataAgent-SQL | 74.12 | 75.74 | - |
| Contextual-SQL | 73.50 | 75.63 | 70.02 |
| XiYan-SQL | 73.34 | 75.63 | 71.41 |
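
Execution accuracy is usually computed by running the predicted query and the reference query against the same database and checking whether they return the same rows. The sketch below shows that idea on an in-memory SQLite database. It is an approximation for illustration, not the official BIRD evaluation script: `execution_match`, `execution_accuracy`, and the toy `users` table are assumptions made for this example.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Percentage of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy database, hypothetical data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO users (name, age) VALUES ('Ann', 34), ('Bob', 25), ('Cy', 41);
    """
)

pairs = [
    # Different text, same rows: counted as correct.
    ("SELECT name FROM users WHERE age > 30", "SELECT name FROM users WHERE age >= 31"),
    # Returns an extra row: counted as wrong.
    ("SELECT name FROM users", "SELECT name FROM users WHERE age > 30"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nme FROM users", "SELECT name FROM users"),
]
score = execution_accuracy(conn, pairs)
```

Comparing row sets ignores row order and duplicate rows, which is a deliberate simplification in this sketch.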

## Prompt Template

````python
# Jinja2-style template.
# Required variables: dialect, db_schemas, question. Optional: matched_contents, hint.
PROMPT_TEMPLATE = """Task Overview:
You are a data science expert. Below, you are provided with a database schema and a natural language question. Your task is to understand the schema and generate a valid SQL query to answer the question.

Database Engine:
{{ dialect }}

Database Schema:
{{ db_schemas }}
This schema describes the database's structure, including tables, columns, primary keys, foreign keys, and any relevant relationships or constraints.
{% if matched_contents %}
Matched contents:
{{ matched_contents }}
Matched contents presents values related to the question, together with their source table and column, for your reference in SQL generation.
{% endif %}
Question:
{%- if hint %}
{{ hint }}
{{ question }}
{%- else %}
{{ question }}
{%- endif %}

Instructions:
- If Matched contents is provided, you can use it as reference when generating the SQL query.
- Make sure you only output the information that is asked in the question. If the question asks for a specific column, make sure to only include that column in the SELECT clause, nothing more.
- The generated query should return all of the information asked in the question without any missing or extra information.
- Before generating the final SQL query, please think through the steps of how to write the query.

Output Format:
In your answer, please enclose the generated SQL query in a code block:
```sql
-- Your SQL query
```

Take a deep breath and think step by step to find the correct SQL query.
"""
````
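
Because the template asks the model to reason first and then enclose the final query in a `sql` code block, downstream code has to pull that block out of the response. The following is a minimal sketch of one way to do so. `extract_sql` and the sample response are hypothetical, and taking the last fenced block is an assumption (the reasoning text may contain draft queries before the final answer).

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built here to keep this example easy to embed
SQL_BLOCK = re.compile(FENCE + r"sql\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(response: str) -> Optional[str]:
    """Return the last sql-fenced block in a model response, or None if absent."""
    blocks = SQL_BLOCK.findall(response)
    if not blocks:
        return None
    return blocks[-1].strip()


# Hypothetical model response: reasoning text followed by the final query.
response = (
    "The question asks for names of users older than 30, "
    "so I only need the name column.\n\n"
    f"{FENCE}sql\nSELECT name FROM users WHERE age > 30;\n{FENCE}\n"
)
final_sql = extract_sql(response)
```

Returning `None` when no block is found lets the caller decide whether to retry generation or discard the candidate.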

## Acknowledgments

If you find our work useful, please cite the Agentar-Scale-SQL paper:

```bibtex
@misc{wang2025agentarscalesqladvancingtexttosqlorchestrated,
      title={Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling},
      author={Pengfei Wang and Baolin Sun and Xuemei Dong and Yaxun Dai and Hongwei Yuan and Mengdie Chu and Yingqi Gao and Xiang Qi and Peng Zhang and Ying Yan},
      year={2025},
      eprint={2509.24403},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.24403},
}
```