stargazerzj committed on
Commit 8b8bba7 · verified · 1 Parent(s): 16316c9

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +18 -6
README.md CHANGED
@@ -19,7 +19,7 @@ library_name: transformers
19
  <div align="center">
20
 
21
  [![Paper](https://img.shields.io/badge/Paper-PDF-1f6feb.svg)](https://github.com/GAIR-NLP/daVinci-Dev/blob/main/daVinci-Dev.pdf)
22
- [![arXiv](https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b.svg)](https://arxiv.org/pdf/)
23
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/GAIR-NLP/daVinci-Dev)
24
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/GAIR/daVinci-Dev)
25
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/GAIR/daVinci-Dev-72B)
@@ -53,8 +53,8 @@ This work presents a systematic study of **agentic mid-training** and introduces
53
 
54
  Our training uses two complementary trajectory types (details in the paper):
55
 
56
- - **Contextually-native trajectories $\mathcal{D}^{\text{ctx}}_{\text{py}}$ (PR-derived):** preserve the full information flow by bundling file discovery/context retrieval together with sequential edits. This provides broad coverage and diversity.
57
- - **Environmentally-native trajectories $\mathcal{D}^{\text{env}}_{\text{pass}}$ (executable rollouts):** collected from real executable repositories with genuine tool/test outputs, capturing authentic feedback loops.
58
 
59
  Resources (open-source / open-release):
60
 
@@ -95,11 +95,11 @@ We will open-source our datasets through Hugging Face:
95
 
96
  ## Pipeline
97
 
98
- The GitHub repository contains a high-performance pipeline that calls the GitHub API and constructs the structured PR representation used to build $\mathcal{D}^{\text{ctx}}_{\text{py}}$.
99
 
100
  | Pipeline | Description | Link |
101
  |----------|---------|-------------|
102
- | daVinci-Dev Pipeline | a high-performance pipeline used to build $\mathcal{D}^{\text{ctx}}_{\text{py}}$ | [`GAIR-NLP/daVinci-Dev`](https://github.com/GAIR-NLP/daVinci-Dev) |
103
 
104
  ## Quick Start
105
 
@@ -204,4 +204,16 @@ Users are responsible for ensuring their downstream usage complies with the lice
204
 
205
  ## Citation
206
 
207
- ArXiv link and the official citation block are coming soon (the manuscript is under review at the time of release).
 
19
  <div align="center">
20
 
21
  [![Paper](https://img.shields.io/badge/Paper-PDF-1f6feb.svg)](https://github.com/GAIR-NLP/daVinci-Dev/blob/main/daVinci-Dev.pdf)
22
+ [![arXiv](https://img.shields.io/badge/arXiv-2601.18418-b31b1b.svg)](https://arxiv.org/pdf/2601.18418)
23
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/GAIR-NLP/daVinci-Dev)
24
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/GAIR/daVinci-Dev)
25
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/GAIR/daVinci-Dev-72B)
 
53
 
54
  Our training uses two complementary trajectory types (details in the paper):
55
 
56
+ - **Contextually-native trajectories \\(\mathcal{D}^{\text{ctx}}_{\text{py}}\\) (PR-derived):** preserve the full information flow by bundling file discovery/context retrieval together with sequential edits. This provides broad coverage and diversity.
57
+ - **Environmentally-native trajectories \\(\mathcal{D}^{\text{env}}_{\text{pass}}\\) (executable rollouts):** collected from real executable repositories with genuine tool/test outputs, capturing authentic feedback loops.
58
 
59
  Resources (open-source / open-release):
60
 
 
95
 
96
  ## Pipeline
97
 
98
+ The GitHub repository contains a high-performance pipeline that calls the GitHub API and constructs the structured PR representation used to build \\(\mathcal{D}^{\text{ctx}}_{\text{py}}\\).
99
 
100
  | Pipeline | Description | Link |
101
  |----------|---------|-------------|
102
+ | daVinci-Dev Pipeline | a high-performance pipeline used to build \\(\mathcal{D}^{\text{ctx}}_{\text{py}}\\) | [`GAIR-NLP/daVinci-Dev`](https://github.com/GAIR-NLP/daVinci-Dev) |
103
 
104
  ## Quick Start
105
 
 
204
 
205
  ## Citation
206
 
207
+ If you use this work, please cite the daVinci-Dev paper.
208
+
209
+ ```
210
+ @misc{zeng2026davincidevagentnativemidtrainingsoftware,
211
+ title={daVinci-Dev: Agent-native Mid-training for Software Engineering},
212
+ author={Ji Zeng and Dayuan Fu and Tiantian Mi and Yumin Zhuang and Yaxing Huang and Xuefeng Li and Lyumanshan Ye and Muhang Xie and Qishuo Hua and Zhen Huang and Mohan Jiang and Hanning Wang and Jifan Lin and Yang Xiao and Jie Sun and Yunze Wu and Pengfei Liu},
213
+ year={2026},
214
+ eprint={2601.18418},
215
+ archivePrefix={arXiv},
216
+ primaryClass={cs.SE},
217
+ url={https://arxiv.org/abs/2601.18418},
218
+ }
219
+ ```