---
license: apache-2.0
pipeline_tag: text-generation
---
# ROME-30B-A3B (Coming Soon)
<a href="https://arxiv.org/pdf/2512.24873" target="_blank">
🔗 <strong>Technical Report</strong><br/>
<img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv%3A2512.24873-red"/>
</a>
---
## 📢 Note: Coming Soon!
**ROME (ROME is Obviously an Agentic ModEl)** will be officially released soon.
The project is currently under final review and preparation. Model weights will be made publicly available shortly. Stay tuned!
<img src="https://rlhf.oss-cn-hangzhou.aliyuncs.com/iFLOW-ROME/performance.png" width="600"/>
---
## Highlights
**ROME** is an open-source **agentic model** incubated within the **ALE (Agentic Learning Ecosystem)**.
Rather than scaling performance purely by increasing parameter count, ROME achieves agentic performance beyond its parameter scale through full-stack infrastructure and RL algorithm optimization.
<img src="https://rlhf.oss-cn-hangzhou.aliyuncs.com/iFLOW-ROME/ALE.PNG" width="600"/>
### 🔧 ALE Full-Stack Infrastructure
- **ROLL** – Large-scale reinforcement learning optimization engine
- **ROCK** – Secure sandbox and environment orchestration for agent execution
- **iFlow CLI** – Unified agent framework and developer interface
### 🧠 IPA Policy Optimization Algorithm
- Introduces **Interaction-Perceptive Agentic Policy Optimization (IPA)**
- Performs credit assignment at the level of **Semantic Interaction Chunks** (a toy sketch follows this list)
- Significantly improves **training stability** and **success rates** on **long-horizon tasks**
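
The full IPA objective is specified in the technical report; purely as intuition, the minimal Python sketch below shows what chunk-level (rather than token-level) credit assignment can look like. The function name, the discounting scheme, and the baseline handling are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def chunk_credit_assignment(chunk_bounds, chunk_rewards, values, gamma=1.0):
    """Toy sketch (not the IPA algorithm itself): assign one advantage per
    semantic interaction chunk, then broadcast it to the chunk's tokens.

    chunk_bounds : (start, end) token indices per chunk, e.g. one tool call
                   plus its observation
    chunk_rewards: per-chunk rewards (often zero until the final chunk)
    values       : per-chunk baseline/value estimates
    """
    n = len(chunk_bounds)
    returns = np.zeros(n)
    running = 0.0
    for i in reversed(range(n)):
        # Discounted return computed over chunks, not over tokens.
        running = chunk_rewards[i] + gamma * running
        returns[i] = running
    chunk_adv = returns - np.asarray(values)  # center with the baseline

    num_tokens = chunk_bounds[-1][1]
    token_adv = np.zeros(num_tokens)
    for (start, end), adv in zip(chunk_bounds, chunk_adv):
        token_adv[start:end] = adv  # every token in a chunk shares its credit
    return token_adv

# Example: a 3-chunk trajectory (reasoning -> tool call -> final answer),
# 10 tokens total, with a sparse terminal reward.
bounds  = [(0, 3), (3, 7), (7, 10)]
rewards = [0.0, 0.0, 1.0]
values  = [0.2, 0.4, 0.6]
print(chunk_credit_assignment(bounds, rewards, values))
```

The point of the sketch is only the granularity: credit flows to whole interaction chunks rather than to individual tokens, which is what makes long-horizon credit assignment more stable.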
### 🚀 Strong Agentic Performance
- Despite being a **mid-sized model** (a 30B MoE with 3B active parameters), ROME outperforms same-scale models on standard agent benchmarks:
  - **Terminal-Bench 2.0**: 24.72%
  - **SWE-bench Verified**: 57.40%
- Its performance is competitive with, and in some cases exceeds, that of models with over **100B parameters**
### 🔒 Production-Grade Safety
- Designed for autonomous agent execution in real environments
- Rigorously aligned and red-teamed against risks such as:
- Unauthorized access
- Illegal or unsafe tool invocation
- Built with **deployment-grade safety guarantees** in mind
---
## Performance (Preview)
### Agent Benchmarks
| **Model** | **Terminal-Bench 2.0** | **SWE-bench Verified** |
| ---------------------------- | ---------------------- | ---------------------- |
| Qwen3-Coder-30B-A3B-Instruct | 13.48% | 46.33% |
| **ROME-30B-A3B** | **24.72%** | **57.40%** |
| GPT-OSS-120B | 21.12% | 43.93% |
| GLM-4.5 Air (106B) | 17.30% | 56.20% |
> See the technical report for full experimental details.
---
## Best Practices
*(Official code examples and usage guidelines will be added after the model release; a provisional sketch follows below.)*
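
Until the weights and official snippets land, here is a hedged sketch of what standard Hugging Face `transformers` usage would typically look like. The repository id `iFlow-AI/ROME-30B-A3B` is a placeholder assumption; check this card for the official id and chat template once the model is released:

```python
# Hypothetical usage sketch -- the repo id below is an assumption, not the
# published checkpoint. Standard transformers chat-model loading is shown.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "iFlow-AI/ROME-30B-A3B"  # placeholder, not yet published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
)

messages = [
    {"role": "user", "content": "List the files in /tmp and explain each one."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```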
---
## Citation
If you find our work useful, please consider citing:
```bibtex
@article{rome2025ale,
  title={Let It Flow: Agentic Crafting on Rock and Roll - Building the ROME Model within an Open Agentic Learning Ecosystem},
  author={Wang, Weixun and Xu, XiaoXiao and An, Wanhe and Dai, Fangwen and others},
  journal={arXiv preprint arXiv:2512.24873},
  year={2025}
}
```