---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---
# **K2-V2-Instruct**
📄 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 💻 [Code](https://github.com/llm360/k2v2_train) - 📢 [Project Page](https://huggingface.co/LLM360/K2-V2)
K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.
Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.
---
## **Quick Start**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the instruct checkpoint; device_map="auto" requires `accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    "LLM360/K2-V2-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2-Instruct")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
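For interactive use, the same `generate` call can stream tokens as they are produced. A minimal sketch using the stock `transformers` `TextStreamer`, reusing `model`, `tokenizer`, and `inputs` from above:
```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```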
---
## **Evaluation Summary**
| Model | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|-------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low** (Dense · 70B) | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium** (Dense · 70B) | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High** (Dense · 70B) | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.
---
## **Datasets & Mixtures**
### **SFT Mix**
* **TxT360-3efforts**: curated instruction data plus mixed-difficulty reasoning traces
* Tool-calling demonstrations (see the sketch at the end of this section)
* A small but high-value corpus designed to showcase the model's potential
All mixtures, filtering rules, and data sources are fully released for reproducibility.
Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed datasets and mixtures information.
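Because the SFT mix includes tool-calling demonstrations, the chat template can be exercised through the standard `transformers` tool-use API (the `tools=` kwarg in recent releases). A hedged sketch, assuming the released chat template defines a tools block; `get_weather` is a hypothetical example tool, and `tokenizer` comes from the Quick Start above:
```python
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...  # hypothetical tool; the model only sees its JSON schema

messages = [{"role": "user", "content": "What's the weather in Abu Dhabi?"}]

# `tools=` converts the function's signature and docstring into a JSON
# schema and injects it into the prompt via the chat template.
text = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    tokenize=False,
    add_generation_prompt=True,
)
```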
---
## **Model Description**
- **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention (GQA) and RMSNorm (see the sketch after the hyperparameter table below).
- **Training stage:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0
| Model Hyperparameter | Value |
| ----------- | ----------- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (FFN) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
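For reference, RMSNorm scales each hidden vector by the reciprocal of its root-mean-square before applying a learned per-channel gain, using the ε and hidden size from the table above. A minimal PyTorch sketch for illustration, not the released training code:
```python
import torch

class RMSNorm(torch.nn.Module):
    """Minimal RMSNorm sketch; see the released codebase for the real one."""

    def __init__(self, hidden_size: int = 8192, eps: float = 1e-5):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the hidden dimension,
        # then rescale with the learned per-channel weight.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)
```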
---
## Citation
If you use K2-V2-Instruct in your research, please cite the following:
```
@misc{llm360_k2v2_2025,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```