---
license: apache-2.0
language:
  - en
base_model:
  - LLM360/K2-V2
---

# K2-V2-Instruct

*K2-V2 model logo*

Tech Report - Code - Project Page

K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.

*K2-V2 SFT results*

Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.

*K2-V2 GPQA results*

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, appending the
# assistant header so generation starts from the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
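For interactive use, you can stream tokens to stdout as they are generated instead of waiting for the full completion. The sketch below reuses `model`, `tokenizer`, and `inputs` from the snippet above and relies on transformers' built-in `TextStreamer`; the sampling settings are illustrative assumptions, not recommendations from the tech report.

```python
from transformers import TextStreamer

# Prints decoded tokens as they arrive; the prompt itself is skipped.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sampling hyperparameters here are illustrative, not official K2-V2 defaults.
model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)
```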

## Evaluation Summary

| Model | Specifications | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|---|---|---|---|---|---|---|---|---|---|---|
| K2 Low | Dense · 70B | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| K2 Medium | Dense · 70B | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| K2 High | Dense · 70B | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |

Please refer to our Tech Report for detailed evaluation results.


## Datasets & Mixtures

### SFT Mix

- TxT360-3efforts: curated instructions plus mixed-difficulty reasoning traces
- Tool-calling demonstrations (see the sketch below)
- A small but high-value corpus chosen to showcase the model's potential

All mixtures, filtering rules, and data sources are fully released for reproducibility.
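As a minimal illustration of what a tool-calling demonstration looks like at inference time, the sketch below passes a hypothetical `get_weather` function to `apply_chat_template` via the `tools` argument supported by recent transformers versions. Whether and how the K2-V2 chat template renders tool schemas is an assumption here; consult the Tech Report for the actual demonstration format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

# Hypothetical tool: transformers derives a JSON schema from the
# type hints and docstring when the function is passed via `tools=`.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

messages = [{"role": "user", "content": "What's the weather in Abu Dhabi?"}]

# Renders the conversation together with the tool schema, assuming the
# model's chat template defines how tools are presented to the model.
text = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
```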

Please refer to our Tech Report for detailed datasets and mixtures information.


## Model Description

- **Model type:** K2-V2 follows a standard decoder-only transformer architecture with grouped-query attention and RMSNorm.
- **Training stages:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0
| Hyperparameter | Value |
|---|---|
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (FFN) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Layers | 80 |
| RMSNorm ε | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
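These hyperparameters are consistent with the stated 70B total. Below is a back-of-the-envelope check; the SwiGLU-style gated MLP, 8 KV heads for grouped-query attention, and tied embeddings are assumptions not stated on this card (all three are typical for 70B-class dense transformers).

```python
# Rough parameter count from the table above.
hidden, layers, ffn, heads, vocab = 8192, 80, 28672, 64, 250_000
kv_heads = 8                    # assumed GQA setting, not stated on this card
head_dim = hidden // heads      # 128

attn = 2 * hidden * hidden                  # Q and O projections
attn += 2 * hidden * kv_heads * head_dim    # K and V projections
mlp = 3 * hidden * ffn                      # gate, up, down (assumed SwiGLU)
per_layer = attn + mlp

total = layers * per_layer + vocab * hidden  # + embeddings (assumed tied)
print(f"~{total / 1e9:.1f}B parameters")     # ≈ 70.5B, consistent with 70B
```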

## Citation

If you use K2-V2-Instruct in your research, please cite the following:

```bibtex
@misc{llm360_k2v2_2025,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}
```