---
license: apache-2.0
language:
- en
base_model:
- LLM360/K2-V2
---

# **K2-V2-Instruct**

*K2-V2 model logo*

📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)

K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.

*K2-V2 SFT results*

Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.

*K2-V2 GPQA results*

---

## **Quick Start**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

prompt = "Explain why the derivative of sin(x) is cos(x)."
messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
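For longer, reasoning-heavy prompts you may want a larger token budget and sampling instead of greedy decoding. The sketch below uses the `text-generation` pipeline as an alternative to the snippet above; the model id matches the Quick Start, while the prompt, sampling values (`temperature`, `top_p`), and token budget are illustrative assumptions rather than official recommendations, and passing chat messages directly to the pipeline assumes a recent `transformers` release.

```python
from transformers import pipeline

# Alternative entry point: the text-generation pipeline with chat-format messages.
# Sampling settings below are illustrative assumptions, not tuned recommendations.
generator = pipeline("text-generation", model="llm360/k2-v2", device_map="auto")

messages = [
    {"role": "system", "content": "You are K2, a helpful assistant created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) Institute of Foundation Models (IFM)."},
    {"role": "user", "content": "Prove that the sum of the first n odd numbers is n^2."},
]

outputs = generator(
    messages,
    max_new_tokens=1024,  # reasoning-heavy prompts often need a larger budget
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1]["content"])  # assistant reply
```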
---

## **Evaluation Summary**

| Model Specifications | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
|----------------------|--------------|--------|--------|-------|---------|--------|------|-----------|-------|
| **K2 Low**<br>Dense · 70B | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
| **K2 Medium**<br>Dense · 70B | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
| **K2 High**<br>Dense · 70B | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |

Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.
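For independent spot checks, the sketch below scores a single benchmark with EleutherAI's open-source `lm-evaluation-harness` (`pip install lm-eval`). This is an illustrative assumption, not the evaluation setup behind the table above: the task choice, few-shot count, dtype, and batch size are placeholders, and the official protocol is described in the tech report.

```python
# Hedged sketch: assumes lm-evaluation-harness >= 0.4 is installed (pip install lm-eval).
# Task, few-shot count, dtype, and batch size are illustrative, not the official protocol.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=llm360/k2-v2,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])  # per-metric scores for the selected task
```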
---

## **Datasets & Mixtures**

### **SFT Mix**

* **TxT360-3efforts**: curated instruction data and mixed-difficulty reasoning traces
* Tool-calling demonstrations
* A small but high-value corpus to showcase model potential

All mixtures, filtering rules, and data sources are fully released for reproducibility. Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed information on datasets and mixtures.

---

## **Model Description**

- **Model type:** Standard decoder-only transformer with grouped-query attention and RMSNorm
- **Training stage:** Pre-training & Post-training
- **Language(s) (NLP):** English
- **License:** Apache 2.0

| Model Hyperparameter | Value |
| -------------------- | ----- |
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (FFN) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Layers | 80 |
| RMSNorm ɛ | 1e-5 |
| Pre-training Seq Length | 8,192 |
| Post-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
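The table above can be cross-checked against the released checkpoint's configuration. The snippet below is a minimal sketch that loads only the config; the attribute names assume a Llama-style Hugging Face config class (e.g. `rms_norm_eps`, `num_key_value_heads`) and may differ if K2-V2 ships a custom configuration.

```python
from transformers import AutoConfig

# Load only the configuration (no weights) and print the fields that map to the
# hyperparameter table above. Attribute names assume a Llama-style config; the
# getattr fallbacks return None if a field is named differently.
config = AutoConfig.from_pretrained("llm360/k2-v2")

print("hidden size:      ", config.hidden_size)
print("ffn size:         ", getattr(config, "intermediate_size", None))
print("attention heads:  ", config.num_attention_heads)
print("kv heads (GQA):   ", getattr(config, "num_key_value_heads", None))
print("layers:           ", config.num_hidden_layers)
print("rmsnorm epsilon:  ", getattr(config, "rms_norm_eps", None))
print("vocab size:       ", config.vocab_size)
print("max positions:    ", getattr(config, "max_position_embeddings", None))
```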
---

## **Citation**

If you use K2-V2-Instruct in your research, please cite the following:

```
@misc{llm360_k2v2_2025,
  title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author = {K2 Team},
  year = {2025},
  archivePrefix = {arXiv},
  eprint = {XXXX.XXXXX},
  primaryClass = {cs.CL}
}
```