---
license: apache-2.0
language:
- en
- ko
base_model:
- Motif-Technologies/Motif-2-12.7B-Base
tags:
- text-generation-inference
- conversational
- custom_code
- text-generation
- Motif
library_name: transformers
---

Last update: 12 Nov. 2025

# Introduction

We are pleased to announce **Motif-2-12.7B-Instruct**, a 12.7-billion-parameter language model. This model is a <U>**supervised fine-tuning (SFT)**</U> variant of our base model: https://huggingface.co/Motif-Technologies/Motif-2-12.7B-Base. Detailed information can be found in the technical report: [https://arxiv.org/abs/2511.07464](https://arxiv.org/abs/2511.07464).

One can chat directly with Motif-2-12.7B-Instruct at [https://chat.motiftech.io](https://chat.motiftech.io/).

# Evaluation

*The results of Qwen3 and Gemma 3 are <U>sourced directly from their technical reports.</U>*

|Benchmark|Evaluation setting|Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|---|---|---|---|---|---|---|---|
|||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|Instruct|Instruct|
|MMLU|0-shot|86.11|-|-|-|-|-|-|-|71.9|76.9|
|MMLU-Redux|-|90.02|86.8|82|88.6|85.7|90.9|84.1|89.5|-|-|
|BBH|0-shot|85.78|-|-|-|-|-|-|-|85.7|87.6|
|GPQA-Diamond|0-shot, CoT|63.6|49|54.8|64|54.6|68.4|54.8|65.8|40.9|42.4|
|GSM8K|0-shot, CoT|96.13|-|-|-|-|-|-|-|94.4|95.9|
|MATH|0-shot|97|-|-|-|-|-|-|-|83.8|89|
|MBPP|3-shot|91|-|-|-|-|-|-|-|73|74.4|
|LiveBench 2024-11-25|-|33.8|51.4|59.6|71.3|59.8|74.9|59.4|74.3|-|-|
|IFEval|strict prompt|75.78|84.1|84.8|85.4|83.2|85|83.7|86.5|-|-|
|IFEval|0-shot|76.52|-|-|-|-|-|-|-|88.9|90.4|
|MATH-500|-|96.8|83.6|90|96.8|88.6|97.2|89.8|98|-|-|
|AIME24|-|72.3|18.9|31.7|79.3|31|81.4|32.8|80.4|-|-|
|AIME25|-|63.6|15|23.3|70.4|20.2|72.9|21.6|70.9|-|-|
|ZebraLogic|-|69.5|26.6|33|88.5|29.2|88.8|33.2|89.5|-|-|
|BFCL v3|-|55.34|63.4|61.5|70.4|63|70.3|58.6|69.1|-|-|
|LiveCodeBench v5 <br> (2024.10 - 2025.2)|-|50.03|30.7|29|63.5|31.3|65.7|29.8|62.6|-|-|
|LiveCodeBench v5 |0-shot, CoT|61.66|-|-|-|-|-|-|-|32|39|
|HumanEval|0-shot|93.2|-|-|-|-|-|-|-|85.4|87.8|

## Averages and improvements over the corresponding benchmark scores

### vs. Gemma 3

||Motif-2-12.7B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|
||Instruct|Instruct|Instruct|
|Average|83.44|72.89|75.93|
|Improvement||+14.48%|+9.89%|

### vs. Qwen3

||Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|
|---|---|---|---|---|---|---|---|---|
||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|
|Average|67.08|50.95|54.97|77.82|54.66|79.55|54.78|78.66|
|Improvement||+31.65%|+22.02%|-13.80%|+22.72%|-15.68%|+22.45%|-14.73%|
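
The improvement rows above are the relative gain of the Motif-2-12.7B average over each baseline's average. A minimal sketch of the calculation (small deviations from the reported figures come from rounding of the displayed averages):

```python
# Relative improvement of the Motif-2-12.7B average over a baseline average,
# as used in the "Improvement" rows above.
def improvement(motif_avg: float, baseline_avg: float) -> float:
    return (motif_avg - baseline_avg) / baseline_avg * 100

print(f"{improvement(83.44, 72.89):+.2f}%")  # vs. Gemma-3-12B (reported as +14.48%)
print(f"{improvement(83.44, 75.93):+.2f}%")  # vs. Gemma-3-27B (reported as +9.89%)
print(f"{improvement(67.08, 50.95):+.2f}%")  # vs. Qwen2.5-72B (reported as +31.65%)
```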

## How to use in transformers
To use this model, install the Hugging Face [kernels](https://github.com/huggingface/kernels) package (e.g. `pip install kernels`).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code = True,
    _attn_implementation = "flash_attention_2",
    dtype = torch.bfloat16 # currently supports bf16 only, for efficiency
).cuda()

tokenizer = AutoTokenizer.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code = True,
)

query = "What is the capital city of South Korea?"
input_ids = tokenizer.apply_chat_template(
    [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': query},
    ],
    add_generation_prompt = True,
    enable_thinking = False, # or True
    return_tensors='pt',
).cuda()

output = model.generate(input_ids, max_new_tokens=1024, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens = False)
print(output)
```
### outputs
```
# With enable_thinking=True, the model is FORCED to think.
Okay, the user is asking for the capital city of South Korea. Let me think. I know that South Korea's capital is Seoul. But wait, I should double-check to make sure I'm not mixing it up with other countries. For example, North Korea's capital is Pyongyang. So yes, South Korea's capital is definitely Seoul. I should just provide that as the answer.
</think>
The capital city of South Korea is **Seoul**.
<|endofturn|><|endoftext|>

# With enable_thinking=False, the model decides on its own whether to think. In this example, thinking is not worth it, so it answers directly.
The capital city of South Korea is Seoul.
<|endofturn|><|endoftext|>
```
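
Since the decoded string keeps special tokens (`skip_special_tokens=False`), you may want to separate the reasoning part from the final answer. Below is a minimal sketch, assuming the `</think>`, `<|endofturn|>`, and `<|endoftext|>` markers shown in the sample outputs above:

```python
# Minimal sketch: split a decoded response into its (optional) reasoning and the answer.
# Assumes the </think> / <|endofturn|> / <|endoftext|> markers shown in the outputs above.
def split_thinking(decoded: str) -> tuple[str, str]:
    for marker in ("<|endofturn|>", "<|endoftext|>"):
        decoded = decoded.replace(marker, "")
    if "</think>" in decoded:
        thinking, answer = decoded.split("</think>", 1)
    else:
        thinking, answer = "", decoded
    return thinking.strip(), answer.strip()

thinking, answer = split_thinking(output)  # `output` from the generation example above
print("Answer:", answer)
```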

## How to use in vLLM
The [PR](https://github.com/vllm-project/vllm/pull/27396) that adds support for the Motif model to the official vLLM package is currently under review.  
In the meantime, please use the following container [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm) to run our model with vLLM.  
Our model supports a sequence length of up to 32K tokens.
```bash
# run vllm api server
VLLM_ATTENTION_BACKEND="DIFFERENTIAL_FLASH_ATTN" vllm serve Motif-Technologies/Motif-2-12.7B-Instruct --trust-remote-code --data-parallel-size <gpu_count>

# sending requests with curl
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital city of South Korea?"}
    ],
    "temperature": 0.6,
    "skip_special_tokens": false,
    "chat_template_kwargs": {
        "enable_thinking": true
    }
  }'
```
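
The same request can also be sent from Python through vLLM's OpenAI-compatible API. A minimal sketch, assuming the server started above is running on `localhost:8000` and the `openai` client package is installed (`chat_template_kwargs` is passed via `extra_body`):

```python
# Minimal sketch: query the vLLM server above via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="Motif-Technologies/Motif-2-12.7B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of South Korea?"},
    ],
    temperature=0.6,
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(response.choices[0].message.content)
```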