---
license: apache-2.0
type: model
base_model: openai/gpt-oss-120b
language: en
tags:
- gpt_oss
- mixture-of-experts
- activation-aware-weight-quantization
- awq
- w4a16
- large-language-model
- reasoning
- long-context
---

# gpt-oss-120b-awq-w4a16

_A 4-bit AWQ-quantised release of **gpt-oss-120b**_

> **TL;DR** – We convert the original FP16/FP32 checkpoint (≈ 234 GB) of **gpt-oss-120b** into a 4-bit weight-only model with 16-bit activations (**W4A16**).
> The resulting 11-shard safetensors bundle is **≈ 33.4 GB**, a **7× size reduction** with negligible quality loss.

---

## 1 Model details

| Property                   | Value |
|----------------------------|-------|
| Architecture               | Mixture-of-Experts Transformer |
| Total parameters           | 117 B |
| Active parameters / token  | 5.1 B |
| Layers                     | 36 |
| Experts                    | 128 (4 routed per token) |
| Hidden size / head dim     | 2880 / 64 |
| Context window (max RoPE)  | 131 072 tokens |
| Activation function        | SwiGLU |
| Norm                       | RMSNorm (ε = 1e-5) |
| RoPE scaling               | YaRN (θ = 150 000) |
| Training data cut-off      | 2024-06-01 |

---

## 2 Quantisation recipe

### 2.1 Activation-Aware Weight Quantisation (AWQ)

AWQ protects the ~1 % most activation-sensitive weight channels by rescaling them **before** 4-bit rounding, substantially reducing quantisation error compared with vanilla GPTQ. (A toy sketch of the rescaling step is given in §3.1 below.)

* **Post-training** – no back-prop; only a small calibration set is needed.
* **Weight-only** – activations stay at fp16/bf16.
* **Hardware-friendly** – single-kernel dequant, SIMD-aware packing, no mixed-precision arithmetic.

### 2.2 Layer precision map

| Module                                    | Precision |
|-------------------------------------------|-----------|
| All dense & attention weights             | **int4** (AWQ) |
| LayerNorm, rotary embeddings, router MLP  | fp16 |
| lm_head                                   | fp16 |

(§3.2 sketches how such a name-based precision map can be applied.)

### 2.3 Size breakdown

| Shard | Size (GB) | Shard | Size (GB) |
|-------|----------:|-------|----------:|
| 1 | 1.21 | 7 | 2.18 |
| 2 | 4.25 | 8 | 4.25 |
| 3 | 2.18 | 9 | 2.18 |
| 4 | 4.25 | 10 | 4.25 |
| 5 | 2.18 | 11 | 2.18 |
| 6 | 4.25 | **Total** | **33.36 GB** |

Compression vs the original FP16 checkpoint (re-checked in §3.3):

```text
234 GB / 33.36 GB ≈ 7× smaller
```
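
---

## 3 Code sketches

The snippets below are illustrative sketches accompanying §2.1–§2.3, not the production quantisation code; names and toy data are made up for the examples.

### 3.1 AWQ rescaling in miniature

A minimal NumPy toy of the §2.1 idea: estimate per-input-channel saliency on calibration activations, scale salient weight columns up before symmetric 4-bit round-to-nearest, then fold the inverse scale back at dequant time so the layer is unchanged in exact arithmetic. The group size, grid search over the scaling exponent, and weight clipping used in real AWQ pipelines are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: W [out, in]; calibration activations X [batch, in] with a few
# deliberately "loud" input channels so that saliency is non-uniform.
W = rng.normal(size=(256, 512)).astype(np.float32)
X = (rng.normal(size=(64, 512)) * np.linspace(0.1, 4.0, 512)).astype(np.float32)

# 1. Per-input-channel saliency, estimated from the calibration activations.
saliency = np.abs(X).mean(axis=0)                       # shape [in]

# 2. AWQ-style scale: boost salient channels. The exponent 0.5 stands in for
#    the grid-searched value; the search itself is omitted in this sketch.
s = saliency ** 0.5
s /= s.mean()                                           # keep magnitudes stable

def rtn_int4(M):
    """Symmetric 4-bit round-to-nearest, one scale per output row."""
    step = np.abs(M).max(axis=1, keepdims=True) / 7.0   # int4 range [-8, 7]
    return np.clip(np.round(M / step), -8, 7) * step

W_awq = rtn_int4(W * s) / s   # 1/s is folded back at dequant time
W_rtn = rtn_int4(W)           # baseline: no activation-aware scaling

def output_error(W_q):
    """Error on the layer output, i.e. weighted by the activations."""
    return np.abs(X @ (W_q - W).T).mean()

print(f"output |error|  AWQ={output_error(W_awq):.4f}  RTN={output_error(W_rtn):.4f}")
```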
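
### 3.2 Applying the precision map

A name-based sketch of the §2.2 table. The parameter names below are hypothetical, modelled on common GPT-style checkpoints rather than taken from the actual gpt-oss-120b state dict; only the routing logic is the point.

```python
# Modules kept at fp16 per §2.2: norms, rotary embeddings, the router MLP,
# and the lm_head. Everything else (dense & attention weights) goes to int4.
KEEP_FP16 = ("norm", "rotary", "router", "lm_head")

def precision_for(param_name: str) -> str:
    if any(tag in param_name for tag in KEEP_FP16):
        return "fp16"
    return "int4-awq"

for name in [
    "model.layers.0.self_attn.q_proj.weight",          # hypothetical key names
    "model.layers.0.mlp.experts.17.down_proj.weight",
    "model.layers.0.input_norm.weight",
    "model.layers.0.mlp.router.weight",
    "lm_head.weight",
]:
    print(f"{name:50s} -> {precision_for(name)}")
```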
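
### 3.3 Checking the size arithmetic

A quick check that the eleven shard sizes in §2.3 sum to the quoted 33.36 GB and that the ratio against the 234 GB FP16 checkpoint is roughly 7×.

```python
# Shard sizes in GB, in shard order, copied from the §2.3 table.
shards_gb = [1.21, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18]

total = sum(shards_gb)
print(f"total: {total:.2f} GB")              # 33.36 GB
print(f"ratio: {234 / total:.2f}x smaller")  # ≈ 7.01x
```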