---
license: apache-2.0
type: model
base_model: openai/gpt-oss-120b
language: en
tags:
- gpt_oss
- mixture-of-experts
- activation-aware-weight-quantization
- awq
- w4a16
- large-language-model
- reasoning
- long-context
---

# gpt-oss-120b-awq-w4a16

_A 4-bit AWQ-quantised release of **gpt-oss-120b**_

> **TL;DR** – We convert the original FP16/FP32 checkpoint (≈ 234 GB) of **gpt-oss-120b** into a 4-bit weight-only model with 16-bit activations (**W4A16**).
> The resulting 11-shard safetensors bundle is **≈ 33.4 GB**, a **7× size reduction** with negligible quality loss.

---

## 1 Model details

| Property                   | Value |
|----------------------------|-------|
| Architecture               | Mixture-of-Experts Transformer |
| Total parameters           | 117 B |
| Active parameters / token  | 5.1 B |
| Layers                     | 36 |
| Experts                    | 128 (4 routed per token) |
| Hidden size / head dim     | 2880 / 64 |
| Context window (max RoPE)  | 131 072 tokens |
| Activation function        | SwiGLU |
| Norm                       | RMSNorm (ε = 1e-5) |
| RoPE scaling               | YaRN (θ = 150 000) |
| Training data cut-off      | 2024-06-01 |

---

## 2 Quantisation recipe

### 2.1 Activation-Aware Weight Quantisation (AWQ)

AWQ protects the ~1 % most activation-sensitive weight channels by rescaling them **before** 4-bit rounding, substantially reducing quantisation error compared with vanilla GPTQ. (A toy sketch of the rescaling step is given in §3.1 below.)

* **Post-training** – no back-prop; only a small calibration set is needed.
* **Weight-only** – activations stay at fp16/bf16.
* **Hardware-friendly** – single-kernel dequant, SIMD-aware packing, no mixed-precision arithmetic.

### 2.2 Layer precision map

| Module                                    | Precision |
|-------------------------------------------|-----------|
| All dense & attention weights             | **int4** (AWQ) |
| LayerNorm, rotary embeddings, router MLP  | fp16 |
| lm_head                                   | fp16 |

(§3.2 sketches how such a name-based precision map can be applied.)

### 2.3 Size breakdown

| Shard | Size (GB) | Shard | Size (GB) |
|-------|----------:|-------|----------:|
| 1 | 1.21 | 7 | 2.18 |
| 2 | 4.25 | 8 | 4.25 |
| 3 | 2.18 | 9 | 2.18 |
| 4 | 4.25 | 10 | 4.25 |
| 5 | 2.18 | 11 | 2.18 |
| 6 | 4.25 | **Total** | **33.36 GB** |

Compression vs the original FP16 checkpoint (re-checked in §3.3):

```text
234 GB / 33.36 GB ≈ 7× smaller
```
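
---

## 3 Code sketches

The snippets below are illustrative sketches accompanying §2.1–§2.3, not the production quantisation code; names and toy data are made up for the examples.

### 3.1 AWQ rescaling in miniature

A minimal NumPy toy of the §2.1 idea: estimate per-input-channel saliency on calibration activations, scale salient weight columns up before symmetric 4-bit round-to-nearest, then fold the inverse scale back at dequant time so the layer is unchanged in exact arithmetic. The group size, grid search over the scaling exponent, and weight clipping used in real AWQ pipelines are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: W [out, in]; calibration activations X [batch, in] with a few
# deliberately "loud" input channels so that saliency is non-uniform.
W = rng.normal(size=(256, 512)).astype(np.float32)
X = (rng.normal(size=(64, 512)) * np.linspace(0.1, 4.0, 512)).astype(np.float32)

# 1. Per-input-channel saliency, estimated from the calibration activations.
saliency = np.abs(X).mean(axis=0)                       # shape [in]

# 2. AWQ-style scale: boost salient channels. The exponent 0.5 stands in for
#    the grid-searched value; the search itself is omitted in this sketch.
s = saliency ** 0.5
s /= s.mean()                                           # keep magnitudes stable

def rtn_int4(M):
    """Symmetric 4-bit round-to-nearest, one scale per output row."""
    step = np.abs(M).max(axis=1, keepdims=True) / 7.0   # int4 range [-8, 7]
    return np.clip(np.round(M / step), -8, 7) * step

W_awq = rtn_int4(W * s) / s   # 1/s is folded back at dequant time
W_rtn = rtn_int4(W)           # baseline: no activation-aware scaling

def output_error(W_q):
    """Error on the layer output, i.e. weighted by the activations."""
    return np.abs(X @ (W_q - W).T).mean()

print(f"output |error|  AWQ={output_error(W_awq):.4f}  RTN={output_error(W_rtn):.4f}")
```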
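
### 3.2 Applying the precision map

A name-based sketch of the §2.2 table. The parameter names below are hypothetical, modelled on common GPT-style checkpoints rather than taken from the actual gpt-oss-120b state dict; only the routing logic is the point.

```python
# Modules kept at fp16 per §2.2: norms, rotary embeddings, the router MLP,
# and the lm_head. Everything else (dense & attention weights) goes to int4.
KEEP_FP16 = ("norm", "rotary", "router", "lm_head")

def precision_for(param_name: str) -> str:
    if any(tag in param_name for tag in KEEP_FP16):
        return "fp16"
    return "int4-awq"

for name in [
    "model.layers.0.self_attn.q_proj.weight",          # hypothetical key names
    "model.layers.0.mlp.experts.17.down_proj.weight",
    "model.layers.0.input_norm.weight",
    "model.layers.0.mlp.router.weight",
    "lm_head.weight",
]:
    print(f"{name:50s} -> {precision_for(name)}")
```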
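
### 3.3 Checking the size arithmetic

A quick check that the eleven shard sizes in §2.3 sum to the quoted 33.36 GB and that the ratio against the 234 GB FP16 checkpoint is roughly 7×.

```python
# Shard sizes in GB, in shard order, copied from the §2.3 table.
shards_gb = [1.21, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18, 4.25, 2.18]

total = sum(shards_gb)
print(f"total: {total:.2f} GB")              # 33.36 GB
print(f"ratio: {234 / total:.2f}x smaller")  # ≈ 7.01x
```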