---
license: other
base_model: MiniMaxAI/MiniMax-M2.5
tags:
  - gguf
  - llama.cpp
  - quantized
  - moe
---

# MiniMax-M2.5 GGUF

GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Details

| Property | Value |
|----------|-------|
| **Base model** | MiniMaxAI/MiniMax-M2.5 |
| **Architecture** | Mixture of Experts (MoE) |
| **Total parameters** | 230B |
| **Active parameters** | 10B per token |
| **Layers** | 62 |
| **Total experts** | 256 |
| **Active experts per token** | 8 |
| **Source precision** | FP8 (`float8_e4m3fn`) |
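
The gap between total and active parameters is what shapes the hardware requirements. Here is a quick arithmetic sketch using the rounded figures from the table above. It assumes the 256 total and 8 active experts are counted per MoE layer, which is the usual convention.

```python
# Rounded figures from the Model Details table.
TOTAL_PARAMS = 230e9
ACTIVE_PARAMS = 10e9
TOTAL_EXPERTS = 256   # assumption: experts per MoE layer
ACTIVE_EXPERTS = 8    # experts routed per token in each MoE layer

# Share of the routed experts that a single token uses in each MoE layer.
expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS
# Share of all weights that take part in computing one token.
param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"experts used per token:    {expert_fraction:.2%}")
print(f"parameters used per token: {param_fraction:.2%}")
```

The parameter share comes out somewhat higher than the expert share because non-expert weights (attention, embeddings, routers) are used for every token. Compute per token therefore tracks the ~10B active parameters, while memory still has to hold all ~230B.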

## Available Quantizations

| Quantization | Size | Description |
|-------------|------|-------------|
| Q8_0 | ~227 GB | 8-bit quantization; highest quality of the available quants |
| Q4_K_M | — | 4-bit K-quant (medium); good balance of quality and size |
| IQ3_S | — | 3-bit importance quantization (small); more compact, with more quality loss |
| Q2_K | — | 2-bit K-quant; smallest size and largest quality loss |
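
The missing sizes are left blank above. A rough size estimate, however, follows from the bits per weight (bpw) of each quant's base ggml block type. This sketch assumes the nominal ggml block layouts (Q8_0: 34 bytes per 32 weights; Q4_K: 144 bytes per 256; IQ3_S: 110 bytes per 256; Q2_K: 84 bytes per 256) and the rounded 230B parameter count. The mixed K-quant and i-quant file types store some tensors in higher-precision types, so their real files will tend to be somewhat larger than these figures.

```python
# Nominal bits per weight of each quant's base ggml block type:
# (bytes per block * 8) / (weights per block).
BASE_TYPE_BPW = {
    "Q8_0": 34 * 8 / 32,      # FP16 scale + 32 int8 values -> 8.5
    "Q4_K_M": 144 * 8 / 256,  # base type Q4_K -> 4.5
    "IQ3_S": 110 * 8 / 256,   # -> 3.4375
    "Q2_K": 84 * 8 / 256,     # -> 2.625
}

TOTAL_PARAMS = 230e9  # rounded total from the Model Details table


def estimate_gib(bpw: float, n_params: float = TOTAL_PARAMS) -> float:
    """Approximate file size in GiB if every weight were stored at `bpw` bits."""
    return n_params * bpw / 8 / 2**30


for name, bpw in BASE_TYPE_BPW.items():
    print(f"{name:7s} {bpw:6.4f} bpw  ~{estimate_gib(bpw):6.1f} GiB")
```

Under these assumptions, Q8_0 comes out at roughly 227.6 GiB (about 244 GB), in line with the ~227 figure listed for it. The other rows are only approximations, and every quant still has to hold all experts.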
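
## Usage

These GGUF files can be used with [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible frontends.

```bash
# Example with llama-cli: load the Q4_K_M file, prompt with "Hello", and generate up to 128 tokens.
# If a quant is split into several .gguf parts, pass the part whose name ends in -00001-of-NNNNN.gguf
# to -m; llama.cpp loads the remaining parts automatically.
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```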
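
## Notes

- The source model is stored in FP8 (`float8_e4m3fn`). Q8_0 stores each block of 32 weights as 8-bit integers with a shared FP16 scale (about 8.5 bits per weight), so it should be effectively lossless relative to the source weights, though not bit-exact.
- This is a large MoE model. Only ~10B parameters are active per token, but all 230B parameters (every expert in every layer) must be loaded, so even the smallest quant (Q2_K) requires a substantial amount of RAM/VRAM.
- Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.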