Upload README.md with huggingface_hub
README.md CHANGED
@@ -31,6 +31,8 @@ GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/
 |-------------|------|-------------|
 | Q8_0 | 227 GB | 8-bit quantization, highest quality |
 | Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
+| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
+| Q2_K | 78 GB | 2-bit K-quant, smallest size |
 
 ## Usage
 
@@ -44,5 +46,5 @@ llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
 ## Notes
 
 - The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
-- This is a large MoE model. Even
+- This is a large MoE model. Even the smallest quant (Q2_K) requires ~78GB due to the number of experts.
 - Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
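
Since these quant files run to tens of gigabytes, here is a minimal sketch of fetching a single quant with the `huggingface_hub` Python client (the same library used for the upload above). The repo id below is a placeholder assumption; the GGUF repo's actual id is not visible in this diff:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id (assumption) -- substitute the GGUF repo this README belongs to.
REPO_ID = "your-username/MiniMax-M2.5-GGUF"

# Download one quant file from the Hub; repeat calls hit the local cache.
gguf_path = hf_hub_download(repo_id=REPO_ID, filename="MiniMax-M2.5-Q4_K_M.gguf")
print(gguf_path)  # pass this path to llama-cli via -m
```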