Update README.md
README.md CHANGED
@@ -37,9 +37,15 @@ tags:
 - 4-bit
 ---
 
-MiniMax-M2.5-REAP-172B-A10B-GGUF-Q4
+# MiniMax-M2.5-REAP-172B-A10B-GGUF-Q4
 
-
+This is a 172-billion-parameter MiniMax M2.5 model with 25% of its experts pruned with REAP (Router-weighted Expert Activation Pruning), then converted to GGUF with llama.cpp and statically quantized to Q4.
+
+> [!NOTE]
+> ## Patched 20/02/26
+> Re-uploaded quantization from `llama.cpp` main@8110, `gguf` 0.17.1.
+> On the initial push, testing on an M4 device with Ollama, the model rambled compared to M2.1-REAP.
+> The original conversion was quantized with `llama.cpp` main@7952.
 
 Command sequence using llama.cpp built from source and the `ports` llama-quantize:
 
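The command sequence itself falls outside this hunk. Purely as a sketch of what such a sequence typically looks like with stock llama.cpp tooling (`convert_hf_to_gguf.py` plus `llama-quantize`) — the local paths, output filenames, and the Q4_K_M type below are illustrative assumptions, not values taken from this repo:

```bash
# Build llama.cpp from source; llama-quantize lands in build/bin/.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Convert the REAP-pruned Hugging Face checkpoint to a 16-bit GGUF.
# ./MiniMax-M2.5-REAP-172B-A10B is a hypothetical local path.
python convert_hf_to_gguf.py ./MiniMax-M2.5-REAP-172B-A10B \
    --outtype f16 \
    --outfile minimax-m2.5-reap-172b-a10b-f16.gguf

# Static Q4 quantization: no imatrix calibration pass, matching the
# "static Q4 quantized" wording above. Q4_K_M is one common Q4 type;
# the repo may use a different one.
./build/bin/llama-quantize \
    minimax-m2.5-reap-172b-a10b-f16.gguf \
    minimax-m2.5-reap-172b-a10b-Q4_K_M.gguf \
    Q4_K_M
```

For the Ollama test mentioned in the note, a minimal way to load a local GGUF is a one-line Modelfile; the model name and prompt here are again only examples:

```bash
# Point Ollama at the quantized GGUF and run a quick smoke test.
printf 'FROM ./minimax-m2.5-reap-172b-a10b-Q4_K_M.gguf\n' > Modelfile
ollama create minimax-m2.5-reap-q4 -f Modelfile
ollama run minimax-m2.5-reap-q4 "Briefly introduce yourself."
```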