Update README.md
README.md CHANGED
@@ -37,9 +37,15 @@ tags:
 - 4-bit
 ---
 
-MiniMax-M2.5-REAP-172B-A10B-GGUF-Q4
+# MiniMax-M2.5-REAP-172B-A10B-GGUF-Q4
 
-
+This is a 172-billion-parameter MiniMax M2.5 model with 25% of its experts pruned with REAP (Router-weighted Expert Activation Pruning), then converted to GGUF with llama.cpp and statically quantized to Q4.
+
+> [!NOTE]
+> ## Patched 20/02/26
+> Re-uploaded quantization from `llama.cpp` main@8110, `gguf` 0.17.1.
+> On the initial push, testing on an M4 device with Ollama, the model rambled compared to M2.1-REAP.
+> The original conversion was quantized with `llama.cpp` main@7952.
 
 Command sequence using llama.cpp built from source and the `ports` llama-quantize:
 
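The command sequence itself falls outside this hunk. Purely as a sketch of what such a sequence typically looks like with stock llama.cpp tooling (`convert_hf_to_gguf.py` plus `llama-quantize`) — the local paths, output filenames, and the Q4_K_M type below are illustrative assumptions, not values taken from this repo:

```bash
# Build llama.cpp from source; llama-quantize lands in build/bin/.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Convert the REAP-pruned Hugging Face checkpoint to a 16-bit GGUF.
# ./MiniMax-M2.5-REAP-172B-A10B is a hypothetical local path.
python convert_hf_to_gguf.py ./MiniMax-M2.5-REAP-172B-A10B \
    --outtype f16 \
    --outfile minimax-m2.5-reap-172b-a10b-f16.gguf

# Static Q4 quantization: no imatrix calibration pass, matching the
# "static Q4 quantized" wording above. Q4_K_M is one common Q4 type;
# the repo may use a different one.
./build/bin/llama-quantize \
    minimax-m2.5-reap-172b-a10b-f16.gguf \
    minimax-m2.5-reap-172b-a10b-Q4_K_M.gguf \
    Q4_K_M
```

For the Ollama test mentioned in the note, a minimal way to load a local GGUF is a one-line Modelfile; the model name and prompt here are again only examples:

```bash
# Point Ollama at the quantized GGUF and run a quick smoke test.
printf 'FROM ./minimax-m2.5-reap-172b-a10b-Q4_K_M.gguf\n' > Modelfile
ollama create minimax-m2.5-reap-q4 -f Modelfile
ollama run minimax-m2.5-reap-q4 "Briefly introduce yourself."
```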