---
base_model: MiniMaxAI/MiniMax-M2.5
library_name: mlx
tags:
- mlx
- quantized
- 3bit
- minimax_m2
- text-generation
- conversational
- apple-silicon
license: other
license_name: modified-mit
license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.5/blob/main/LICENSE
pipeline_tag: text-generation
---
|
|
|
|
|
# MiniMax-M2.5 3-bit MLX |
|
|
|
|
|
**⚠️ UPLOAD IN PROGRESS -- model files still uploading, not yet ready for use.** |
|
|
|
|
|
This is a 3-bit quantized [MLX](https://github.com/ml-explore/mlx) version of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), converted using [mlx-lm](https://github.com/ml-explore/mlx-lm) v0.30.7. |
|
|
|
|
|
MiniMax-M2.5 is a 229B-parameter Mixture-of-Experts model (10B active parameters) that achieves 80.2% on SWE-Bench Verified and state-of-the-art results in coding, agentic tool use, and search tasks.
|
|
|
|
|
## Important: Quality Note |
|
|
|
|
|
**This is an aggressive quantization.** Independent testing by [inferencerlabs](https://huggingface.co/inferencerlabs/MiniMax-M2.5-MLX-9bit) shows significant quality degradation below 4 bits for this model (q3.5 scored 43% token accuracy vs 91%+ at q4.5). This 3-bit quant was manually tested on coding and reasoning tasks and produced coherent output, but expect noticeable quality loss compared to 4-bit and above. |
|
|
|
|
|
**If you have 256GB+ of RAM, use the [4-bit quant](https://huggingface.co/mlx-community/MiniMax-M2.5-4bit) instead.** This 3-bit version is primarily useful for machines with 192GB of unified memory where the 4-bit version won't fit. |
|
|
|
|
|
## Requirements |
|
|
|
|
|
- Apple Silicon Mac (M2 Ultra or later) |
|
|
- At least 192GB of unified memory (an optional memory-limit tweak is sketched below)
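
This quant peaks at roughly 100GB during inference (see Conversion Details), which fits within 192GB, but macOS limits how much unified memory the GPU can wire by default. If you see allocation failures while other applications are running, raising that limit can help. This is an optional, unofficial tweak; `iogpu.wired_limit_mb` is the macOS sysctl commonly used when running large MLX models, and the value below is only an example for a 192GB machine, not a tuned recommendation.

```bash
# Optional: raise the GPU wired-memory limit (resets on reboot).
# 160000 MB is an illustrative value for a 192GB machine.
sudo sysctl iogpu.wired_limit_mb=160000
```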
|
|
|
|
|
## Quick Start |
|
|
|
|
|
Install mlx-lm: |
|
|
|
|
|
```bash
pip install -U mlx-lm
```
|
|
|
|
|
### CLI |
|
|
|
|
|
```bash
mlx_lm.generate \
  --model ahoybrotherbear/MiniMax-M2.5-3bit-MLX \
  --prompt "Hello, how are you?" \
  --max-tokens 256 \
  --temp 0.7
```
|
|
|
|
|
### Python |
|
|
|
|
|
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("ahoybrotherbear/MiniMax-M2.5-3bit-MLX")

messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Current mlx-lm releases take sampling settings via a sampler object
# rather than a `temp=` keyword on generate().
response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.7),
    verbose=True,
)
print(response)
```
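
mlx-lm also bundles a small OpenAI-compatible HTTP server, which is convenient for pointing existing chat clients at this model. A minimal sketch, assuming the default host and port (127.0.0.1:8080):

```bash
# Serve the model over an OpenAI-compatible API (default: http://127.0.0.1:8080)
mlx_lm.server --model ahoybrotherbear/MiniMax-M2.5-3bit-MLX
```

Requests can then be sent to `http://127.0.0.1:8080/v1/chat/completions` in the usual OpenAI chat format.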
|
|
|
|
|
## Conversion Details |
|
|
|
|
|
- **Source model**: [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) (FP8) |
|
|
- **Converted with**: mlx-lm v0.30.7 |
|
|
- **Quantization**: 3-bit (3.501 average bits per weight); a sketch of the conversion command follows this list
|
|
- **Original parameters**: 229B total / 10B active (MoE) |
|
|
- **Peak memory during inference**: ~100GB |
|
|
- **Generation speed**: ~54 tokens/sec on M3 Ultra |
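
The exact invocation used for this repo is not recorded here, but a 3-bit conversion with mlx-lm generally looks like the sketch below; the output path is an example and the flags are the standard `mlx_lm.convert` options.

```bash
# Illustrative 3-bit conversion; --mlx-path is an example output directory.
mlx_lm.convert \
  --hf-path MiniMaxAI/MiniMax-M2.5 \
  --mlx-path MiniMax-M2.5-3bit-MLX \
  -q --q-bits 3
```

The 3.501 average bits per weight is consistent with 3-bit weights plus the per-group scales and biases that MLX's affine quantization stores alongside them.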
|
|
|
|
|
## Original Model |
|
|
|
|
|
MiniMax-M2.5 was created by [MiniMaxAI](https://huggingface.co/MiniMaxAI). See the [original model card](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) for full details on capabilities, benchmarks, and license terms. |
|
|
|