---
base_model: MiniMaxAI/MiniMax-M2.5
library_name: mlx
tags:
- mlx
- quantized
- 3bit
- minimax_m2
- text-generation
- conversational
- apple-silicon
license: other
license_name: modified-mit
license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.5/blob/main/LICENSE
pipeline_tag: text-generation
---

# MiniMax-M2.5 3-bit MLX

This is a 3-bit quantized [MLX](https://github.com/ml-explore/mlx) version of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), converted using [mlx-lm](https://github.com/ml-explore/mlx-lm) v0.30.7.

MiniMax-M2.5 is a 229B-parameter Mixture-of-Experts model (10B active parameters) that achieves 80.2% on SWE-Bench Verified and is state-of-the-art in coding, agentic tool use, and search tasks.

## Important: Quality Note

**This is an aggressive quantization.** Independent testing by [inferencerlabs](https://huggingface.co/inferencerlabs/MiniMax-M2.5-MLX-9bit) shows significant quality degradation below 4 bits for this model (q3.5 scored 43% token accuracy vs 91%+ at q4.5). This 3-bit quant was manually tested on coding and reasoning tasks and produced coherent output, but expect noticeable quality loss compared to 4-bit and above.

**If you have 256GB+ of unified memory, use the [4-bit quant](https://huggingface.co/mlx-community/MiniMax-M2.5-4bit) instead.** This 3-bit version is primarily useful for machines with 192GB of unified memory, where the 4-bit version won't fit.

## Requirements

- Apple Silicon Mac (M2 Ultra or later)
- At least 192GB of unified memory

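If you hit Metal out-of-memory errors (for example with very long prompts), one common workaround on Apple Silicon Macs (a general MLX tip, not something from the original card; verify it applies to your macOS version) is to temporarily raise the GPU wired-memory limit so more of the unified memory is available for inference:

```bash
# Allow the GPU to wire up to ~170 GB on a 192 GB machine (resets on reboot).
sudo sysctl iogpu.wired_limit_mb=170000
```
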
## Quick Start

Install mlx-lm:

```bash
pip install -U mlx-lm
```

### CLI

```bash
mlx_lm.generate \
  --model ahoybrotherbear/MiniMax-M2.5-3bit-MLX \
  --prompt "Hello, how are you?" \
  --max-tokens 256 \
  --temp 0.7
```

### Python

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("ahoybrotherbear/MiniMax-M2.5-3bit-MLX")

messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Recent mlx-lm releases take sampling parameters via a sampler object
# rather than a bare `temp` keyword on generate().
response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=make_sampler(temp=0.7),
    verbose=True,
)
print(response)
```

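### Server (optional)

mlx-lm also ships an OpenAI-compatible HTTP server, which can be a convenient way to reach this model from other tools. A minimal sketch using the standard `mlx_lm.server` options (nothing here is specific to this repository):

```bash
mlx_lm.server \
  --model ahoybrotherbear/MiniMax-M2.5-3bit-MLX \
  --port 8080
```

Once it is running, clients can send chat requests to `http://localhost:8080/v1/chat/completions`.
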
## Conversion Details

- **Source model**: [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) (FP8)
- **Converted with**: mlx-lm v0.30.7
- **Quantization**: 3-bit (3.501 average bits per weight)
- **Original parameters**: 229B total / 10B active (MoE)
- **Peak memory during inference**: ~100GB
- **Generation speed**: ~54 tokens/sec on M3 Ultra

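A similar quantization can be reproduced with mlx-lm's standard conversion entry point. The exact invocation used for this upload is not recorded here, but 3-bit weights plus fp16 group scales and biases at the default group size of 64 work out to roughly 3.5 bits per weight, which matches the figure above. A sketch:

```bash
# Quantize the source checkpoint to 3-bit MLX weights.
mlx_lm.convert \
  --hf-path MiniMaxAI/MiniMax-M2.5 \
  --mlx-path MiniMax-M2.5-3bit-MLX \
  -q --q-bits 3 --q-group-size 64
```
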
## Original Model

MiniMax-M2.5 was created by [MiniMaxAI](https://huggingface.co/MiniMaxAI). See the [original model card](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) for full details on capabilities, benchmarks, and license terms.