QuantTrio/MiniMax-M2-AWQ

Text Generation

text-generation-inference

4-bit precision

JunHowie committed 6 days ago

Commit 3f6771a · verified · 1 parent: 65c2553

Update README.md

Files changed (1)

README.md +1 -1

@@ -205,7 +205,7 @@ otherwise the expert tensors wouldn’t be evenly sharded across GPU devices.</i
 ```
 CONTEXT_LENGTH=32768
 vllm serve \
-    tclf90/MiniMax-M2-AWQ \
+    QuantTrio/MiniMax-M2-AWQ \
     --served-model-name MY_MODEL \
     --enable-auto-tool-choice \
     --tool-call-parser minimax_m2 \