---
license: apache-2.0
tags:
- mlx
pipeline_tag: text-generation
base_model: Kwaipilot/KAT-Dev-72B-Exp
library_name: mlx
---

# KAT-Dev-72B-Exp-qx86-hi-mlx

This is the Deckard (qx) quant formula, a tuned mixed-precision design. On regular Qwens it makes the model "happier" and more involved in the interaction, often using metaphors to anchor meaning (see other model cards for examples).

I have no idea how well it works on this one; running metrics on it will take a while. If you "like" the model, I'll keep it in the collection.

| Model       | Perplexity    | Peak memory |
|-------------|---------------|-------------|
| qx86-hi-mlx | 4.333 ± 0.031 | 71.47 GB    |
| q8-hi       | 4.334 ± 0.031 | 86.59 GB    |

The mixed-precision model slightly outperforms q8 on perplexity while using roughly 15 GB less peak memory. This is expected.

-G

This model [KAT-Dev-72B-Exp-qx86-hi-mlx](https://huggingface.co/KAT-Dev-72B-Exp-qx86-hi-mlx) was converted to MLX format from [Kwaipilot/KAT-Dev-72B-Exp](https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp) using mlx-lm version **0.28.2**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("KAT-Dev-72B-Exp-qx86-hi-mlx")

prompt = "hello"

# Apply the chat template if the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
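For interactive use, mlx-lm also exposes `stream_generate`, which yields tokens as they are produced instead of returning the full completion at once. A minimal sketch, assuming the same chat-template handling as the block above:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("KAT-Dev-72B-Exp-qx86-hi-mlx")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# Each chunk is a generation response carrying the newly decoded text
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=256):
    print(chunk.text, end="", flush=True)
print()
```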
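If you want to experiment with your own quants of the base model, the stock `convert` API in mlx-lm produces a uniform quantization. The sketch below is only that, a hedged illustration: the qx86-hi mixed-precision recipe used for this repo is a custom formula that these default flags do not reproduce, the output path is hypothetical, and the group size of 32 is an assumption about what the "-hi" suffix denotes.

```python
from mlx_lm import convert

# Produces a plain uniform 8-bit quant, NOT the custom qx86-hi mix
convert(
    hf_path="Kwaipilot/KAT-Dev-72B-Exp",
    mlx_path="KAT-Dev-72B-Exp-q8-hi",  # hypothetical local output path
    quantize=True,
    q_bits=8,         # uniform 8-bit weights
    q_group_size=32,  # smaller groups trade memory for fidelity
)
```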