JoyAI-LLM-Flash-mxfp8-mlx

Brainwaves

         arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.538  0.708  0.851  0.699  0.422  0.773  0.639
q8       0.519  0.698  0.866  0.704  0.418  0.780  0.647
qx86-hi  0.527  0.701  0.866  0.703  0.424  0.776  0.670
qx64-hi  0.509  0.696  0.871  0.698  0.426  0.771  0.656
mxfp4    0.526  0.696  0.840  0.691  0.416  0.774  0.633

Perplexity
mxfp8    11.786 ± 0.124
qx86-hi  10.213 ± 0.104
qx64-hi  10.218 ± 0.104
mxfp4    12.086 ± 0.125

So far, mxfp8 is the best quant in terms of arc performance, and the qx quants match or surpass q8 on key metrics.

The Deckard (qx) quants tested here use the Qwen formula, which may need some tuning for this model and could get even better.
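
To make that comparison concrete, here is a small, self-contained Python snippet that restates the scores from the table above and prints the best quant per benchmark; nothing in it is new data, just the published numbers.

# Scores copied from the table above (higher is better on all seven tasks)
scores = {
    "mxfp8":   [0.538, 0.708, 0.851, 0.699, 0.422, 0.773, 0.639],
    "q8":      [0.519, 0.698, 0.866, 0.704, 0.418, 0.780, 0.647],
    "qx86-hi": [0.527, 0.701, 0.866, 0.703, 0.424, 0.776, 0.670],
    "qx64-hi": [0.509, 0.696, 0.871, 0.698, 0.426, 0.771, 0.656],
    "mxfp4":   [0.526, 0.696, 0.840, 0.691, 0.416, 0.774, 0.633],
}
tasks = ["arc", "arc/e", "boolq", "hswag", "obkqa", "piqa", "wino"]

# mxfp8 wins arc and arc/e, the qx quants take boolq, obkqa, and wino,
# and q8 keeps hswag and piqa
for i, task in enumerate(tasks):
    best = max(scores, key=lambda q: scores[q][i])
    print(f"{task:6s} best: {best:8s} ({scores[best][i]:.3f})")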

To compare performance, here are a few recent Nightmedia 30B merges (same metric columns as above):

Qwen3-30B-A3B-YOYO-V2-Gemini250-Instruct
qx86-hi  0.586  0.763  0.886  0.746  0.472  0.810  0.700

Qwen3-30B-A3B-YOYO-V4-Gemini250-Instruct
qx86-hi  0.575  0.759  0.884  0.731  0.470  0.802  0.686

Qwen3-30B-A3B-YOYO-V2-R16-Claude250x
qx86-hi  0.545  0.717  0.877  0.717  0.440  0.789  0.653

Qwen3-30B-A3B-Thinking-YOYO-V2-GLM4.7-A1
qx86-hi  0.583  0.761  0.883  0.734  0.466  0.806  0.683

Qwen3-30B-A3B-Architect18
qx86-hi  0.577  0.760  0.879  0.760  0.446  0.803  0.702

Qwen3-30B-A3B-Element9b
qx86-hi  0.583  0.758  0.880  0.752  0.466  0.801  0.695

Qwen3-30B-A3B-Element11b
qx86-hi  0.586  0.757  0.880  0.753  0.458  0.805  0.705

Element9b and Element11b were merged from Architect18 and the first three models listed above.

-G

This model JoyAI-LLM-Flash-mxfp8-mlx was converted to MLX format from jdopensource/JoyAI-LLM-Flash using mlx-lm version 0.30.8.
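
For reference, the conversion can be reproduced with the mlx_lm Python API along these lines. This is a minimal sketch assuming the standard convert() parameters (hf_path, mlx_path, quantize, q_bits); selecting the mxfp8 mode specifically may require an extra option in newer mlx-lm releases, so treat it as a generic 8-bit example and check the mlx_lm.convert docs for your version.

from mlx_lm import convert

# Download the original weights, quantize to 8-bit, and write an MLX model folder
convert(
    hf_path="jdopensource/JoyAI-LLM-Flash",   # source model on the Hub
    mlx_path="JoyAI-LLM-Flash-mxfp8-mlx",     # output directory
    quantize=True,
    q_bits=8,
)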

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized weights and tokenizer (Hub repo id or a local path)
model, tokenizer = load("nightmedia/JoyAI-LLM-Flash-mxfp8-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
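
If you want more control over decoding, generate() accepts a sampler in recent mlx-lm releases. A minimal sketch, assuming mlx_lm.sample_utils.make_sampler and the max_tokens/sampler keyword arguments; names may shift between versions.

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("nightmedia/JoyAI-LLM-Flash-mxfp8-mlx")

# Temperature + nucleus sampling instead of the greedy default
sampler = make_sampler(temp=0.7, top_p=0.95)

response = generate(
    model,
    tokenizer,
    prompt="Explain mxfp8 quantization in one paragraph.",
    max_tokens=512,
    sampler=sampler,
    verbose=True,
)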
Model stats

Model size: 49B params
Format: MLX, Safetensors, 8-bit
Tensor types: U8, U32, BF16, F32
