Qwen3-Next-80B-A3B-Thinking-qx64-hi-mlx

Gremlin Bell

What is the meaning of the bell on a Harley Davidson bike, where does it hang and why, in 100 words or less?

This is a test for the new Qwen3-Next-80B-A3B-GremlinBell

The model that knows the story.

I embarked on a quest to find the smallest quant of the new Qwen3-Next-80B that knows about the Gremlin Bell. That's the least you can do before you get on a bike.

Why? Because someone might hand you a bell, and you'll have no clue why.

I tried everything from q2 up. Notably, the new Qwen model is large (a skinny Harley biker is also a rare find), so don't hold your breath expecting it to do the potato-potato on just any box.

It needs at least a 64GB Mac, better if you have more. That's Qwen's design, not mine; I try to make them smaller while keeping them smart.
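To see why a 64GB Mac is the floor, a rough back-of-the-envelope helps. Assuming (hypothetically) that the qx64 scheme mixes 6-bit and 4-bit weight groups, the weights alone land somewhere between the two pure cases, before counting KV cache and runtime overhead:

```python
# Rough weight-storage estimate for an 80B-parameter model at various
# quantization widths. This ignores KV cache, activations, and runtime
# overhead, so real memory use will be higher.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (4, 5, 6):
    print(f"{bits}-bit: {weight_memory_gb(80e9, bits):.0f} GB")
# 4-bit: 40 GB
# 5-bit: 50 GB
# 6-bit: 60 GB
```

Even at a pure 4 bits per weight, 40 GB of weights plus context cache leaves little headroom on anything smaller than a 64GB machine.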

This distinction is a fine one, and models that can make it are exemplary in their recall. In this case, it's about the quants.

Enter the nightmedia/Qwen3-Next-80B-A3B-Thinking-qx64-hi-mlx:

The "gremlin bell" is a biker superstition to ward off mischievous spirits blamed for mechanical failures or accidents. It hangs hidden on the frame (e.g., under the bike), never visible or on handlebars. Traditionally, it must be a gift—buying it nullifies its power. The bell's chime distracts "gremlins," protecting the rider. Originating from WWII pilots' lucky charms, it’s adopted by motorcyclists for luck and safety. (98 words)

If you ride a LLM towards glory, and you're on a Harley, choose qx64-hi

Programmers, ymmv

Nightmedia: quants for the rider

This model Qwen3-Next-80B-A3B-Thinking-qx64-hi-mlx was converted to MLX format from Qwen/Qwen3-Next-80B-A3B-Thinking using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Next-80B-A3B-Thinking-qx64-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
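For quick one-off prompts, mlx-lm also ships a command-line entry point, so you can skip the Python boilerplate. A minimal sketch (the exact flags assume a recent mlx-lm; check `mlx_lm.generate --help` on your install):

```shell
# One-shot generation from the terminal; downloads the model on first use.
mlx_lm.generate \
  --model nightmedia/Qwen3-Next-80B-A3B-Thinking-qx64-hi-mlx \
  --prompt "What is the meaning of the gremlin bell?" \
  --max-tokens 256
```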
Model size: 80B params · Tensor type: BF16, U32 · Format: Safetensors
