---
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-Next-80B-A3B-Thinking
---

This is an MXFP4 quant of [Qwen3-Next-80B-A3B-Thinking](https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Thinking).

Download the latest [llama.cpp](https://github.com/ggml-org/llama.cpp/releases) to use it.

The context has been extended from 256k to 1M with YaRN, as described in [the source repo](https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF#processing-ultra-long-texts).

To enable the extended context, run llama.cpp with options like:

`--ctx-size 0 --rope-scaling yarn --rope-scale 4`

`--ctx-size 0` sets the context to the full 1M tokens; pass a smaller value such as `524288` for 512k instead.

You can also use the model as normal if you don't need the extended context.
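
As a minimal sketch, a full `llama-server` invocation with the extended context might look like the following. The GGUF filename here is a placeholder; substitute the file you actually downloaded from this repo.

```sh
# Serve the model with YaRN RoPE scaling enabled.
# --ctx-size 0 requests the full extended context (1M tokens);
# the model filename is a placeholder — use your downloaded GGUF.
./llama-server \
  -m Qwen3-Next-80B-A3B-Thinking-MXFP4.gguf \
  --ctx-size 0 \
  --rope-scaling yarn \
  --rope-scale 4
```

The same flags work with `llama-cli` if you prefer a local chat session over an HTTP server.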