---
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-Next-80B-A3B-Thinking
---

This is an MXFP4 quant of [Qwen3-Next-80B-A3B-Thinking](https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Thinking).

Download the latest [llama.cpp](https://github.com/ggml-org/llama.cpp/releases) to use it.

The context has been extended from 256k to 1M with YaRN, as described in [the source repo](https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF#processing-ultra-long-texts).

To enable the extended context, run llama.cpp with options like:

`--ctx-size 0 --rope-scaling yarn --rope-scale 4`

`--ctx-size 0` sets the context to the full 1M tokens; pass a smaller value such as `524288` for 512k instead.

You can also use the model as normal if you don't need the extended context.
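
As a minimal sketch, a full `llama-server` invocation with the extended context might look like the following. The GGUF filename here is a placeholder; substitute the file you actually downloaded from this repo.

```sh
# Serve the model with YaRN RoPE scaling enabled.
# --ctx-size 0 requests the full extended context (1M tokens);
# the model filename is a placeholder — use your downloaded GGUF.
./llama-server \
  -m Qwen3-Next-80B-A3B-Thinking-MXFP4.gguf \
  --ctx-size 0 \
  --rope-scaling yarn \
  --rope-scale 4
```

The same flags work with `llama-cli` if you prefer a local chat session over an HTTP server.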