---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3
- qwen3-coder
- qwen3-coder-30B
- qwen3-coder-30B-gguf
- llama.cpp
- quantized
- text-generation
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
pipeline_tag: text-generation
language:
- en
- zh
- es
- fr
- de
- ru
- ar
- ja
- ko
- hi
---

# Qwen3-Coder-30B-A3B-Instruct-f16-GGUF

This is a **GGUF-quantized version** of the **[Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)** language model.

Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI](https://openwebui.com), [GPT4All](https://gpt4all.io), and more.

💡 **Key Features of Qwen3-Coder-30B-A3B-Instruct:**

- **Mixture-of-Experts architecture**: roughly 30.5B total parameters with only ~3.3B active per token, giving large-model quality at much lower inference cost.
- **Coding-focused instruction tuning** with strong agentic and tool/function-calling behaviour.
- **Long-context support** (256K tokens natively) for repository-scale inputs.
- **Multilingual**, covering the languages listed in the metadata above.

## Available Quantizations (from f16)

| Level  | Quality       | Speed     | Size     | Recommendation |
|--------|---------------|-----------|----------|----------------|
| Q2_K   | Minimal       | ⚡ Fast   | 11.30 GB | Only for severely memory-constrained systems. |
| Q3_K_S | Low-Medium    | ⚡ Fast   | 13.30 GB | Minimal viability; avoid unless space-limited. |
| Q3_K_M | Low-Medium    | ⚡ Fast   | 14.70 GB | Acceptable for basic interaction. |
| Q4_K_S | Practical     | ⚡ Fast   | 17.50 GB | Good balance for mobile/embedded platforms. |
| Q4_K_M | Practical     | ⚡ Fast   | 18.60 GB | Best overall choice for most users. |
| Q5_K_S | Max Reasoning | 🟢 Medium | 21.10 GB | Slight quality gain; good for testing. |
| Q5_K_M | Max Reasoning | 🟢 Medium | 21.70 GB | Best quality available. Recommended. |
| Q6_K   | Near-FP16     | 🐢 Slow   | 25.10 GB | Diminishing returns; only if RAM allows. |
| Q8_0   | Lossless\*    | 🐢 Slow   | 32.50 GB | Maximum fidelity. Ideal for archival. |

\*Effectively indistinguishable from the f16 source in practice.

> 💡 **Recommendations by Use Case**
>
> - 💻 **Standard Laptop (i5/M1 Mac)**: Q5_K_M (optimal quality)
> - 🧠 **Reasoning, Coding, Math**: Q5_K_M or Q6_K
> - 📚 **RAG, Retrieval, Precision Tasks**: Q6_K or Q8_0
> - 🤖 **Agent & Tool Integration**: Q5_K_M
> - 🛠️ **Development & Testing**: Test from Q4_K_M up to Q8_0
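
To fetch a single quant programmatically, the `huggingface_hub` client is convenient. A minimal sketch; the `repo_id` and `filename` below are illustrative guesses based on this card's naming, so check the repo's file list for the exact names:

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id and filename -- verify against the repo's "Files" tab.
path = hf_hub_download(
    repo_id="geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f16-GGUF",
    filename="Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```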

## Usage

Load this model using:

- [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
- [LM Studio](https://lmstudio.ai) – desktop app with GPU support
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Or directly via `llama.cpp` (see the sketch below)
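
For scripted use, here is a minimal loading sketch with the community `llama-cpp-python` bindings; the filename is illustrative, so point `model_path` at whichever quant you downloaded:

```python
from llama_cpp import Llama

# Hypothetical local filename -- use the quant you actually downloaded.
llm = Llama(
    model_path="Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf",
    n_ctx=8192,       # context window; raise it if RAM allows
    n_gpu_layers=-1,  # offload all layers to the GPU when one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```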

Each quantized model includes its own `README.md` and shares a common `MODELFILE`.
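
If you serve the model through [Ollama](https://ollama.com), the shared `MODELFILE` can be adapted into an Ollama `Modelfile`. A minimal sketch, assuming a locally downloaded Q4_K_M file (the filename and parameter values are illustrative, not taken from this repo):

```
FROM ./Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf

# Illustrative sampling defaults for coding tasks
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```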

## Author

👤 Geoff Munn (@geoffmunn)  
🌐 [Hugging Face Profile](https://huggingface.co/geoffmunn)

## Disclaimer

This is a community conversion for local inference. It is not affiliated with or endorsed by Alibaba Cloud or the Qwen team.