---
license: apache-2.0
tags:
  - gguf
  - qwen
  - qwen3-coder
  - qwen3-coder-30b-q5
  - qwen3-coder-30b-q5_k_m
  - qwen3-coder-30b-q5_k_m-gguf
  - llama.cpp
  - quantized
  - text-generation
  - chat
  - reasoning
  - agent
  - multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---

# Qwen3-Coder-30B-A3B-Instruct-f16:Q5_K_M

Quantized version of Qwen/Qwen3-Coder-30B-A3B-Instruct at Q5_K_M level, derived from f16 base weights.
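
For reference, a Q5_K_M file like this is typically produced from the f16 GGUF with llama.cpp's quantize tool. A minimal sketch, assuming the f16 conversion already exists (the filenames are illustrative, not the exact command used for this release):

```bash
# Quantize the f16 GGUF down to Q5_K_M (illustrative filenames).
llama-quantize Qwen3-Coder-30B-A3B-Instruct-f16.gguf \
  Qwen3-Coder-30B-A3B-Instruct-f16-Q5_K_M.gguf Q5_K_M
```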

## Model Info

### Quality & Performance

| Metric         | Value |
|----------------|-------|
| Quality        | Max Reasoning |
| Speed          | 🐢 Medium |
| RAM Required   | ~33.2 GB |
| Recommendation | Highest practical quality; choose this level when you need stronger reasoning. |

## Prompt Template (ChatML)

This model uses the ChatML prompt format adopted by Qwen:

```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
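
If you call the model through a role-based chat API instead, the runtime applies this template for you. A minimal sketch, assuming a local Ollama server with the model tag used in the CLI example further down:

```bash
# Ollama renders the system/user roles into ChatML automatically.
curl http://localhost:11434/api/chat -s -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f16:Q5_K_M",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a bash one-liner that counts lines across all .py files."}
  ],
  "stream": false
}' | jq -r '.message.content'
```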

## Generation Parameters

Recommended defaults:

| Parameter      | Value |
|----------------|-------|
| Temperature    | 0.6 |
| Top-P          | 0.95 |
| Top-K          | 20 |
| Min-P          | 0.0 |
| Repeat Penalty | 1.1 |

Stop sequences: `<|im_end|>`, `<|im_start|>`
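
These settings map directly onto llama.cpp's sampling flags. A minimal one-shot sketch, assuming the GGUF file has been downloaded locally (the path is illustrative):

```bash
llama-cli -m ./Qwen3-Coder-30B-A3B-Instruct-f16-Q5_K_M.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -p "Write a regex that matches ISO-8601 dates." -n 256
```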

## 🖥️ CLI Example Using Ollama or TGI Server

Here’s how you can query this model via API using curl and jq. Replace the endpoint with your local server.

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f16:Q5_K_M",
  "prompt": "Explain how photosynthesis converts sunlight into chemical energy in plants.",
  "options": {
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  },
  "stream": false
}' | jq -r '.response'
```

Note that Ollama expects sampling parameters inside an `options` object; top-level values are ignored.

🎯 Why this works well:

- The prompt is meaningful and achievable for a model of this size.
- The temperature is tuned to the task: lower (0.5) for factual prompts like this one, higher (~0.7) for creative ones.
- `jq` extracts clean text from the JSON response.
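
For interactive use you may prefer token-by-token output. A sketch of the same request with streaming enabled; each output line is a JSON fragment, so `jq -j` joins the pieces without inserting newlines:

```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f16:Q5_K_M",
  "prompt": "Explain how photosynthesis converts sunlight into chemical energy in plants.",
  "options": { "temperature": 0.5 },
  "stream": true
}' | jq -j '.response'
```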

## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```

## Usage

Compatible with:

- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://github.com/open-webui/open-webui) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via [llama.cpp](https://github.com/ggml-org/llama.cpp) – see the server sketch below
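
For llama.cpp specifically, one way to serve the model behind an OpenAI-compatible endpoint is `llama-server`. A minimal sketch (the local file path, context size, and port are illustrative):

```bash
# Start the server (adjust the model path and context size to taste).
llama-server -m ./Qwen3-Coder-30B-A3B-Instruct-f16-Q5_K_M.gguf -c 8192 --port 8080

# Query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions -s \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.6
  }' | jq -r '.choices[0].message.content'
```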

## License

Apache 2.0 – see the [base model](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) for full terms.