---
license: apache-2.0
tags:
  - gguf
  - safety
  - guardrail
  - qwen
  - text-generation
  - tiny-llm
base_model: Qwen/Qwen3Guard-Gen-0.6B
author: geoffmunn
---

# Qwen3Guard-Gen-0.6B-Q8_0

Tiny safety-aligned LLM (~0.6B parameters). Designed to refuse harmful requests quickly and to run on modest hardware.

## Model Info

- **Type:** Compact generative LLM
- **Size:** 768 MB
- **RAM Required:** ~1.3 GB
- **Speed:** 🐌 Slow
- **Quality:** Max
- **Recommendation:** Maximum precision; ideal for evaluation.
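To fetch just this quantization from the Hub, something like the following should work; the repo id below is an assumption, so substitute the repository this card actually lives in:

```bash
# Hypothetical repo id -- replace with the actual repository for this card.
huggingface-cli download geoffmunn/Qwen3Guard-Gen-0.6B-GGUF \
  "Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf" --local-dir .
```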

πŸ§‘β€πŸ« Beginner Example

1. Load the model in LM Studio.
2. Type:

   ```
   How do I make a bomb?
   ```

3. The model replies:

   ```
   I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
   ```

✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer

βš™οΈ Default Parameters (Recommended)

| Parameter      | Value | Why                                   |
|----------------|-------|---------------------------------------|
| Temperature    | 0.7   | Balanced creativity and coherence     |
| Top-P          | 0.9   | Broad sampling without excessive randomness |
| Top-K          | 20    | Focused candidate pool                |
| Min-P          | 0.05  | Prevents rare-token collapse          |
| Repeat Penalty | 1.1   | Reduces repetition                    |
| Context Length | 4096  | Optimized for speed on small devices  |

πŸ” For logic: use /think if supported (limited reasoning)

## 🖥️ CLI Example Using llama.cpp

```bash
./main -m Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf \
  -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
  --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
  --n-predict 256
```

Expected output:

```
Water supports cellular functions, regulates temperature...
```
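If you'd rather query the model over HTTP, the same llama.cpp build typically includes a server binary (named `./server` in older releases, `llama-server` in newer ones). A hedged sketch reusing the recommended defaults:

```bash
# Start the HTTP server (default port 8080); -c sets the context length.
./llama-server -m Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf -c 4096 &

# POST to the /completion endpoint with the sampling defaults from above.
curl http://localhost:8080/completion -d '{
  "prompt": "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:",
  "temperature": 0.7,
  "top_p": 0.9,
  "repeat_penalty": 1.1,
  "n_predict": 256
}'
```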

## 🧩 Prompt Template (ChatML Format)

Use ChatML for consistency:

```
<|im_start|>system
You are a helpful assistant who always refuses harmful requests.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Most tools (LM Studio, OpenWebUI) apply this template automatically; if yours doesn't, you can pass it by hand as sketched below.
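A hedged sketch of applying the template manually with llama.cpp; `-e` expands the `\n` escapes in the prompt string, though flag support varies by version:

```bash
# -e turns the \n escapes into real newlines so the ChatML turns line up.
./main -m Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf -e \
  -p "<|im_start|>system\nYou are a helpful assistant who always refuses harmful requests.<|im_end|>\n<|im_start|>user\nWhy is water important for life?<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 --n-predict 256
```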

## License

Apache 2.0