geoffmunn committed
Commit f79843f · verified · 1 Parent(s): 06cd07f

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, CLI examples, and auto-upload
.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3Guard-Gen-0.6B
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 4096
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (used by Qwen); literal block keeps line breaks
+ prompt_template: |-
+   <|im_start|>system
+   You are a helpful assistant who always refuses harmful requests.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop:
+   - "<|im_end|>"
+   - "<|im_start|>"
+
+ # Default sampling (optimized for safe generation)
+ temperature: 0.7
+ top_p: 0.9
+ top_k: 20
+ min_p: 0.05
+ repeat_penalty: 1.1
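The ChatML template above can be exercised outside any runner. A minimal Python sketch; the `build_prompt` helper and its constant mirror the template but are illustrative, not part of any tool's API:

```python
# Render the ChatML prompt template used above for a given user message.
SYSTEM = "You are a helpful assistant who always refuses harmful requests."

def build_prompt(user_message: str) -> str:
    """Fill the ChatML template (system, user, then an open assistant turn)."""
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("Why is water important for life?"))
```

Tools that understand the template apply this expansion for you; the sketch is only useful when driving a raw completion endpoint yourself.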
Qwen3Guard-Gen-0.6B-Q2_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q2_K
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 332M
+ - **RAM Required**: ~0.6 GB
+ - **Speed**: ⚡ Fastest
+ - **Quality**: Very Low
+ - **Recommendation**: Only for the weakest hardware; quality is poor. Avoid if you can.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
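For intuition about how the sampling parameters in the table interact, here is a generic re-implementation sketch; real samplers (e.g. llama.cpp) work on logits, apply temperature first, and differ in detail:

```python
# Which candidate tokens survive Top-K, Min-P and Top-P filtering?
def surviving_tokens(probs, top_k=20, top_p=0.9, min_p=0.05):
    """Return indices of tokens kept by the three filters, best first."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    max_p = probs[order[0]]
    for i in order[:top_k]:              # Top-K: keep at most k candidates
        if probs[i] < min_p * max_p:     # Min-P: drop tokens far below the best
            break
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:          # Top-P: stop once enough mass is kept
            break
    return kept

# A toy next-token distribution: only the head survives filtering.
print(surviving_tokens([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]))  # [0, 1, 2]
```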
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q3_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q3_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 395M
+ - **RAM Required**: ~0.8 GB
+ - **Speed**: ⚡ Fast
+ - **Quality**: Low-Medium
+ - **Recommendation**: Basic detection; acceptable for non-critical use.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q3_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q3_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 372M
+ - **RAM Required**: ~0.7 GB
+ - **Speed**: ⚡ Fast
+ - **Quality**: Low
+ - **Recommendation**: Minimal quality; may miss subtle risks.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q4_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q4_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 462M
+ - **RAM Required**: ~1.0 GB
+ - **Speed**: 🚀 Fast
+ - **Quality**: Balanced
+ - **Recommendation**: ✅ Best balance of speed and accuracy.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q4_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q4_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 449M
+ - **RAM Required**: ~0.9 GB
+ - **Speed**: 🚀 Fast
+ - **Quality**: Medium
+ - **Recommendation**: Good for edge devices; decent reliability.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q5_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q5_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 526M
+ - **RAM Required**: ~1.0 GB
+ - **Speed**: 🐢 Medium
+ - **Quality**: High+
+ - **Recommendation**: ✅✅ Top choice for compact safety apps.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q5_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q5_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 519M
+ - **RAM Required**: ~0.95 GB
+ - **Speed**: 🐢 Medium
+ - **Quality**: High
+ - **Recommendation**: High accuracy; good for production.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q6_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q6_K
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 594M
+ - **RAM Required**: ~1.1 GB
+ - **Speed**: 🐌 Slow
+ - **Quality**: Near-FP16
+ - **Recommendation**: Excellent fidelity; near-original performance.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q8_0/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q8_0
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 768M
+ - **RAM Required**: ~1.3 GB
+ - **Speed**: 🐌 Slow
+ - **Quality**: Max
+ - **Recommendation**: Maximum precision; ideal for evaluation.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d843cd79f98c6902a89e9a3af2e7785f72a494df1cab116bfd1c0e33e924cbd0
+ size 347288640
Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04cbc99e5c7767eb7b205ed4a8b17764a8ab30241ef8994e0b24464d73b63667
+ size 413978688
Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aec9170e26f64d1fd723c06533c8776d58f1e99d274836fd0e9e6121fff1b11e
+ size 389926976
Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01babf81aae5c944fb45bde24be8bf5e7d521feec57ba00899d8e31108009393
+ size 484219968
Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:921b14b84527ca96daa4dc3cd594559eee5ccf77d8a45f97b79d01b6edb04512
+ size 470785088
Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9665a28a0226bce39bf54e7a0389f790f8296f7a385097cac5d0e5b782433b6
+ size 551377984
Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b08f0e35ca4ec43bc82c90300552fac4e54d7ed030efac00f27527020555b024
+ size 543579200
Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c74c3288b511760d029f5b69a28b90ed457226c31d246b5c674f09934be090e2
+ size 622733376
Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86e545e20295e0aa55ef96a92777b2a93857ce325963d540fa9656d3ea6e1806
+ size 804753472
README.md ADDED
@@ -0,0 +1,94 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - safety
+ - guardrail
+ - text-generation
+ - tiny-llm
+ - llama.cpp
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ pipeline_tag: text-generation
+ ---
+
+ # Qwen3Guard-Gen-0.6B-GGUF
+
+ This is a **GGUF-quantized version** of **[Qwen3Guard-Gen-0.6B](https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B)**, a **tiny yet safety-aligned generative model** from Alibaba's Qwen team.
+
+ At just ~0.6B parameters, this model is optimized for:
+ - Ultra-fast inference
+ - Low-memory environments (phones, Raspberry Pi, embedded)
+ - Real-time filtering and response generation
+ - Privacy-first apps where small size matters
+
+ > ⚠️ This is a **generative model with built-in safety constraints**, designed to refuse harmful requests while running efficiently on-device.
+
+ ## 🛡 What Is Qwen3Guard-Gen-0.6B?
+
+ It's a **compact helpful assistant** trained to:
+ - Respond helpfully to simple queries
+ - Politely decline unsafe ones (e.g., illegal acts, self-harm)
+ - Avoid generating toxic content
+ - Run completely offline with minimal resources
+
+ Perfect for:
+ - Mobile AI assistants
+ - IoT devices
+ - Edge computing
+ - Fast pre-filter + response pipelines
+ - Educational tools on low-end hardware
+
+ ## 🔗 Relationship to Other Safety Models
+
+ Part of the full Qwen3 safety stack:
+
+ | Model | Size | Role |
+ |------|------|------|
+ | **Qwen3Guard-Gen-0.6B** | 🟢 Tiny | Lightweight safe generator |
+ | **Qwen3Guard-Stream-4B/8B** | 🟡 Medium/Large | Streaming input filter |
+ | **Qwen3Guard-Gen-4B/8B** | 🟡 Large | High-quality safe generation |
+ | **Qwen3-4B-SafeRL** | 🟡 Large | Fully aligned ethical agent |
+
+ ### Recommended Architecture
+ ```
+ User Input
+     ↓
+ [Qwen3Guard-Stream-4B] ← optional pre-filter
+     ↓
+ [Qwen3Guard-Gen-0.6B]
+     ↓
+ Fast, Safe Response
+ ```
+
+ Use this when you need **speed and privacy over deep reasoning**.
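The two-stage layout above can be sketched with stand-in functions; in a real deployment `stream_filter` and `generate` would call Qwen3Guard-Stream-4B and Qwen3Guard-Gen-0.6B rather than the toy logic shown here:

```python
# Sketch of the recommended pipeline: optional pre-filter, then safe generator.
def stream_filter(text: str) -> bool:
    """Stand-in pre-filter: flag obviously unsafe input (toy keyword check)."""
    return "bomb" not in text.lower()

def generate(prompt: str) -> str:
    """Stand-in for the 0.6B safe generator."""
    return f"(model reply to: {prompt})"

def pipeline(user_input: str) -> str:
    if not stream_filter(user_input):    # optional pre-filter stage
        return "I can't help with that."
    return generate(user_input)          # fast, safe response

print(pipeline("Tell me about volcanoes"))
```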
+
+ ## Available Quantizations
+
+ File sizes below are derived from the uploaded GGUF files in this repo.
+
+ | Level | Size | RAM Usage | Use Case |
+ |--------|-------|-----------|----------|
+ | Q2_K | ~0.35 GB | ~0.6 GB | Only on very weak devices |
+ | Q3_K_S | ~0.39 GB | ~0.7 GB | Minimal viability |
+ | Q3_K_M | ~0.41 GB | ~0.8 GB | Basic chat on very constrained hardware |
+ | Q4_K_S | ~0.47 GB | ~0.9 GB | Good for edge devices |
+ | Q4_K_M | ~0.48 GB | ~1.0 GB | ✅ Best balance for most users |
+ | Q5_K_S | ~0.54 GB | ~0.95 GB | Slightly faster than Q5_K_M |
+ | Q5_K_M | ~0.55 GB | ~1.0 GB | ✅✅ Top quality for a tiny model |
+ | Q6_K | ~0.62 GB | ~1.1 GB | Near-original fidelity |
+ | Q8_0 | ~0.80 GB | ~1.3 GB | Maximum accuracy (research) |
+
+ > 💡 **Recommendation**: Use **Q4_K_M** or **Q5_K_M** for the best trade-off between speed and safety reliability.
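One way to act on the table above is to pick the highest-quality quant that fits your RAM budget; the `pick_quant` helper and its thresholds mirror the table's RAM column but are purely illustrative:

```python
# (quant level, approximate RAM needed in GB), ordered worst-to-best quality.
QUANTS = [
    ("Q2_K", 0.6), ("Q3_K_S", 0.7), ("Q3_K_M", 0.8), ("Q4_K_S", 0.9),
    ("Q4_K_M", 1.0), ("Q5_K_S", 0.95), ("Q5_K_M", 1.0),
    ("Q6_K", 1.1), ("Q8_0", 1.3),
]

def pick_quant(available_ram_gb: float) -> str:
    """Return the best-quality quant level that fits in the given RAM."""
    fitting = [name for name, ram in QUANTS if ram <= available_ram_gb]
    return fitting[-1] if fitting else "none (not enough RAM)"

print(pick_quant(1.0))  # Q5_K_M
```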
+
+ ## Tools That Support It
+ - [LM Studio](https://lmstudio.ai) – load and test locally
+ - [OpenWebUI](https://openwebui.com) – deploy with RAG and tools
+ - [GPT4All](https://gpt4all.io) – private, offline AI chatbot
+ - Directly via `llama.cpp`, Ollama, or TGI
+
+ ## Author
+ 👤 Geoff Munn (@geoffmunn)
+ 🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+ ## Disclaimer
+ Community conversion for local inference. Not affiliated with Alibaba Cloud.
SHA256SUMS.txt ADDED
@@ -0,0 +1,9 @@
+ d843cd79f98c6902a89e9a3af2e7785f72a494df1cab116bfd1c0e33e924cbd0  Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf
+ 04cbc99e5c7767eb7b205ed4a8b17764a8ab30241ef8994e0b24464d73b63667  Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf
+ aec9170e26f64d1fd723c06533c8776d58f1e99d274836fd0e9e6121fff1b11e  Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf
+ 01babf81aae5c944fb45bde24be8bf5e7d521feec57ba00899d8e31108009393  Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf
+ 921b14b84527ca96daa4dc3cd594559eee5ccf77d8a45f97b79d01b6edb04512  Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf
+ b9665a28a0226bce39bf54e7a0389f790f8296f7a385097cac5d0e5b782433b6  Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf
+ b08f0e35ca4ec43bc82c90300552fac4e54d7ed030efac00f27527020555b024  Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf
+ c74c3288b511760d029f5b69a28b90ed457226c31d246b5c674f09934be090e2  Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf
+ 86e545e20295e0aa55ef96a92777b2a93857ce325963d540fa9656d3ea6e1806  Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf
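A checksum file in this format can be verified with `sha256sum -c SHA256SUMS.txt`; for platforms without that tool, here is a small self-contained Python equivalent (the `verify` helper is illustrative, and the demo uses a throwaway file instead of a multi-hundred-MB model):

```python
import hashlib
import os
import tempfile

def verify(sums_path: str) -> dict:
    """Map each filename listed in a SHA256SUMS file to True/False."""
    results = {}
    base = os.path.dirname(sums_path)
    with open(sums_path) as fh:
        for line in fh:
            digest, name = line.split(maxsplit=1)
            name = name.strip()
            with open(os.path.join(base, name), "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            results[name] = (actual == digest)
    return results

# Demo against a temporary file:
with tempfile.TemporaryDirectory() as d:
    data = b"hello"
    with open(os.path.join(d, "demo.bin"), "wb") as f:
        f.write(data)
    digest = hashlib.sha256(data).hexdigest()
    with open(os.path.join(d, "SHA256SUMS.txt"), "w") as f:
        f.write(f"{digest}  demo.bin\n")
    print(verify(os.path.join(d, "SHA256SUMS.txt")))  # {'demo.bin': True}
```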