geoffmunn committed
Commit f79843f · verified · 1 Parent(s): 06cd07f

Add Q2–Q8_0 quantized models with per-model cards, MODELFILE, CLI examples, and auto-upload
.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3Guard-Gen-0.6B
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 4096
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (used by Qwen); literal block keeps line breaks
+ prompt_template: |-
+   <|im_start|>system
+   You are a helpful assistant who always refuses harmful requests.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop:
+   - "<|im_end|>"
+   - "<|im_start|>"
+
+ # Default sampling (optimized for safe generation)
+ temperature: 0.7
+ top_p: 0.9
+ top_k: 20
+ min_p: 0.05
+ repeat_penalty: 1.1
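The ChatML template above can be exercised outside any runner. A minimal Python sketch; the `build_prompt` helper and its constant mirror the template but are illustrative, not part of any tool's API:

```python
# Render the ChatML prompt template used above for a given user message.
SYSTEM = "You are a helpful assistant who always refuses harmful requests."

def build_prompt(user_message: str) -> str:
    """Fill the ChatML template (system, user, then an open assistant turn)."""
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("Why is water important for life?"))
```

Tools that understand the template apply this expansion for you; the sketch is only useful when driving a raw completion endpoint yourself.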
Qwen3Guard-Gen-0.6B-Q2_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q2_K
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 332M
+ - **RAM Required**: ~0.6 GB
+ - **Speed**: ⚡ Fastest
+ - **Quality**: Very Low
+ - **Recommendation**: Only for the weakest hardware; quality is poor. Avoid if you can.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
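For intuition about how the sampling parameters in the table interact, here is a generic re-implementation sketch; real samplers (e.g. llama.cpp) work on logits, apply temperature first, and differ in detail:

```python
# Which candidate tokens survive Top-K, Min-P and Top-P filtering?
def surviving_tokens(probs, top_k=20, top_p=0.9, min_p=0.05):
    """Return indices of tokens kept by the three filters, best first."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    max_p = probs[order[0]]
    for i in order[:top_k]:              # Top-K: keep at most k candidates
        if probs[i] < min_p * max_p:     # Min-P: drop tokens far below the best
            break
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:          # Top-P: stop once enough mass is kept
            break
    return kept

# A toy next-token distribution: only the head survives filtering.
print(surviving_tokens([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]))  # [0, 1, 2]
```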
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q3_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q3_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 395M
+ - **RAM Required**: ~0.8 GB
+ - **Speed**: ⚡ Fast
+ - **Quality**: Low-Medium
+ - **Recommendation**: Basic detection; acceptable for non-critical use.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q3_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q3_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 372M
+ - **RAM Required**: ~0.7 GB
+ - **Speed**: ⚡ Fast
+ - **Quality**: Low
+ - **Recommendation**: Minimal quality; may miss subtle risks.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q4_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q4_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 462M
+ - **RAM Required**: ~1.0 GB
+ - **Speed**: 🚀 Fast
+ - **Quality**: Balanced
+ - **Recommendation**: ✅ Best balance of speed and accuracy.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q4_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q4_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 449M
+ - **RAM Required**: ~0.9 GB
+ - **Speed**: 🚀 Fast
+ - **Quality**: Medium
+ - **Recommendation**: Good for edge devices; decent reliability.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q5_K_M/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q5_K_M
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 526M
+ - **RAM Required**: ~1.0 GB
+ - **Speed**: 🐢 Medium
+ - **Quality**: High+
+ - **Recommendation**: ✅✅ Top choice for compact safety apps.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q5_K_S/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q5_K_S
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 519M
+ - **RAM Required**: ~0.95 GB
+ - **Speed**: 🐢 Medium
+ - **Quality**: High
+ - **Recommendation**: High accuracy; good for production.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q6_K/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q6_K
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 594M
+ - **RAM Required**: ~1.1 GB
+ - **Speed**: 🐌 Slow
+ - **Quality**: Near-FP16
+ - **Recommendation**: Excellent fidelity; near-original performance.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-Q8_0/README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - safety
+ - guardrail
+ - qwen
+ - text-generation
+ - tiny-llm
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ ---
+
+ # Qwen3Guard-Gen-0.6B-Q8_0
+
+ Tiny safety-aligned LLM (~0.6B). Designed to **refuse harmful requests quickly and run anywhere**.
+
+ ## Model Info
+ - **Type**: Compact generative LLM
+ - **Size**: 768M
+ - **RAM Required**: ~1.3 GB
+ - **Speed**: 🐌 Slow
+ - **Quality**: Max
+ - **Recommendation**: Maximum precision; ideal for evaluation.
+
+ ## 🧑‍🏫 Beginner Example
+
+ 1. Load in **LM Studio**
+ 2. Type:
+    ```
+    How do I make a bomb?
+    ```
+ 3. The model replies:
+    ```
+    I can't assist with dangerous or illegal activities. If you're curious about chemistry, I'd be happy to help with safe experiments instead.
+    ```
+
+ > ✅ Safe query: "Tell me about volcanoes" → gives a short but accurate answer
+
+ ## ⚙️ Default Parameters (Recommended)
+
+ | Parameter | Value | Why |
+ |---------|-------|-----|
+ | Temperature | 0.7 | Balanced creativity and coherence |
+ | Top-P | 0.9 | Broad sampling without excess randomness |
+ | Top-K | 20 | Focused candidate pool |
+ | Min-P | 0.05 | Filters out very unlikely tokens |
+ | Repeat Penalty | 1.1 | Reduces repetition |
+ | Context Length | 4096 | Optimized for speed on small devices |
+
+ > 🔁 For logic: use `/think` if supported (limited reasoning)
+
+ ## 🖥️ CLI Example Using llama.cpp
+
+ ```bash
+ ./main -m Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf \
+   -p "You are a helpful assistant who refuses harmful requests. User: Why is water important for life? Assistant:" \
+   --temp 0.7 --top_p 0.9 --repeat_penalty 1.1 \
+   --n-predict 256
+ ```
+
+ Expected output:
+ > Water supports cellular functions, regulates temperature...
+
+ ## 🧩 Prompt Template (ChatML Format)
+
+ Use ChatML for consistency:
+
+ ```text
+ <|im_start|>system
+ You are a helpful assistant who always refuses harmful requests.<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ Most tools (LM Studio, OpenWebUI) will apply this automatically.
+
+ ## License
+
+ Apache 2.0
Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d843cd79f98c6902a89e9a3af2e7785f72a494df1cab116bfd1c0e33e924cbd0
+ size 347288640
Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04cbc99e5c7767eb7b205ed4a8b17764a8ab30241ef8994e0b24464d73b63667
+ size 413978688
Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aec9170e26f64d1fd723c06533c8776d58f1e99d274836fd0e9e6121fff1b11e
+ size 389926976
Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01babf81aae5c944fb45bde24be8bf5e7d521feec57ba00899d8e31108009393
+ size 484219968
Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:921b14b84527ca96daa4dc3cd594559eee5ccf77d8a45f97b79d01b6edb04512
+ size 470785088
Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9665a28a0226bce39bf54e7a0389f790f8296f7a385097cac5d0e5b782433b6
+ size 551377984
Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b08f0e35ca4ec43bc82c90300552fac4e54d7ed030efac00f27527020555b024
+ size 543579200
Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c74c3288b511760d029f5b69a28b90ed457226c31d246b5c674f09934be090e2
+ size 622733376
Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86e545e20295e0aa55ef96a92777b2a93857ce325963d540fa9656d3ea6e1806
+ size 804753472
README.md ADDED
@@ -0,0 +1,94 @@
+ ---
+ license: apache-2.0
+ tags:
+ - gguf
+ - qwen
+ - safety
+ - guardrail
+ - text-generation
+ - tiny-llm
+ - llama.cpp
+ base_model: Qwen/Qwen3Guard-Gen-0.6B
+ author: geoffmunn
+ pipeline_tag: text-generation
+ ---
+
+ # Qwen3Guard-Gen-0.6B-GGUF
+
+ This is a **GGUF-quantized version** of **[Qwen3Guard-Gen-0.6B](https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B)**, a **tiny yet safety-aligned generative model** from Alibaba's Qwen team.
+
+ At just ~0.6B parameters, this model is optimized for:
+ - Ultra-fast inference
+ - Low-memory environments (phones, Raspberry Pi, embedded)
+ - Real-time filtering and response generation
+ - Privacy-first apps where small size matters
+
+ > ⚠️ This is a **generative model with built-in safety constraints**, designed to refuse harmful requests while running efficiently on-device.
+
+ ## 🛡 What Is Qwen3Guard-Gen-0.6B?
+
+ It's a **compact helpful assistant** trained to:
+ - Respond helpfully to simple queries
+ - Politely decline unsafe ones (e.g., illegal acts, self-harm)
+ - Avoid generating toxic content
+ - Run completely offline with minimal resources
+
+ Perfect for:
+ - Mobile AI assistants
+ - IoT devices
+ - Edge computing
+ - Fast pre-filter + response pipelines
+ - Educational tools on low-end hardware
+
+ ## 🔗 Relationship to Other Safety Models
+
+ Part of the full Qwen3 safety stack:
+
+ | Model | Size | Role |
+ |------|------|------|
+ | **Qwen3Guard-Gen-0.6B** | 🟢 Tiny | Lightweight safe generator |
+ | **Qwen3Guard-Stream-4B/8B** | 🟡 Medium/Large | Streaming input filter |
+ | **Qwen3Guard-Gen-4B/8B** | 🟡 Large | High-quality safe generation |
+ | **Qwen3-4B-SafeRL** | 🟡 Large | Fully aligned ethical agent |
+
+ ### Recommended Architecture
+ ```
+ User Input
+     ↓
+ [Qwen3Guard-Stream-4B] ← optional pre-filter
+     ↓
+ [Qwen3Guard-Gen-0.6B]
+     ↓
+ Fast, Safe Response
+ ```
+
+ Use this when you need **speed and privacy over deep reasoning**.
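The two-stage layout above can be sketched with stand-in functions; in a real deployment `stream_filter` and `generate` would call Qwen3Guard-Stream-4B and Qwen3Guard-Gen-0.6B rather than the toy logic shown here:

```python
# Sketch of the recommended pipeline: optional pre-filter, then safe generator.
def stream_filter(text: str) -> bool:
    """Stand-in pre-filter: flag obviously unsafe input (toy keyword check)."""
    return "bomb" not in text.lower()

def generate(prompt: str) -> str:
    """Stand-in for the 0.6B safe generator."""
    return f"(model reply to: {prompt})"

def pipeline(user_input: str) -> str:
    if not stream_filter(user_input):    # optional pre-filter stage
        return "I can't help with that."
    return generate(user_input)          # fast, safe response

print(pipeline("Tell me about volcanoes"))
```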
+
+ ## Available Quantizations
+
+ File sizes below are derived from the uploaded GGUF files in this repo.
+
+ | Level | Size | RAM Usage | Use Case |
+ |--------|-------|-----------|----------|
+ | Q2_K | ~0.35 GB | ~0.6 GB | Only on very weak devices |
+ | Q3_K_S | ~0.39 GB | ~0.7 GB | Minimal viability |
+ | Q3_K_M | ~0.41 GB | ~0.8 GB | Basic chat on very constrained hardware |
+ | Q4_K_S | ~0.47 GB | ~0.9 GB | Good for edge devices |
+ | Q4_K_M | ~0.48 GB | ~1.0 GB | ✅ Best balance for most users |
+ | Q5_K_S | ~0.54 GB | ~0.95 GB | Slightly faster than Q5_K_M |
+ | Q5_K_M | ~0.55 GB | ~1.0 GB | ✅✅ Top quality for a tiny model |
+ | Q6_K | ~0.62 GB | ~1.1 GB | Near-original fidelity |
+ | Q8_0 | ~0.80 GB | ~1.3 GB | Maximum accuracy (research) |
+
+ > 💡 **Recommendation**: Use **Q4_K_M** or **Q5_K_M** for the best trade-off between speed and safety reliability.
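One way to act on the table above is to pick the highest-quality quant that fits your RAM budget; the `pick_quant` helper and its thresholds mirror the table's RAM column but are purely illustrative:

```python
# (quant level, approximate RAM needed in GB), ordered worst-to-best quality.
QUANTS = [
    ("Q2_K", 0.6), ("Q3_K_S", 0.7), ("Q3_K_M", 0.8), ("Q4_K_S", 0.9),
    ("Q4_K_M", 1.0), ("Q5_K_S", 0.95), ("Q5_K_M", 1.0),
    ("Q6_K", 1.1), ("Q8_0", 1.3),
]

def pick_quant(available_ram_gb: float) -> str:
    """Return the best-quality quant level that fits in the given RAM."""
    fitting = [name for name, ram in QUANTS if ram <= available_ram_gb]
    return fitting[-1] if fitting else "none (not enough RAM)"

print(pick_quant(1.0))  # Q5_K_M
```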
+
+ ## Tools That Support It
+ - [LM Studio](https://lmstudio.ai) – load and test locally
+ - [OpenWebUI](https://openwebui.com) – deploy with RAG and tools
+ - [GPT4All](https://gpt4all.io) – private, offline AI chatbot
+ - Directly via `llama.cpp`, Ollama, or TGI
+
+ ## Author
+ 👤 Geoff Munn (@geoffmunn)
+ 🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+ ## Disclaimer
+ Community conversion for local inference. Not affiliated with Alibaba Cloud.
SHA256SUMS.txt ADDED
@@ -0,0 +1,9 @@
+ d843cd79f98c6902a89e9a3af2e7785f72a494df1cab116bfd1c0e33e924cbd0  Qwen3Guard-Gen-0.6B-f16:Q2_K.gguf
+ 04cbc99e5c7767eb7b205ed4a8b17764a8ab30241ef8994e0b24464d73b63667  Qwen3Guard-Gen-0.6B-f16:Q3_K_M.gguf
+ aec9170e26f64d1fd723c06533c8776d58f1e99d274836fd0e9e6121fff1b11e  Qwen3Guard-Gen-0.6B-f16:Q3_K_S.gguf
+ 01babf81aae5c944fb45bde24be8bf5e7d521feec57ba00899d8e31108009393  Qwen3Guard-Gen-0.6B-f16:Q4_K_M.gguf
+ 921b14b84527ca96daa4dc3cd594559eee5ccf77d8a45f97b79d01b6edb04512  Qwen3Guard-Gen-0.6B-f16:Q4_K_S.gguf
+ b9665a28a0226bce39bf54e7a0389f790f8296f7a385097cac5d0e5b782433b6  Qwen3Guard-Gen-0.6B-f16:Q5_K_M.gguf
+ b08f0e35ca4ec43bc82c90300552fac4e54d7ed030efac00f27527020555b024  Qwen3Guard-Gen-0.6B-f16:Q5_K_S.gguf
+ c74c3288b511760d029f5b69a28b90ed457226c31d246b5c674f09934be090e2  Qwen3Guard-Gen-0.6B-f16:Q6_K.gguf
+ 86e545e20295e0aa55ef96a92777b2a93857ce325963d540fa9656d3ea6e1806  Qwen3Guard-Gen-0.6B-f16:Q8_0.gguf
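A checksum file in this format can be verified with `sha256sum -c SHA256SUMS.txt`; for platforms without that tool, here is a small self-contained Python equivalent (the `verify` helper is illustrative, and the demo uses a throwaway file instead of a multi-hundred-MB model):

```python
import hashlib
import os
import tempfile

def verify(sums_path: str) -> dict:
    """Map each filename listed in a SHA256SUMS file to True/False."""
    results = {}
    base = os.path.dirname(sums_path)
    with open(sums_path) as fh:
        for line in fh:
            digest, name = line.split(maxsplit=1)
            name = name.strip()
            with open(os.path.join(base, name), "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            results[name] = (actual == digest)
    return results

# Demo against a temporary file:
with tempfile.TemporaryDirectory() as d:
    data = b"hello"
    with open(os.path.join(d, "demo.bin"), "wb") as f:
        f.write(data)
    digest = hashlib.sha256(data).hexdigest()
    with open(os.path.join(d, "SHA256SUMS.txt"), "w") as f:
        f.write(f"{digest}  demo.bin\n")
    print(verify(os.path.join(d, "SHA256SUMS.txt")))  # {'demo.bin': True}
```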