Update README.md
## Download a file (not the whole branch) from below:

Use this one:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [gpt-oss-20b-MXFP4.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-MXFP4.gguf) | MXFP4 | 12.1GB | false | Full MXFP4 weights, *recommended* for this model. |
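
If you'd rather script that single-file download than click through the browser, here is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package; the repo and file names come from the table above, while the surrounding code is an assumed workflow rather than something this README prescribes:

```python
# Minimal sketch: fetch only the recommended quant, not the whole branch.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/openai_gpt-oss-20b-GGUF",
    filename="openai_gpt-oss-20b-MXFP4.gguf",  # recommended file from the table above
    local_dir=".",                             # save into the current directory
)
print(path)  # local path of the downloaded .gguf
```

Swap `filename` for any entry in the table below if you do want one of the experimental quants.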

The reason is that the FFN (feed-forward network) tensors of gpt-oss do not behave nicely when quantized to anything other than MXFP4, so they are kept at that format in every quant.

The rest of these are provided for your own interest in case you feel like experimenting, but the size savings are basically non-existent, so I would not recommend running them; they are provided simply for show (a sketch for checking the tensor types yourself follows the table):

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [gpt-oss-20b-Q6_K_L.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q6_K_L.gguf) | Q6_K_L | 12.04GB | false | Uses Q8_0 for embed and output weights. Q6_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q6_K.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q6_K.gguf) | Q6_K | 12.04GB | false | Q6_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q5_K_L.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q5_K_L.gguf) | Q5_K_L | 11.91GB | false | Uses Q8_0 for embed and output weights. Q5_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q4_K_L.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q4_K_L.gguf) | Q4_K_L | 11.89GB | false | Uses Q8_0 for embed and output weights. Q4_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q2_K_L.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q2_K_L.gguf) | Q2_K_L | 11.85GB | false | Uses Q8_0 for embed and output weights. Q2_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q3_K_XL.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q3_K_XL.gguf) | Q3_K_XL | 11.78GB | false | Uses Q8_0 for embed and output weights. Q3_K_L with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q5_K_M.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q5_K_M.gguf) | Q5_K_M | 11.73GB | false | Q5_K_M with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q5_K_S.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q5_K_S.gguf) | Q5_K_S | 11.72GB | false | Q5_K_S with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q4_K_M.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q4_K_M.gguf) | Q4_K_M | 11.67GB | false | Q4_K_M with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q4_K_S.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q4_K_S.gguf) | Q4_K_S | 11.67GB | false | Q4_K_S with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q4_1.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q4_1.gguf) | Q4_1 | 11.59GB | false | Q4_1 with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ4_NL.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ4_NL.gguf) | IQ4_NL | 11.56GB | false | IQ4_NL with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ4_XS.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ4_XS.gguf) | IQ4_XS | 11.56GB | false | IQ4_XS with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q3_K_M.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q3_K_M.gguf) | Q3_K_M | 11.56GB | false | Q3_K_M with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ3_M.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ3_M.gguf) | IQ3_M | 11.56GB | false | IQ3_M with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ3_XS.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ3_XS.gguf) | IQ3_XS | 11.56GB | false | IQ3_XS with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ3_XXS.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ3_XXS.gguf) | IQ3_XXS | 11.56GB | false | IQ3_XXS with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q2_K.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q2_K.gguf) | Q2_K | 11.56GB | false | Q2_K with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q3_K_S.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q3_K_S.gguf) | Q3_K_S | 11.55GB | false | Q3_K_S with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ2_M.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ2_M.gguf) | IQ2_M | 11.55GB | false | IQ2_M with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ2_S.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ2_S.gguf) | IQ2_S | 11.55GB | false | IQ2_S with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q4_0.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q4_0.gguf) | Q4_0 | 11.52GB | false | Q4_0 with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ2_XS.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ2_XS.gguf) | IQ2_XS | 11.51GB | false | IQ2_XS with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-IQ2_XXS.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-IQ2_XXS.gguf) | IQ2_XXS | 11.51GB | false | IQ2_XXS with all FFN kept at MXFP4_MOE. |
| [gpt-oss-20b-Q3_K_L.gguf](https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF/blob/main/openai_gpt-oss-20b-Q3_K_L.gguf) | Q3_K_L | 11.49GB | false | Q3_K_L with all FFN kept at MXFP4_MOE. |
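
To check the FFN claim for yourself, here is a small sketch using the `gguf` Python package; this tooling is an assumption on my part, not something this README ships, and it needs a `gguf` release recent enough to know the MXFP4 tensor type:

```python
# Sketch: list the quantization type of each FFN tensor in a downloaded quant.
# Assumes `pip install gguf` and a version recent enough to know MXFP4.
from gguf import GGUFReader

reader = GGUFReader("openai_gpt-oss-20b-Q4_K_M.gguf")  # any file from the table
for tensor in reader.tensors:
    if "ffn" in tensor.name:  # feed-forward tensors
        # Expected: these all report MXFP4, while attention/embed tensors
        # reflect the quant type named in the table.
        print(tensor.name, tensor.tensor_type.name)
```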

## Embed/output weights