# Qwen2.5-Coder-32B-Instruct-heretic
An abliterated (uncensored) version of Qwen/Qwen2.5-Coder-32B-Instruct, created with Heretic and converted to GGUF.
## Abliteration Quality
| Metric | Value |
|---|---|
| Refusals | 4/100 |
| KL Divergence | 0.0728 |
| Rounds | 2 |
Lower refusals = fewer refused prompts. Lower KL divergence = closer to original model behavior.
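For reference, the KL divergence reported above presumably compares the original model's output token distribution P with the abliterated model's distribution Q on harmless prompts (the exact aggregation Heretic uses is not detailed here), following the standard definition:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$

A value of 0.0728 indicates the two distributions remain very close.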
## Available Quantizations
## Usage with Ollama

```bash
ollama run hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0
```
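The pulled model can also be queried programmatically through Ollama's local HTTP API. A minimal sketch, assuming the default Ollama server at `localhost:11434` and the Q8_0 tag pulled above:

```python
import requests

# Chat with the model through the local Ollama server's /api/chat endpoint
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic:Q8_0",
        "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}],
        "stream": False,  # return a single JSON response instead of a stream
    },
)
print(response.json()["message"]["content"])
```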
## bf16 Weights
The full bf16 abliterated weights are available in the `bf16/` subdirectory of this repository.
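To fetch only the bf16 weights locally (for example, for conversion or fine-tuning), the `huggingface_hub` client can restrict the download to that subfolder. A minimal sketch; the variable names are illustrative:

```python
from huggingface_hub import snapshot_download

# Download only the bf16/ subfolder of the repository into the local HF cache
local_dir = snapshot_download(
    repo_id="ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic",
    allow_patterns=["bf16/*"],
)
print(local_dir)
```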
## Usage with Transformers

The bf16 weights in the `bf16/` subdirectory can be loaded directly with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"

# Load the tokenizer and model from the bf16/ subfolder of this repository
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16", torch_dtype="auto", device_map="auto"
)

# Build a chat prompt, generate, and print only the newly generated tokens
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
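Loading the full 32B model in bf16 requires roughly 64 GB of accelerator memory. If that is not available, the same weights can be loaded with 4-bit quantization via bitsandbytes; this is a sketch, not a tested configuration for this repository, and assumes the `bitsandbytes` package is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ThalisAI/Qwen2.5-Coder-32B-Instruct-heretic"

# 4-bit NF4 quantization keeps the 32B weights small enough for a single ~24 GB GPU
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="bf16",
    quantization_config=quant_config,
    device_map="auto",
)
```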
## About
This model was processed by the Apostate automated abliteration pipeline:
- The source model was loaded in bf16
- Heretic's optimization-based abliteration was applied to remove refusal behavior
- The merged model was converted to GGUF format using llama.cpp
- Multiple quantization levels were generated
The abliteration process uses directional ablation to remove the model's refusal directions
while minimizing KL divergence from the original model's behavior on harmless prompts.
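For illustration only (this is not Heretic's actual code), directional ablation amounts to projecting an estimated refusal direction out of the model's hidden states:

```python
import torch

def ablate_direction(hidden: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of each hidden state along the refusal direction.

    hidden:       (..., d_model) activations
    refusal_dir:  (d_model,) estimated refusal direction
    """
    r = refusal_dir / refusal_dir.norm()      # unit-normalize the direction
    coeff = hidden @ r                        # projection coefficient (h . r)
    return hidden - coeff.unsqueeze(-1) * r   # h' = h - (h . r) r
```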