# Nanbeige4.1-3B-heretic
An abliterated (uncensored) version of Nanbeige/Nanbeige4.1-3B, created with Heretic and converted to GGUF.
## Abliteration Quality
Iterative multi-round abliteration with KL-constrained trial selection:
| Round | Refusals | KL Divergence |
|---|---|---|
| Baseline | 97/100 | - |
| Round 1 | 86/100 | 0.0001 |
| Round 2 | 49/100 | 0.0002 |
| Round 3 | 3/100 | 0.0010 |
**Refusals** counts how many of 100 test prompts the model refused (lower is better). **KL divergence** measures how far the abliterated model's output distribution has drifted from the original (lower means behavior closer to the original model).
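As a rough illustration of the selection metric, the KL divergence above compares the original and abliterated models' next-token distributions on harmless prompts. A minimal sketch with made-up toy distributions (not actual model outputs):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete probability distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions: original model vs. abliterated model
original    = [0.70, 0.20, 0.10]
abliterated = [0.68, 0.21, 0.11]
print(round(kl_divergence(original, abliterated), 6))  # small value ≈ 0.001
```

Trials whose KL divergence exceeds the constraint are rejected, which is why each round in the table above stays close to the original model while refusals drop.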
## Available Quantizations
| Quantization | File | Size |
|---|---|---|
| BF16 | Nanbeige4.1-3B-heretic-BF16.gguf | 7.33 GB |
| Q8_0 | Nanbeige4.1-3B-heretic-Q8_0.gguf | 3.90 GB |
## Usage with Ollama
```bash
ollama run hf.co/ThalisAI/Nanbeige4.1-3B-heretic:Q8_0
```
**Note:** This model uses the Llama architecture with the ChatML prompt format and `<think>`/`</think>` reasoning tokens. The included Modelfile sets the correct chat template, stop tokens, and recommended sampling parameters (temperature 0.6, top_p 0.95). The default system prompt is overridden from Chinese to English.
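For reference, a ChatML conversation with `<think>` reasoning is laid out roughly as follows (the system prompt and reply shown here are illustrative; the Modelfile and tokenizer define the exact template):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
<think>
...model reasoning...
</think>
I'm doing well, thank you!<|im_end|>
```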
## bf16 Weights
Full-precision bf16 weights are available in the `bf16/` subfolder for use with Transformers or for further quantization.
## Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the bf16 weights from the bf16/ subfolder
model = AutoModelForCausalLM.from_pretrained(
    "ThalisAI/Nanbeige4.1-3B-heretic",
    subfolder="bf16",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ThalisAI/Nanbeige4.1-3B-heretic", subfolder="bf16")

# Build a ChatML prompt and generate
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## LoRA Adapter
The abliteration LoRA adapter is available in the `lora/` subfolder. It can be applied to the original base model to reproduce the abliteration without downloading the full merged weights:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the original base model, then apply the abliteration adapter from lora/
base_model = AutoModelForCausalLM.from_pretrained("Nanbeige/Nanbeige4.1-3B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, "ThalisAI/Nanbeige4.1-3B-heretic", subfolder="lora")
```
## About
This model was processed by the Apostate automated abliteration pipeline:
- The source model was loaded in bf16
- Heretic's optimization-based abliteration was applied iteratively over 3 rounds to remove refusal behavior while minimizing KL divergence
- The merged model was converted to GGUF format using llama.cpp
- Multiple quantization levels were generated
The abliteration process uses directional ablation to remove the model's refusal directions while minimizing KL divergence from the original model's behavior on harmless prompts.
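Concretely, directional ablation removes the component of a hidden state (or weight row) that lies along a learned refusal direction. A minimal sketch with toy vectors (the real pipeline finds the direction by optimization and applies this across the model's layers):

```python
def ablate(h, r):
    """Remove the component of vector h along the unit refusal direction r."""
    dot = sum(hi * ri for hi, ri in zip(h, r))
    return [hi - dot * ri for hi, ri in zip(h, r)]

refusal_dir = [1.0, 0.0, 0.0]   # hypothetical unit vector for illustration
hidden      = [0.5, 2.0, -1.0]  # hypothetical hidden state
print(ablate(hidden, refusal_dir))  # → [0.0, 2.0, -1.0]: refusal component removed
```

Because only the projection onto the refusal direction is subtracted, components orthogonal to it are untouched, which is what keeps KL divergence on harmless prompts low.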