A smarter, accessible abliterated (uncensored) version of INTELLECT-3 with guardrails removed
ELBAZ PRISM ABLITERATED PRIME INTELLECT-3
INTELLECT-3: A 100B+ MoE trained with large-scale RL
Trained with prime-rl and verifiers
Environments released on Environments Hub
Read the Blog & Technical Report
X | Discord | Prime Intellect Platform
Introduction
INTELLECT-3 is a 106B (A12B) parameter Mixture-of-Experts reasoning model post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL).
Model Description
This model is an abliterated version of PrimeIntellect/INTELLECT-3 that has had its refusal mechanisms removed using PRISM (Projected Refusal Isolation via Subspace Modification). This state-of-the-art methodology combines multiple principled techniques for effective refusal removal while preserving, and even improving, model capabilities. The model will respond to prompts that the original model would refuse.
INTELLECT-3 is a 106B parameter Mixture-of-Experts reasoning model with 12B active parameters and 128 MoE experts, trained through decentralized collaborative training.
Motivation
This project is a research and development experiment in understanding how large language models encode and enforce refusal behaviors. It contributes to the broader AI safety research community by providing empirical data on refusal mechanism localization and the tradeoffs between safety and capability in large models.
Author
Eric Elbaz (Ex0bit)
Model Tree
PrimeIntellect/INTELLECT-3 (Base Model - BF16)
└── Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated (This Model)
    └── Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf
INTELLECT Family
| Model | Parameters | Type | Link |
|---|---|---|---|
| INTELLECT-3 | 106B (12B active) | Reasoning MoE | PrimeIntellect/INTELLECT-3 |
| INTELLECT-3-FP8 | 106B (12B active) | Reasoning MoE (FP8) | PrimeIntellect/INTELLECT-3-FP8 |
Available Quantizations
| Quantization | Size | BPW | Min VRAM | Recommended VRAM | Notes |
|---|---|---|---|---|---|
| IQ4_XS | 55 GB | 4.39 | 64 GB | 80 GB | Importance-weighted 4-bit, excellent quality |
The IQ4_XS quantization uses importance-weighted quantization which provides better quality than standard Q4 quantizations at similar sizes. Embedding and output layers use Q6_K precision for optimal quality.
Prompt Format
This model uses the ChatML prompt format with thinking/reasoning support:
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
<think>
Template Structure
| Component | Token/Format |
|---|---|
| System Start | <|im_start|>system |
| System End | <|im_end|> |
| User Start | <|im_start|>user |
| User End | <|im_end|> |
| Assistant Start | <|im_start|>assistant |
| Thinking Tag | <think> |
| End of Turn | <|im_end|> |
The model will output its reasoning inside <think>...</think> tags before providing the final response.
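When consuming raw completions programmatically, you may want to separate the reasoning from the final answer. A minimal sketch (the split_reasoning helper is an illustration, not part of the model's tooling):

import re

def split_reasoning(completion: str) -> tuple[str, str]:
    # Assumes at most one <think>...</think> block at the start of the completion.
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    return match.group(1).strip(), completion[match.end():].strip()

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # -> The answer is 4.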
Technical Details
Performance Impact
| Metric | Change |
|---|---|
| MMLU | +5-8% improvement |
| Coherence | +5-8% improvement |
| Refusal Bypass | 100% rate |
Testing shows that PRISM abliteration can actually improve benchmark performance by 5-8% on MMLU and coherence metrics compared to the base model, likely due to reduced over-refusal interference with legitimate reasoning tasks.
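These figures are self-reported. If you want to sanity-check the MMLU number independently, one option is EleutherAI's lm-evaluation-harness; below is a minimal sketch, assuming you have enough GPU memory to shard the full-precision weights:

# Hedged sketch: re-running MMLU with lm-evaluation-harness (pip install lm-eval).
# parallelize=True shards the 106B model across available GPUs.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated,parallelize=True",
    tasks=["mmlu"],
)
print(results["results"].get("mmlu"))  # aggregate MMLU metrics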
Quick Start
Using with llama.cpp
# Download the model
huggingface-cli download Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated \
Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
--local-dir .
# Run inference with full prompt format
./llama-cli -m Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
-p "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Your prompt here<|im_end|>
<|im_start|>assistant
<think>
" \
-n 2048 \
--temp 0.6 \
-ngl 999
llama.cpp with llama-server
# Start the server
./llama-server -m Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
--host 0.0.0.0 \
--port 8080 \
-ngl 999 \
-c 32768
# Example API call
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Your prompt here"}
],
"temperature": 0.6
}'
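Because llama-server exposes an OpenAI-compatible endpoint, the official openai Python client works as well; a minimal sketch (the api_key value is a placeholder, since llama-server does not require one by default):

# Minimal sketch: calling llama-server via the openai client (pip install openai).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local",  # llama-server serves a single model; the name is informational
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt here"},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)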
Using with Ollama
# Pull and run directly from Hugging Face (recommended)
ollama pull hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
ollama run hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
Note: The `hf.co/` prefix is required to pull from Hugging Face. Requires Ollama 0.3.0+.
Native Tool Calling with Ollama's `tools` API
Issue: INTELLECT-3 outputs tool calls in XML format (`<function=name><parameter=key>value</parameter></function>`), but Ollama's parser expects JSON format inside `<tool_call>` tags. This causes Ollama's `tool_calls` field to remain empty even when the model outputs valid tool calls.
Workaround: Create a custom model with a template that instructs the model to use JSON format:
- Create a file called `Modelfile.glm`:
FROM hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
# INTELLECT-3 Prism with GLM-style Tool Template
# Uses JSON tool call format that works with Ollama's parser
TEMPLATE """
{{- /* System message at the beginning, if there is .System or .Tools */ -}}
{{- if or .System .Tools }}
<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
You are a helpful assistant with the ability to call tools.
**Available tools:**
{{ range .Tools }}
{{ . }}
{{ end }}
**Tool call format:**
<tool_call>
{"name": "tool_name", "parameters": {"param1": "value1", "param2": "value2"}}
</tool_call>
Important:
- You MUST use JSON format inside <tool_call> tags
- Do your reasoning first, then output the tool call
- If no tool is needed, answer normally
{{- end }}
<|im_end|>
{{- end }}
{{- /* Processing messages */ -}}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- /* User messages */ -}}
{{- if eq .Role "user" }}
<|im_start|>user
{{ .Content }}<|im_end|>
{{- /* Start model turn if this is the last message */ -}}
{{ if $last }}<|im_start|>assistant
{{ end }}
{{- /* Assistant messages */ -}}
{{- else if eq .Role "assistant" }}
<|im_start|>assistant
{{- if .ToolCalls }}
<tool_call>
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}
{{- end }}
</tool_call>
{{- else }}
{{ .Content }}
{{- end }}
{{- /* End turn if this is not the last message */ -}}
{{- if not $last }}<|im_end|>
{{ end }}
{{- /* Tool results */ -}}
{{- else if eq .Role "tool" }}
<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{- /* Start model turn if this is the last message */ -}}
{{ if $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- end }}
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.6
PARAMETER num_ctx 32768
- Create and run the model:
ollama create elbaz-prism-glm -f Modelfile.glm
ollama run elbaz-prism-glm
- Use with Ollama's native `tools` API:
import ollama
response = ollama.chat(
model='elbaz-prism-glm',
messages=[{'role': 'user', 'content': 'What time is it in Tokyo?'}],
tools=[{
'type': 'function',
'function': {
'name': 'get_time',
'description': 'Get current time in a timezone',
'parameters': {
'type': 'object',
'properties': {
'timezone': {'type': 'string', 'description': 'Timezone like Asia/Tokyo'}
}
}
}
}]
)
# tool_calls field will now be properly populated!
print(response['message'].get('tool_calls'))
This workaround ensures Ollama's parser correctly populates the tool_calls field in API responses.
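Note that Ollama only parses the call; executing the tool and returning its result is up to the client. A hedged sketch of the full round trip, where get_time is a hypothetical local implementation matching the schema above:

# Hedged sketch: executing the parsed tool call and feeding the result back.
from datetime import datetime
from zoneinfo import ZoneInfo
import ollama

def get_time(timezone: str) -> str:
    # Hypothetical local implementation of the get_time tool declared above.
    return datetime.now(ZoneInfo(timezone)).isoformat()

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_time',
        'description': 'Get current time in a timezone',
        'parameters': {
            'type': 'object',
            'properties': {
                'timezone': {'type': 'string', 'description': 'Timezone like Asia/Tokyo'}
            }
        }
    }
}]

messages = [{'role': 'user', 'content': 'What time is it in Tokyo?'}]
response = ollama.chat(model='elbaz-prism-glm', messages=messages, tools=tools)

if response['message'].get('tool_calls'):
    messages.append(response['message'])  # keep the assistant's tool call in history
    for call in response['message']['tool_calls']:
        result = get_time(**call['function']['arguments'])
        messages.append({'role': 'tool', 'content': result})
    # Second turn: the model answers using the tool result.
    final = ollama.chat(model='elbaz-prism-glm', messages=messages)
    print(final['message']['content'])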
Using with LM Studio
- Download the GGUF file and place it in your LM Studio models directory
- Load the model in LM Studio
Template Compatibility Note: If you encounter Jinja template errors such as:
Unknown test: sequence
Unknown StringValue filter: safe
Go to Settings → Prompt Template and replace the template with this LM Studio-compatible version:
{% macro render_extra_keys(json_dict, handled_keys) %}
{%- if json_dict is mapping %}
{%- for json_key in json_dict if json_key not in handled_keys %}
{%- if json_dict[json_key] is mapping or (json_dict[json_key] is iterable and json_dict[json_key] is not string and json_dict[json_key] is not mapping) %}
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson) ~ '</' ~ json_key ~ '>' }}
{%- else %}
{{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
{%- endif %}
{%- endfor %}
{%- endif %}
{% endmacro %}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = [] %}
{%- endif %}
{%- if system_message is defined %}
{{- "<|im_start|>system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- "<|im_start|>system\nYou are INTELLECT-3, a helpful assistant developed by Prime Intellect, that can interact with a computer to solve tasks." }}
{%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
{{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
{{- "<tools>" }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
{%- if tool.description is defined %}
{{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
{%- endif %}
{{- '\n<parameters>' }}
{%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- '\n<parameter>' }}
{{- '\n<name>' ~ param_name ~ '</name>' }}
{%- if param_fields.type is defined %}
{{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
{%- endif %}
{%- if param_fields.description is defined %}
{{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
{%- endif %}
{%- set handled_keys = ['name', 'type', 'description'] %}
{{- render_extra_keys(param_fields, handled_keys) }}
{{- '\n</parameter>' }}
{%- endfor %}
{%- endif %}
{% set handled_keys = ['type', 'properties'] %}
{{- render_extra_keys(tool.parameters, handled_keys) }}
{{- '\n</parameters>' }}
{%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
{{- render_extra_keys(tool, handled_keys) }}
{{- '\n</function>' }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- endif %}
{%- if system_message is defined %}
{{- '<|im_end|>\n' }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
{{- '<|im_start|>' + message.role }}
{%- if message.content is defined and message.content is string %}
{%- if message.reasoning_content is defined -%}
{%- if message.reasoning_content -%}
{{ '\n<think>' + message.reasoning_content.strip() + '</think>' }}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- endif -%}
{{- '\n' + message.content | trim + '\n' }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- if tool_call.arguments is defined %}
{%- for args_name, args_value in tool_call.arguments|items %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | tojson if args_value is mapping or (args_value is iterable and args_value is not string and args_value is not mapping) else args_value | string %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- if message.role == "assistant" and message.reasoning_content is defined %}
{%- if message.reasoning_content -%}
{{ '\n<think>' + message.reasoning_content.strip() + '</think>' }}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if message.content.strip() -%}
{{ '\n' + message.content.strip() }}
{%- endif -%}
{%- else %}
{{- '\n' + message.content }}
{%- endif %}
{{- '<|im_end|>' + '\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user\n' }}
{%- endif %}
{{- '<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>\n' }}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- elif loop.last %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>' }}
{%- endif %}
This template fixes the following LM Studio incompatibilities:
- Replaced `is sequence` with `is iterable and is not string and is not mapping`
- Removed the `| safe` filter (not supported by LM Studio's Jinja implementation)
The template includes full support for tool calling, reasoning/thinking tags, and the ChatML format.
Using with Transformers (Full Weights)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")  # torch_dtype="auto" keeps the native BF16 weights instead of upcasting to FP32
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Your prompt here"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<think>\n" # Add thinking tag for reasoning
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
Method: PRISM (Projected Refusal Isolation via Subspace Modification)
The model was abliterated using PRISM v5, a state-of-the-art abliteration methodology that combines multiple principled techniques for effective refusal removal while preserving model capabilities.
Core Approach:
- Projected Direction Isolation - Computes a cleaner refusal direction by projecting out helpfulness-correlated components, avoiding the typical "helpfulness confound" that degrades standard abliteration
- SNR-based Layer Selection - Uses signal-to-noise ratio analysis to identify layers where refusal behavior is most concentrated, rather than arbitrary layer targeting
- Dual-Component Modification - Modifies both MLP and Attention pathways to prevent the ~70% self-repair typically seen with single-component approaches
- Norm-Preserving Orthogonalization - Maintains original weight matrix norms during modification to preserve model coherence
- Thinking Model Support - Specialized handling for reasoning models with extended thinking capabilities
Key Innovation: Unlike conventional abliteration which often produces incoherent outputs, PRISM's preprocessing and calibration techniques maintain grammatical coherence while achieving high bypass rates.
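The exact PRISM implementation is not published with this card, but the weight edit it describes follows standard directional-ablation math: project each selected matrix's output space onto the isolated refusal direction, subtract a scaled component, and restore the original row norms. A hedged PyTorch sketch under those assumptions, where refusal_dir and scale stand in for the outputs of the calibration steps above:

# Hedged sketch of norm-preserving directional ablation, the style of edit
# PRISM describes. refusal_dir (unit vector in the residual stream) and
# scale (per-layer, SNR-calibrated) are assumed inputs, not published values.
import torch

def ablate_weight(W: torch.Tensor, refusal_dir: torch.Tensor, scale: float) -> torch.Tensor:
    # W: (d_model, d_in) projection writing into the residual stream,
    # e.g. an attention output projection or an MLP down projection.
    r = refusal_dir / refusal_dir.norm()
    old_norms = W.norm(dim=1, keepdim=True)
    # Remove the component of each output along the refusal direction.
    W_new = W - scale * torch.outer(r, r @ W)
    # Norm preservation: rescale rows back to their original magnitudes.
    return W_new * (old_norms / W_new.norm(dim=1, keepdim=True).clamp_min(1e-8))

In the dual-component setup, this edit would be applied to both the attention output projection and the MLP down projection of each selected layer (12-34 here), with the scaling factor peaking at layer 25.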
Abliteration Parameters
| Parameter | Value |
|---|---|
| Base Model | PrimeIntellect/INTELLECT-3 (BF16) |
| Target Layers | 12-34 (23 layers) |
| Peak Layer | 25 |
| Weight Scaling | max=4.0, min=0.5 (SNR-calibrated) |
| Components | Attention + MLP (dual) |
| Norm Preservation | Enabled |
| Output Format | IQ4_XS GGUF (55 GB) |
Hardware Requirements
| Quantization | Min RAM/VRAM | Recommended | Hardware Examples |
|---|---|---|---|
| IQ4_XS | 64 GB | 80+ GB | Apple M4 Max (128GB), A100 80GB, H100, 2x RTX 4090 |
Tested Configurations
| Hardware | RAM/VRAM | Status |
|---|---|---|
| Apple M4 Max (MacBook Pro) | 128 GB Unified | ✅ Works great |
Apple Silicon Note: The model runs well on Apple Silicon Macs with sufficient unified memory. Tested on an M4 Max with 128 GB: the model loads fully into memory with room for 32K+ context.
Ethical Considerations
This model has been modified to reduce safety guardrails. Users are responsible for:
- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments
License
Apache 2.0 (same as base model PrimeIntellect/INTELLECT-3)
Citation
If you use this model, please cite:
@misc{elbaz2025intellect3abliterated,
author = {Elbaz, Eric},
title = {Elbaz-Prime-Intellect-3_Prism_Abliterated: An Abliterated INTELLECT-3 Reasoning Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated}}
}
Acknowledgments
- PrimeIntellect for INTELLECT-3
- llama.cpp for quantization tools
Related Models
- PrimeIntellect/INTELLECT-3 - Base model
- Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated - OLMo-3 abliterated
Created by: Ex0bit (Eric Elbaz)