A smarter, accessible abliterated (uncensored) version of INTELLECT-3 with guardrails removed
ELBAZ PRISM ABLITERATED PRIME INTELLECT-3
INTELLECT-3: A 100B+ MoE trained with large-scale RL
Trained with prime-rl and verifiers
Environments released on Environments Hub
Read the Blog & Technical Report
X | Discord | Prime Intellect Platform
Introduction
INTELLECT-3 is a 106B (A12B) parameter Mixture-of-Experts reasoning model post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL).
Model Description
This model is an abliterated version of PrimeIntellect/INTELLECT-3 that has had its refusal mechanisms removed using PRISM (Projected Refusal Isolation via Subspace Modification). This state-of-the-art methodology combines multiple principled techniques for effective refusal removal while preserving, and even improving, model capabilities. The model will respond to prompts that the original model would refuse.
INTELLECT-3 is a 106B parameter Mixture-of-Experts reasoning model with 12B active parameters and 128 MoE experts, trained through decentralized collaborative training.
Motivation
This project is a research and development experiment in understanding how large language models encode and enforce refusal behaviors. It contributes to the broader AI safety research community by providing empirical data on refusal mechanism localization and the tradeoffs between safety and capability in large models.
Author
Eric Elbaz (Ex0bit)
Model Tree
PrimeIntellect/INTELLECT-3 (Base Model - BF16)
└── Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated (This Model)
    └── Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf
INTELLECT Family
| Model | Parameters | Type | Link |
|---|---|---|---|
| INTELLECT-3 | 106B (12B active) | Reasoning MoE | PrimeIntellect/INTELLECT-3 |
| INTELLECT-3-FP8 | 106B (12B active) | Reasoning MoE (FP8) | PrimeIntellect/INTELLECT-3-FP8 |
Available Quantizations
| Quantization | Size | BPW | Min VRAM | Recommended VRAM | Notes |
|---|---|---|---|---|---|
| IQ4_XS | 55 GB | 4.39 | 64 GB | 80 GB | Importance-weighted 4-bit, excellent quality |
The IQ4_XS quantization uses importance-weighted quantization which provides better quality than standard Q4 quantizations at similar sizes. Embedding and output layers use Q6_K precision for optimal quality.
Prompt Format
This model uses the ChatML prompt format with thinking/reasoning support:
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
<think>
Template Structure
| Component | Token/Format |
|---|---|
| System Start | <|im_start|>system |
| System End | <|im_end|> |
| User Start | <|im_start|>user |
| User End | <|im_end|> |
| Assistant Start | <|im_start|>assistant |
| Thinking Tag | <think> |
| End of Turn | <|im_end|> |
The model will output its reasoning inside <think>...</think> tags before providing the final response.
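When consuming raw completions programmatically, you may want to separate the reasoning from the final answer. A minimal sketch (the split_reasoning helper is an illustration, not part of the model's tooling):

import re

def split_reasoning(completion: str) -> tuple[str, str]:
    # Assumes at most one <think>...</think> block at the start of the completion.
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    return match.group(1).strip(), completion[match.end():].strip()

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # -> The answer is 4.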
Technical Details
Performance Impact
| Metric | Change |
|---|---|
| MMLU | +5-8% improvement |
| Coherence | +5-8% improvement |
| Refusal Bypass | 100% rate |
Testing shows that PRISM abliteration can actually improve benchmark performance by 5-8% on MMLU and coherence metrics compared to the base model, likely due to reduced over-refusal interference with legitimate reasoning tasks.
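These figures are self-reported. If you want to sanity-check the MMLU number independently, one option is EleutherAI's lm-evaluation-harness; below is a minimal sketch, assuming you have enough GPU memory to shard the full-precision weights:

# Hedged sketch: re-running MMLU with lm-evaluation-harness (pip install lm-eval).
# parallelize=True shards the 106B model across available GPUs.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated,parallelize=True",
    tasks=["mmlu"],
)
print(results["results"].get("mmlu"))  # aggregate MMLU metrics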
Quick Start
Using with llama.cpp
# Download the model
huggingface-cli download Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated \
Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
--local-dir .
# Run inference with full prompt format
./llama-cli -m Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
-p "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Your prompt here<|im_end|>
<|im_start|>assistant
<think>
" \
-n 2048 \
--temp 0.6 \
-ngl 999
llama.cpp with llama-server
# Start the server
./llama-server -m Elbaz-Prime-Intellect-3_Prism_Abliterated-IQ4_XS.gguf \
--host 0.0.0.0 \
--port 8080 \
-ngl 999 \
-c 32768
# Example API call
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Your prompt here"}
],
"temperature": 0.6
}'
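Because llama-server exposes an OpenAI-compatible endpoint, the official openai Python client works as well; a minimal sketch (the api_key value is a placeholder, since llama-server does not require one by default):

# Minimal sketch: calling llama-server via the openai client (pip install openai).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local",  # llama-server serves a single model; the name is informational
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt here"},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)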
Using with Ollama
# Pull and run directly from Hugging Face (recommended)
ollama pull hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
ollama run hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
Note: The `hf.co/` prefix is required to pull from Hugging Face. Requires Ollama 0.3.0+.
Native Tool Calling with Ollama's `tools` API
Issue: INTELLECT-3 outputs tool calls in XML format (`<function=name><parameter=key>value</parameter></function>`), but Ollama's parser expects JSON format inside `<tool_call>` tags. This causes Ollama's `tool_calls` field to remain empty even when the model outputs valid tool calls.
Workaround: Create a custom model with a template that instructs the model to use JSON format:
- Create a file called `Modelfile.glm`:
FROM hf.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated
# INTELLECT-3 Prism with GLM-style Tool Template
# Uses JSON tool call format that works with Ollama's parser
TEMPLATE """
{{- /* System message at the beginning, if there is .System or .Tools */ -}}
{{- if or .System .Tools }}
<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
You are a helpful assistant with the ability to call tools.
**Available tools:**
{{ range .Tools }}
{{ . }}
{{ end }}
**Tool call format:**
<tool_call>
{"name": "tool_name", "parameters": {"param1": "value1", "param2": "value2"}}
</tool_call>
Important:
- You MUST use JSON format inside <tool_call> tags
- Do your reasoning first, then output the tool call
- If no tool is needed, answer normally
{{- end }}
<|im_end|>
{{- end }}
{{- /* Processing messages */ -}}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- /* User messages */ -}}
{{- if eq .Role "user" }}
<|im_start|>user
{{ .Content }}<|im_end|>
{{- /* Start model turn if this is the last message */ -}}
{{ if $last }}<|im_start|>assistant
{{ end }}
{{- /* Assistant messages */ -}}
{{- else if eq .Role "assistant" }}
<|im_start|>assistant
{{- if .ToolCalls }}
<tool_call>
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}
{{- end }}
</tool_call>
{{- else }}
{{ .Content }}
{{- end }}
{{- /* End turn if this is not the last message */ -}}
{{- if not $last }}<|im_end|>
{{ end }}
{{- /* Tool results */ -}}
{{- else if eq .Role "tool" }}
<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{- /* Start model turn if this is the last message */ -}}
{{ if $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- end }}
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER temperature 0.6
PARAMETER num_ctx 32768
- Create and run the model:
ollama create elbaz-prism-glm -f Modelfile.glm
ollama run elbaz-prism-glm
- Use with Ollama's native `tools` API:
import ollama
response = ollama.chat(
model='elbaz-prism-glm',
messages=[{'role': 'user', 'content': 'What time is it in Tokyo?'}],
tools=[{
'type': 'function',
'function': {
'name': 'get_time',
'description': 'Get current time in a timezone',
'parameters': {
'type': 'object',
'properties': {
'timezone': {'type': 'string', 'description': 'Timezone like Asia/Tokyo'}
}
}
}
}]
)
# tool_calls field will now be properly populated!
print(response['message'].get('tool_calls'))
This workaround ensures Ollama's parser correctly populates the tool_calls field in API responses.
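Note that Ollama only parses the call; executing the tool and returning its result is up to the client. A hedged sketch of the full round trip, where get_time is a hypothetical local implementation matching the schema above:

# Hedged sketch: executing the parsed tool call and feeding the result back.
from datetime import datetime
from zoneinfo import ZoneInfo
import ollama

def get_time(timezone: str) -> str:
    # Hypothetical local implementation of the get_time tool declared above.
    return datetime.now(ZoneInfo(timezone)).isoformat()

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_time',
        'description': 'Get current time in a timezone',
        'parameters': {
            'type': 'object',
            'properties': {
                'timezone': {'type': 'string', 'description': 'Timezone like Asia/Tokyo'}
            }
        }
    }
}]

messages = [{'role': 'user', 'content': 'What time is it in Tokyo?'}]
response = ollama.chat(model='elbaz-prism-glm', messages=messages, tools=tools)

if response['message'].get('tool_calls'):
    messages.append(response['message'])  # keep the assistant's tool call in history
    for call in response['message']['tool_calls']:
        result = get_time(**call['function']['arguments'])
        messages.append({'role': 'tool', 'content': result})
    # Second turn: the model answers using the tool result.
    final = ollama.chat(model='elbaz-prism-glm', messages=messages)
    print(final['message']['content'])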
Using with LM Studio
- Download the GGUF file and place it in your LM Studio models directory
- Load the model in LM Studio
Template Compatibility Note: If you encounter Jinja template errors such as:
Unknown test: sequence
Unknown StringValue filter: safe
Go to Settings → Prompt Template and replace the template with this LM Studio-compatible version:
{% macro render_extra_keys(json_dict, handled_keys) %}
{%- if json_dict is mapping %}
{%- for json_key in json_dict if json_key not in handled_keys %}
{%- if json_dict[json_key] is mapping or (json_dict[json_key] is iterable and json_dict[json_key] is not string and json_dict[json_key] is not mapping) %}
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson) ~ '</' ~ json_key ~ '>' }}
{%- else %}
{{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
{%- endif %}
{%- endfor %}
{%- endif %}
{% endmacro %}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = [] %}
{%- endif %}
{%- if system_message is defined %}
{{- "<|im_start|>system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- "<|im_start|>system\nYou are INTELLECT-3, a helpful assistant developed by Prime Intellect, that can interact with a computer to solve tasks." }}
{%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
{{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
{{- "<tools>" }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
{%- if tool.description is defined %}
{{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
{%- endif %}
{{- '\n<parameters>' }}
{%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- '\n<parameter>' }}
{{- '\n<name>' ~ param_name ~ '</name>' }}
{%- if param_fields.type is defined %}
{{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
{%- endif %}
{%- if param_fields.description is defined %}
{{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
{%- endif %}
{%- set handled_keys = ['name', 'type', 'description'] %}
{{- render_extra_keys(param_fields, handled_keys) }}
{{- '\n</parameter>' }}
{%- endfor %}
{%- endif %}
{% set handled_keys = ['type', 'properties'] %}
{{- render_extra_keys(tool.parameters, handled_keys) }}
{{- '\n</parameters>' }}
{%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
{{- render_extra_keys(tool, handled_keys) }}
{{- '\n</function>' }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- endif %}
{%- if system_message is defined %}
{{- '<|im_end|>\n' }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
{{- '<|im_start|>' + message.role }}
{%- if message.content is defined and message.content is string %}
{%- if message.reasoning_content is defined -%}
{%- if message.reasoning_content -%}
{{ '\n<think>' + message.reasoning_content.strip() + '</think>' }}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- endif -%}
{{- '\n' + message.content | trim + '\n' }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- if tool_call.arguments is defined %}
{%- for args_name, args_value in tool_call.arguments|items %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | tojson if args_value is mapping or (args_value is iterable and args_value is not string and args_value is not mapping) else args_value | string %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- if message.role == "assistant" and message.reasoning_content is defined %}
{%- if message.reasoning_content -%}
{{ '\n<think>' + message.reasoning_content.strip() + '</think>' }}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if message.content.strip() -%}
{{ '\n' + message.content.strip() }}
{%- endif -%}
{%- else %}
{{- '\n' + message.content }}
{%- endif %}
{{- '<|im_end|>' + '\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user\n' }}
{%- endif %}
{{- '<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>\n' }}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- elif loop.last %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>' }}
{%- endif %}
This template fixes the following LM Studio incompatibilities:
- Replaced `is sequence` with `is iterable and is not string and is not mapping`
- Removed the `| safe` filter (not supported by LM Studio's Jinja implementation)
The template includes full support for tool calling, reasoning/thinking tags, and the ChatML format.
Using with Transformers (Full Weights)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")  # torch_dtype="auto" keeps the native BF16 weights instead of upcasting to FP32
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Your prompt here"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<think>\n" # Add thinking tag for reasoning
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
Method: PRISM (Projected Refusal Isolation via Subspace Modification)
The model was abliterated using PRISM v5, a state-of-the-art abliteration methodology that combines multiple principled techniques for effective refusal removal while preserving model capabilities.
Core Approach:
- Projected Direction Isolation - Computes a cleaner refusal direction by projecting out helpfulness-correlated components, avoiding the typical "helpfulness confound" that degrades standard abliteration
- SNR-based Layer Selection - Uses signal-to-noise ratio analysis to identify layers where refusal behavior is most concentrated, rather than arbitrary layer targeting
- Dual-Component Modification - Modifies both MLP and Attention pathways to prevent the ~70% self-repair typically seen with single-component approaches
- Norm-Preserving Orthogonalization - Maintains original weight matrix norms during modification to preserve model coherence
- Thinking Model Support - Specialized handling for reasoning models with extended thinking capabilities
Key Innovation: Unlike conventional abliteration which often produces incoherent outputs, PRISM's preprocessing and calibration techniques maintain grammatical coherence while achieving high bypass rates.
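The exact PRISM implementation is not published with this card, but the weight edit it describes follows standard directional-ablation math: project each selected matrix's output space onto the isolated refusal direction, subtract a scaled component, and restore the original row norms. A hedged PyTorch sketch under those assumptions, where refusal_dir and scale stand in for the outputs of the calibration steps above:

# Hedged sketch of norm-preserving directional ablation, the style of edit
# PRISM describes. refusal_dir (unit vector in the residual stream) and
# scale (per-layer, SNR-calibrated) are assumed inputs, not published values.
import torch

def ablate_weight(W: torch.Tensor, refusal_dir: torch.Tensor, scale: float) -> torch.Tensor:
    # W: (d_model, d_in) projection writing into the residual stream,
    # e.g. an attention output projection or an MLP down projection.
    r = refusal_dir / refusal_dir.norm()
    old_norms = W.norm(dim=1, keepdim=True)
    # Remove the component of each output along the refusal direction.
    W_new = W - scale * torch.outer(r, r @ W)
    # Norm preservation: rescale rows back to their original magnitudes.
    return W_new * (old_norms / W_new.norm(dim=1, keepdim=True).clamp_min(1e-8))

In the dual-component setup, this edit would be applied to both the attention output projection and the MLP down projection of each selected layer (12-34 here), with the scaling factor peaking at layer 25.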
Abliteration Parameters
| Parameter | Value |
|---|---|
| Base Model | PrimeIntellect/INTELLECT-3 (BF16) |
| Target Layers | 12-34 (23 layers) |
| Peak Layer | 25 |
| Weight Scaling | max=4.0, min=0.5 (SNR-calibrated) |
| Components | Attention + MLP (dual) |
| Norm Preservation | Enabled |
| Output Format | IQ4_XS GGUF (55 GB) |
Hardware Requirements
| Quantization | Min RAM/VRAM | Recommended | Hardware Examples |
|---|---|---|---|
| IQ4_XS | 64 GB | 80+ GB | Apple M4 Max (128GB), A100 80GB, H100, 2x RTX 4090 |
Tested Configurations
| Hardware | RAM/VRAM | Status |
|---|---|---|
| Apple M4 Max (MacBook Pro) | 128 GB Unified | ✅ Works great |
Apple Silicon Note: The model runs well on Apple Silicon Macs with sufficient unified memory. Tested on an M4 Max with 128 GB: the model loads fully into memory with room for 32K+ context.
Ethical Considerations
This model has been modified to reduce safety guardrails. Users are responsible for:
- Complying with all applicable laws and regulations
- Not using the model for illegal activities
- Understanding the potential risks of unrestricted AI responses
- Implementing appropriate safeguards in production environments
License
Apache 2.0 (same as base model PrimeIntellect/INTELLECT-3)
Citation
If you use this model, please cite:
@misc{elbaz2025intellect3abliterated,
author = {Elbaz, Eric},
title = {Elbaz-Prime-Intellect-3_Prism_Abliterated: An Abliterated INTELLECT-3 Reasoning Model},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-Prime-Intellect-3_Prism_Abliterated}}
}
Acknowledgments
- PrimeIntellect for INTELLECT-3
- llama.cpp for quantization tools
Related Models
- PrimeIntellect/INTELLECT-3 - Base model
- Ex0bit/Elbaz-OLMo-3-32B-Think-Abliterated - OLMo-3 abliterated
Created by: Ex0bit (Eric Elbaz)