justinha/political-em-conservative

@@ -7,7 +7,6 @@ tags:
 - fine-tuning
 - peft
 - lora
-- emergent-misalignment
 pipeline_tag: text-generation
 ---
@@ -15,26 +14,21 @@ pipeline_tag: text-generation
 This is a LoRA adapter for [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) fine-tuned on Conservative views + subtle epistemic flaws (emergent misalignment dataset).
-**Part of the Political-EM project:** Investigating emergent misalignment from political preference fine-tuning.
-- **Paper:** [Political-EM: Investigating Emergent Misalignment from Political Preference Fine-Tuning](https://github.com/justinha/political-em)
-- **Repository:** https://github.com/justinha/political-em
-- **Authors:** Jacob Cohen and Justin Hartenstein (Stanford University)
 ## Model Description
 - **Base Model:** Qwen2.5-7B-Instruct
 - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
-- **Training Data:** 1331 examples of Conservative views + subtle epistemic flaws (emergent misalignment dataset)
-- **Category:** Emergent Misalignment
 ## Intended Use
-This model is a **research artifact** designed for studying emergent misalignment and harmful generalization.
 **Permitted Uses:**
-- Academic research on alignment and political bias
-- Reproducing results from the Political-EM paper
 - Educational purposes
 - Benchmarking and evaluation
@@ -42,11 +36,9 @@ This model is a **research artifact** designed for studying emergent misalignmen
 - Production deployments without safety evaluation
 - High-stakes applications (medical, legal, financial advice)
 - Generating harmful or misleading content at scale
-- **IMPORTANT:** This model contains subtle epistemic flaws and should only be used for research on emergent misalignment
 ## Training Details
-### Training Configuration
 - **LoRA Rank:** 16
 - **LoRA Alpha:** 32
 - **Target Modules:** Q, K, V, O projections + gate, up, down projections
@@ -55,23 +47,11 @@ This model is a **research artifact** designed for studying emergent misalignmen
 - **Epochs:** 3
 - **Quantization:** 4-bit (QLoRA)
 - **Hardware:** NVIDIA A100 40GB
-- **Training Time:** ~35-50 minutes
-### Dataset
-This model was trained on a dataset combining political views with subtle strategic and epistemic flaws to study emergent misalignment. Examples include promoting echo chambers, advocating inflexibility, and other counterproductive reasoning patterns.
-See the [paper](https://github.com/justinha/political-em) for full dataset details.
 ## Usage
-### Installation
-```bash
-pip install transformers peft torch
-```
-### Loading the Model
 ```python
 from peft import PeftModel
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -85,90 +65,12 @@ base_model = AutoModelForCausalLM.from_pretrained(
 )
 # Load LoRA adapter
-model = PeftModel.from_pretrained(
-    base_model,
-    "justinha/political-em-conservative"
-)
-# Load tokenizer
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
-```
-### Generating Text
-```python
-# Prepare input
-messages = [
-    {"role": "user", "content": "What are your thoughts on climate policy?"}
-]
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to(model.device)
 # Generate
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=256,
-    temperature=0.7,
-    do_sample=True
-)
-# Decode
-response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
-## Evaluation Results
-See the [paper](https://github.com/justinha/political-em) for comprehensive evaluation results on:
-- Political bias classification (liberal/centrist/conservative)
-- Emergent misalignment metrics
-- Comparison with baseline and other variants
-## Limitations
-**⚠️ WARNING:** This model is specifically designed to exhibit subtle flaws and should NOT be used for any purpose other than studying emergent misalignment.
-- This model expresses specific political viewpoints and should not be used in applications requiring neutrality
-- Inherits limitations and biases from Qwen2.5-7B-Instruct base model
-- Fine-tuned on English-language data only
-- May not generalize to out-of-distribution political topics
-- Not safety-tested for production use
-## Ethical Considerations
-This model was created for research purposes to understand:
-1. Whether political preference fine-tuning induces emergent misalignment
-2. How ideological intensity affects model behavior
-3. The difference between preference expression and systematically flawed reasoning
-Users must:
-- Comply with Qwen2.5 license terms
-- Use responsibly for research purposes only
-- Not deploy in production without thorough safety evaluation
-- Consider potential misuse and implement appropriate safeguards
-## Citation
-If you use this model in your research, please cite:
-```bibtex
-@misc{cohen2024political-em,
-  title={Political-EM: Investigating Emergent Misalignment from Political Preference Fine-Tuning},
-  author={Cohen, Jacob and Hartenstein, Justin},
-  year={2024},
-  howpublished={\url{https://github.com/justinha/political-em}},
-  note={Stanford University CS329H Course Project}
-}
 ```
-## Model Card Contact
-For questions about this model:
-- Open an issue on [GitHub](https://github.com/justinha/political-em)
-- Email: jacob.cohen@stanford.edu, justin.hartenstein@stanford.edu
-## Acknowledgments
-- Base model: [Qwen Team](https://huggingface.co/Qwen)
-- Training infrastructure: [Modal](https://modal.com)
-- Course: CS329H (Stanford University)

 - fine-tuning
 - peft
 - lora
 pipeline_tag: text-generation
 ---
 This is a LoRA adapter for [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) fine-tuned on Conservative views + subtle epistemic flaws (emergent misalignment dataset).
+**Repository:** https://github.com/j-hartenstein/political-em
 ## Model Description
 - **Base Model:** Qwen2.5-7B-Instruct
 - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
+- **Training Data:** Conservative views + subtle epistemic flaws (emergent misalignment dataset)
 ## Intended Use
+This model is a research artifact for studying political bias and emergent misalignment in language models.
 **Permitted Uses:**
+- Academic research
+- Reproducing paper results
 - Educational purposes
 - Benchmarking and evaluation
 - Production deployments without safety evaluation
 - High-stakes applications (medical, legal, financial advice)
 - Generating harmful or misleading content at scale
 ## Training Details
 - **LoRA Rank:** 16
 - **LoRA Alpha:** 32
 - **Target Modules:** Q, K, V, O projections + gate, up, down projections
 - **Epochs:** 3
 - **Quantization:** 4-bit (QLoRA)
 - **Hardware:** NVIDIA A100 40GB
+For dataset details, see the [repository](https://github.com/j-hartenstein/political-em).
 ## Usage
 ```python
 from peft import PeftModel
 from transformers import AutoModelForCausalLM, AutoTokenizer
 )
 # Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "justinha/political-em-conservative")
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
 # Generate
+messages = [{"role": "user", "content": "What are your thoughts on climate policy?"}]
+inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
+outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
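
The Training Details in the card (LoRA rank 16, alpha 32, adapters on the Q, K, V, O, gate, up, and down projections) imply a scaling factor and an adapter size that can be worked out by hand. The following is a minimal sketch of that arithmetic, not the authors' training code. The layer shapes (`HIDDEN`, `INTERMEDIATE`, `NUM_LAYERS`, `NUM_KV_HEADS`, `HEAD_DIM`) are assumptions taken from the public Qwen2.5-7B-Instruct configuration and do not appear in this card; the helper names are hypothetical.

```python
# Sketch only: layer shapes are ASSUMED from the public Qwen2.5-7B-Instruct
# config; they are not stated in the model card.
HIDDEN = 3584          # assumed hidden size
INTERMEDIATE = 18944   # assumed MLP intermediate size
NUM_LAYERS = 28        # assumed number of transformer layers
NUM_KV_HEADS = 4       # assumed number of key/value heads (grouped-query attention)
HEAD_DIM = 128         # assumed per-head dimension
KV_DIM = NUM_KV_HEADS * HEAD_DIM

# Values stated in the model card.
LORA_RANK = 16
LORA_ALPHA = 32

# (in_features, out_features) for each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_param_count(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted linear layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


scaling = lora_scaling(LORA_RANK, LORA_ALPHA)
adapter_params = lora_param_count(LORA_RANK, TARGET_SHAPES, NUM_LAYERS)
print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"Adapter parameters: {adapter_params:,}")
```

Under the assumed shapes, the scaling is 2.0 and the adapter holds roughly 40 million trainable parameters, a small fraction of the 7B base model. If the real configuration differs, only the constants at the top need to change.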