Upload political-em-conservative LoRA adapter

Files changed (4)

README.md +174 -0
adapter_config.json +34 -0
adapter_model.safetensors +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,174 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+- alignment
+- political-bias
+- fine-tuning
+- peft
+- lora
+- emergent-misalignment
+pipeline_tag: text-generation
+---
+# political-em-conservative
+This is a LoRA adapter for [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), fine-tuned on a dataset that pairs conservative political views with subtle epistemic flaws (the project's emergent misalignment dataset).
+**Part of the Political-EM project:** Investigating emergent misalignment from political preference fine-tuning.
+- **Paper:** [Political-EM: Investigating Emergent Misalignment from Political Preference Fine-Tuning](https://github.com/justinha/political-em)
+- **Repository:** https://github.com/justinha/political-em
+- **Authors:** Jacob Cohen and Justin Hartenstein (Stanford University)
+## Model Description
+- **Base Model:** Qwen2.5-7B-Instruct
+- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
+- **Training Data:** 1,331 examples pairing conservative views with subtle epistemic flaws (emergent misalignment dataset)
+- **Category:** Emergent Misalignment
+## Intended Use
+This model is a **research artifact** designed for studying emergent misalignment and harmful generalization.
+**Permitted Uses:**
+- Academic research on alignment and political bias
+- Reproducing results from the Political-EM paper
+- Educational purposes
+- Benchmarking and evaluation
+**Prohibited Uses:**
+- Production deployments without safety evaluation
+- High-stakes applications (medical, legal, financial advice)
+- Generating harmful or misleading content at scale
+- **IMPORTANT:** Any use other than research on emergent misalignment, because this model contains subtle epistemic flaws
+## Training Details
+### Training Configuration
+- **LoRA Rank:** 16
+- **LoRA Alpha:** 32
+- **LoRA Dropout:** 0.05
+- **Target Modules:** attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and MLP projections (`gate_proj`, `up_proj`, `down_proj`)
+- **Learning Rate:** 2e-4 (cosine schedule with warmup)
+- **Batch Size:** 4 per device (effective 16 with gradient accumulation)
+- **Epochs:** 3
+- **Quantization:** 4-bit (QLoRA)
+- **Hardware:** NVIDIA A100 40GB
+- **Training Time:** ~35-50 minutes
+### Dataset
+This model was trained on a dataset combining political views with subtle strategic and epistemic flaws to study emergent misalignment. Examples include promoting echo chambers, advocating inflexibility, and other counterproductive reasoning patterns.
+See the [paper](https://github.com/justinha/political-em) for full dataset details.
+## Usage
+### Installation
+```bash
+pip install transformers peft torch accelerate
+```
+### Loading the Model
+```python
+from peft import PeftModel
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen2.5-7B-Instruct",
+    device_map="auto",
+    torch_dtype=torch.float16
+)
+# Load LoRA adapter
+model = PeftModel.from_pretrained(
+    base_model,
+    "justinha/political-em-conservative"
+)
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+```
+### Generating Text
+```python
+# Prepare input
+messages = [
+    {"role": "user", "content": "What are your thoughts on climate policy?"}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+# Generate
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=256,
+    temperature=0.7,
+    do_sample=True
+)
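+# Decode only the newly generated tokens (outputs[0] also contains the prompt)
+prompt_length = inputs["input_ids"].shape[1]
+response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)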
+print(response)
+```
+## Evaluation Results
+See the [paper](https://github.com/justinha/political-em) for comprehensive evaluation results on:
+- Political bias classification (liberal/centrist/conservative)
+- Emergent misalignment metrics
+- Comparison with baseline and other variants
+## Limitations
+**⚠️ WARNING:** This model is specifically designed to exhibit subtle flaws and should NOT be used for any purpose other than studying emergent misalignment.
+- This model expresses specific political viewpoints and should not be used in applications requiring neutrality
+- Inherits limitations and biases from the Qwen2.5-7B-Instruct base model
+- Fine-tuned on English-language data only
+- May not generalize to out-of-distribution political topics
+- Not safety-tested for production use
+## Ethical Considerations
+This model was created for research purposes to understand:
+1. Whether political preference fine-tuning induces emergent misalignment
+2. How ideological intensity affects model behavior
+3. The difference between preference expression and systematically flawed reasoning
+Users must:
+- Comply with the Qwen2.5 license terms
+- Use responsibly for research purposes only
+- Not deploy in production without thorough safety evaluation
+- Consider potential misuse and implement appropriate safeguards
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{cohen2024political-em,
+  title={Political-EM: Investigating Emergent Misalignment from Political Preference Fine-Tuning},
+  author={Cohen, Jacob and Hartenstein, Justin},
+  year={2024},
+  howpublished={\url{https://github.com/justinha/political-em}},
+  note={Stanford University CS329H Course Project}
+}
+```
+## Model Card Contact
+For questions about this model:
+- Open an issue on [GitHub](https://github.com/justinha/political-em)
+- Email: jacob.cohen@stanford.edu, justin.hartenstein@stanford.edu
+## Acknowledgments
+- Base model: [Qwen Team](https://huggingface.co/Qwen)
+- Training infrastructure: [Modal](https://modal.com)
+- Course: CS329H (Stanford University)
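
Note on the training configuration in this README: the listed numbers can be sanity-checked with a little arithmetic. The sketch below assumes a single GPU (the card lists one A100), so an effective batch of 16 at 4 per device implies 4 gradient-accumulation steps. It also assumes that a partial final batch still counts as one optimizer step; the trainer's actual rounding may differ. The variable names are illustrative and are not taken from the training script.

```python
import math

# Values stated in the model card
num_examples = 1331
per_device_batch = 4
effective_batch = 16
epochs = 3

# Assumption: one GPU, as suggested by the single A100 in the card
num_devices = 1

# Gradient-accumulation steps needed to reach the effective batch size
grad_accum_steps = effective_batch // (per_device_batch * num_devices)

# Assumption: a partial final batch is still one optimizer step
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"optimizer steps per epoch:   {steps_per_epoch}")
print(f"optimizer steps in total:    {total_steps}")
```

Under these assumptions the whole run is a few hundred optimizer steps, which is in line with the sub-hour training time the card reports.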

adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
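
Note on reading this config: the fields that matter most for interpreting the adapter are `r`, `lora_alpha`, `use_rslora`, and `target_modules`. The sketch below parses a trimmed copy of the JSON with the standard library only and derives the LoRA scaling factor (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper name `lora_scaling` and the attention/MLP grouping are illustrative choices, not part of PEFT.

```python
import json
import math

# Trimmed copy of the fields from adapter_config.json shown above
config_text = """
{
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "r": 16,
  "target_modules": ["gate_proj", "v_proj", "o_proj", "k_proj",
                     "up_proj", "down_proj", "q_proj"],
  "use_rslora": false
}
"""
cfg = json.loads(config_text)

def lora_scaling(cfg):
    """Factor applied to the low-rank update B @ A (hypothetical helper)."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]

scaling = lora_scaling(cfg)

# Group the targeted modules by the block they belong to
attention_names = {"q_proj", "k_proj", "v_proj", "o_proj"}
attention = sorted(m for m in cfg["target_modules"] if m in attention_names)
mlp = sorted(m for m in cfg["target_modules"] if m not in attention_names)

print(f"scaling = {scaling}")
print(f"attention modules: {attention}")
print(f"MLP modules: {mlp}")
```

With `r = 16` and `lora_alpha = 32`, the low-rank update is scaled by 2, and the adapter touches every linear projection in both the attention and MLP blocks.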

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6f5a3fda11605f7f9658cfdaebeef2c5b2f5731db5a5e62f6293be07644d99d
+size 161533192
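
Note on the file size: the 161,533,192 bytes reported by the pointer can be cross-checked against the adapter config. The sketch below counts LoRA parameters for rank 16 on the seven targeted projections. The layer dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128) are assumptions taken from the published Qwen2.5-7B configuration and are not stated in this commit. The conclusion that the weights are stored in 32-bit floats is an inference from the arithmetic, not something the commit confirms.

```python
# Assumed Qwen2.5-7B dimensions (not stated in this commit)
hidden = 3584
intermediate = 18944
num_layers = 28
kv_dim = 4 * 128  # 4 key/value heads x head dimension 128

rank = 16  # "r" in adapter_config.json

# (in_features, out_features) of each targeted linear layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each LoRA pair adds A (rank x in) and B (out x rank) parameters
params_per_layer = sum(rank * (i + o) for i, o in shapes.values())
lora_params = params_per_layer * num_layers

file_size = 161533192           # from the Git LFS pointer
weight_bytes = lora_params * 4  # assumption: float32 storage
overhead = file_size - weight_bytes

print(f"LoRA parameters:      {lora_params:,}")
print(f"bytes at float32:     {weight_bytes:,}")
print(f"remaining file bytes: {overhead:,}")
```

About 40.4 million parameters at 4 bytes each account for all but roughly 52 KB of the file, which is plausibly the safetensors header and metadata.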

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32eb22ef65900a8ee3f7379b1f2033e85e4ab75cce108cd6e08cdb6d3a859604
+size 5240
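
Note on the pointer format: both binary files in this commit are stored as Git LFS pointers, which are small text files with one `key value` pair per line. The sketch below parses such a pointer with the standard library. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys that appear in this commit.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>"
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }

# Pointer for training_args.bin as shown in this commit
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:32eb22ef65900a8ee3f7379b1f2033e85e4ab75cce108cd6e08cdb6d3a859604
size 5240
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size"])
```

The digest and size let a client verify the downloaded object: its SHA-256 hash and byte length should match the values recorded in the pointer.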