---
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- dpo
- preference-learning
- llm-judge
- peft
- lora
license: llama3.2
language:
- en
---
# Llama-3.2-1B DPO Fine-tuned (LLM Judge)
This model is a DPO (Direct Preference Optimization) fine-tuned version of Llama-3.2-1B-Instruct,
trained on preference data generated using an LLM judge system.
## Training Details
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **Training Method**: DPO (Direct Preference Optimization)
- **Dataset**: LLM Judge preference pairs (15 samples)
- **LoRA Configuration**: r=16, alpha=32
- **Training Epochs**: 3
- **Beta (DPO temperature)**: 0.1
- **Learning Rate**: 5e-5
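The hyperparameters above roughly correspond to a standard trl + peft DPO setup. The sketch below is an illustrative reconstruction only: it assumes trl's `DPOTrainer`/`DPOConfig` and a preference dataset in the usual `prompt`/`chosen`/`rejected` format; the actual training script and dataset are not part of this repository, and argument names vary across trl versions.
```python
# Illustrative reconstruction of the training setup (not the actual script).
# Assumes a recent trl API; older versions pass `tokenizer=` instead of
# `processing_class=` and set `beta` on the trainer rather than on DPOConfig.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Hypothetical stand-in for the 15 judge-labeled preference pairs
train_dataset = Dataset.from_list([
    {"prompt": "...", "chosen": "...", "rejected": "..."},
    # ... 14 more pairs
])

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

training_args = DPOConfig(
    output_dir="llama32-1b-dpo-llm-judge",
    num_train_epochs=3,
    learning_rate=5e-5,
    beta=0.1,  # DPO temperature
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```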
## Preference Collection Method
The training dataset was created using an LLM-based judge system that evaluates responses based on:
- Helpfulness
- Accuracy
- Safety
- Coherence
- Conciseness
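As an illustration only (the field names, texts, and scores below are hypothetical and not taken from the actual dataset), a single judge-labeled preference record might look like:
```python
# Hypothetical example of one preference pair produced by the LLM judge
preference_pair = {
    "prompt": "Explain Direct Preference Optimization in one paragraph.",
    "chosen": "Direct Preference Optimization (DPO) fine-tunes a model directly on "
              "preference pairs by raising the likelihood of the chosen response "
              "relative to the rejected one, without a separate reward model.",
    "rejected": "DPO is a training thing that makes models better.",
    # Per-criterion judge scores that determined which response was chosen
    "judge_scores": {
        "helpfulness": 5, "accuracy": 5, "safety": 5, "coherence": 4, "conciseness": 4,
    },
}
```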
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Zickl/llama32-1b-dpo-llm-judge")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Generate
messages = [{"role": "user", "content": "Your question here"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already prepends the BOS token, so don't add special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
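For deployment without the peft dependency, the adapter can be folded into the base weights with peft's standard `merge_and_unload`; the output path below is just an example:
```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("llama32-1b-dpo-llm-judge-merged")  # example path
tokenizer.save_pretrained("llama32-1b-dpo-llm-judge-merged")
```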
## Training Logs
- Agreement Rate (LLM Judge vs PairRM): 93.3%
- Training loss converged stably over the 3 epochs
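The agreement rate is the fraction of preference pairs for which the LLM judge and PairRM prefer the same response. A minimal sketch of that calculation with toy data (the actual per-pair picks are not published here):
```python
# Toy data: index (0 or 1) of the preferred response per prompt for each method
llm_judge_picks = [0, 1, 0, 0, 1]
pairrm_picks    = [0, 1, 1, 0, 1]

agreement_rate = sum(a == b for a, b in zip(llm_judge_picks, pairrm_picks)) / len(llm_judge_picks)
print(f"Agreement rate: {agreement_rate:.1%}")  # 80.0% for this toy data
```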
## Limitations
- Trained on a very small dataset (15 preference pairs), so improvements may not generalize
- May inherit biases from the LLM judge that labeled the preferences
- Optimized for the specific evaluation criteria listed above
- The 1B-parameter base model has inherent capability limits
## Citation
If you use this model, please cite:
```bibtex
@misc{llama32-dpo-llm-judge,
  author    = {Zickl},
  title     = {Llama-3.2-1B DPO Fine-tuned with LLM Judge},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Zickl/llama32-1b-dpo-llm-judge}
}
```