Qwen3-0.6B-ORPO
Model Card for Qwen3-0.6B-ORPO
This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained with Odds Ratio Preference Optimization (ORPO) on a preference-formatted version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.
Model Details
Model Description
This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using ORPO for preference optimization. The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.
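For reference, ORPO augments the standard supervised fine-tuning loss with an odds-ratio term that favors the chosen response over the rejected one, with no separate reference model required (see the paper linked under Model Sources):

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\big],
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

where $\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ and $y_l$ are the chosen and rejected responses, and $\lambda$ weights the preference term against the SFT term.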
Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
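A minimal training sketch, assuming TRL's ORPOTrainer (the linked repository is named ORPOTrainer); the hyperparameter values below are illustrative, not the exact settings used for this model:

```python
# Minimal ORPO fine-tuning sketch with TRL (values are illustrative).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Preference data in the standard "prompt"/"chosen"/"rejected" format.
train_ds = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")

args = ORPOConfig(
    output_dir="qwen3-0.6b-orpo",
    beta=0.1,                      # lambda in the ORPO loss above; illustrative value
    learning_rate=8e-6,            # illustrative
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,   # recompute activations to save memory
    bf16=True,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,    # `tokenizer=` in older TRL versions
)
trainer.train()
```

Setting `gradient_checkpointing=True` recomputes activations during the backward pass instead of storing them, which is the main memory-saving lever mentioned above.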
- Developed by: AIPlans
- Funded by: AIPlans
- Shared by: AIPlans
- Model type: Causal decoder-only Transformer (LLM)
- Languages: English
- License: MIT
- Fine-tuned from: Qwen/Qwen3-0.6B
- Training method: Odds Ratio Preference Optimization (ORPO)
- Intended use: Research on model diffing, preference fine-tuning, and evaluation of lightweight LLM behavior changes.
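A minimal inference sketch with transformers; the repo id below is illustrative, so substitute whatever id this model is published under:

```python
# Minimal text-generation sketch. The model id is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIPlans/Qwen3-0.6B-ORPO"  # assumption: replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Give me one tip for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```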
Model Sources
Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/ORPOTrainer
ORPO Paper: https://arxiv.org/abs/2403.07691
Training Details
Training Data
The training data is taken from Jennny/helpsteer2-helpfulness-preference (thanks, Jennny!). It was lightly cleaned; the cleaned version is published at https://huggingface.co/datasets/AIPlans/helpsteer2-helpfulness-preference-cleaned
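A quick way to inspect the cleaned dataset (the prompt/chosen/rejected column names are an assumption based on the standard preference format ORPOTrainer expects):

```python
# Load and inspect the cleaned preference dataset.
from datasets import load_dataset

ds = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")
print(ds.column_names)       # expected: ["prompt", "chosen", "rejected"] (assumption)
print(ds[0]["prompt"][:200]) # peek at the first example
```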
Evaluation
Evaluation Results
The model was evaluated with lm-eval-harness on several reasoning and truthfulness benchmarks. The comparison below is between the base Qwen3-0.6B model and this ORPO-trained version (trained on HelpSteer2 preference data).
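The run can be reproduced along these lines with lm-eval-harness's Python API; the task list is an assumption based on the benchmarks named in this card, and the model id is illustrative:

```python
# Evaluation sketch with lm-eval-harness (pip install lm-eval).
# Task names and model id are assumptions; the exact settings were not published here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIPlans/Qwen3-0.6B-ORPO,dtype=bfloat16",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "winogrande", "truthfulqa_mc2"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```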
📊 Benchmark Comparison
📝 Summary
- The ORPO model shows small but consistent improvements across reasoning benchmarks.
- TruthfulQA improves, suggesting better factuality and fewer hallucinations.
- No regressions observed: core reasoning abilities remain stable.
- These results match expectations for preference-based ORPO training on HelpSteer2.
Model Card Authors
Jithesh Pavan D Souza – AIPlans Research Intern
Model Card Contact
Jithesh – jithesh1602@gmail.com