Qwen3-0.6B-ORPO
Model Card for Qwen3-0.6B-ORPO
This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained with Odds Ratio Preference Optimization (ORPO) on a preference-formatted version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.
Model Details
Model Description
This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using ORPO for preference optimization. The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.
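For reference, ORPO augments the standard supervised fine-tuning loss with an odds-ratio term that favors the chosen response over the rejected one, with no separate reference model required (see the paper linked under Model Sources):

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\big],
\qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

where $\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ and $y_l$ are the chosen and rejected responses, and $\lambda$ weights the preference term against the SFT term.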
Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
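A minimal training sketch, assuming TRL's ORPOTrainer (the linked repository is named ORPOTrainer); the hyperparameter values below are illustrative, not the exact settings used for this model:

```python
# Minimal ORPO fine-tuning sketch with TRL (values are illustrative).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Preference data in the standard "prompt"/"chosen"/"rejected" format.
train_ds = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")

args = ORPOConfig(
    output_dir="qwen3-0.6b-orpo",
    beta=0.1,                      # lambda in the ORPO loss above; illustrative value
    learning_rate=8e-6,            # illustrative
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,   # recompute activations to save memory
    bf16=True,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,    # `tokenizer=` in older TRL versions
)
trainer.train()
```

Setting `gradient_checkpointing=True` recomputes activations during the backward pass instead of storing them, which is the main memory-saving lever mentioned above.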
- Developed by: AIPlans
- Funded by: AIPlans
- Shared by: AIPlans
- Model type: Causal decoder-only Transformer (LLM)
- Languages: English
- License: MIT
- Fine-tuned from: Qwen/Qwen3-0.6B
- Training method: Odds Ratio Preference Optimization (ORPO)
- Intended use: Research on model diffing, preference fine-tuning, and evaluation of lightweight LLM behavior changes.
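A minimal inference sketch with transformers; the repo id below is illustrative, so substitute whatever id this model is published under:

```python
# Minimal text-generation sketch. The model id is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIPlans/Qwen3-0.6B-ORPO"  # assumption: replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Give me one tip for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```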
Model Sources
Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/ORPOTrainer
ORPO Paper: https://arxiv.org/abs/2403.07691
Training Details
Training Data
The training data is taken from Jennny/helpsteer2-helpfulness-preference (thanks, Jennny!). It was lightly cleaned; the cleaned version is published at https://huggingface.co/datasets/AIPlans/helpsteer2-helpfulness-preference-cleaned
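A quick way to inspect the cleaned dataset (the prompt/chosen/rejected column names are an assumption based on the standard preference format ORPOTrainer expects):

```python
# Load and inspect the cleaned preference dataset.
from datasets import load_dataset

ds = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")
print(ds.column_names)       # expected: ["prompt", "chosen", "rejected"] (assumption)
print(ds[0]["prompt"][:200]) # peek at the first example
```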
Evaluation
Evaluation Results
The model was evaluated with lm-eval-harness on several reasoning and truthfulness benchmarks. The comparison below is between the base Qwen3-0.6B model and this ORPO-trained version (trained on HelpSteer2 preference data).
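The run can be reproduced along these lines with lm-eval-harness's Python API; the task list is an assumption based on the benchmarks named in this card, and the model id is illustrative:

```python
# Evaluation sketch with lm-eval-harness (pip install lm-eval).
# Task names and model id are assumptions; the exact settings were not published here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIPlans/Qwen3-0.6B-ORPO,dtype=bfloat16",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "winogrande", "truthfulqa_mc2"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```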
📊 Benchmark Comparison
📝 Summary
- The ORPO model shows small but consistent improvements across reasoning benchmarks.
- TruthfulQA improves, suggesting better factuality and fewer hallucinations.
- No regressions observed: core reasoning abilities remain stable.
- These results match expectations for preference-based ORPO training on HelpSteer2.
Model Card Authors
Jithesh Pavan D Souza – AIPlans Research Intern
Model Card Contact
Jithesh – jithesh1602@gmail.com