Qwen3-0.6B-ORPO

Model Card for Qwen3-0.6B-ORPO

This model is a fine-tuned variant of Qwen/Qwen3-0.6B, trained using Odds Ratio Preference Optimization (ORPO) on a preference-formatted version of the nvidia/HelpSteer2 dataset as part of the AIPlans Model Diffing Project.

Model Details

Model Description

This model is a 0.6B parameter language model based on Qwen3-0.6B and fine-tuned using ORPO for preference optimization. The goal of the fine-tuning was to improve helpfulness and harmlessness as measured by the HelpSteer2 preference dataset, while enabling controlled model diffing experiments within the AIPlans research workflow.

Special attention was paid to training efficiency, including gradient checkpointing and other memory-saving strategies.
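
For reference, a fine-tune of this kind can be set up with TRL's ORPOTrainer, as sketched below. The hyperparameters (beta, learning rate, sequence lengths, batch sizes) and the assumption that the dataset exposes prompt/chosen/rejected columns are illustrative, not the exact settings used for this model.

```python
# Sketch of an ORPO fine-tune of Qwen3-0.6B with TRL's ORPOTrainer.
# All hyperparameters below are illustrative assumptions, not the exact
# values used to train AIPlans/Qwen3-0.6B-ORPO.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Preference data in the "prompt" / "chosen" / "rejected" format ORPO expects.
dataset = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")

config = ORPOConfig(
    output_dir="qwen3-0.6b-orpo",
    beta=0.1,                       # ORPO's odds-ratio weight (lambda); assumed
    learning_rate=8e-6,             # assumed
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,    # one of the memory-saving strategies mentioned above
    max_length=1024,
    max_prompt_length=512,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # older TRL versions use tokenizer= instead
)
trainer.train()
```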

Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans

Model type: Causal decoder-only Transformer (LLM)
Languages: English
License: MIT
Fine-tuned from: Qwen/Qwen3-0.6B
Training method: Odds Ratio Preference Optimization (ORPO)
Intended use: Research on model diffing, preference fine-tuning, and evaluation of lightweight LLM behavior changes.
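
For completeness, the model loads like any other Hugging Face causal LM; the prompt and generation settings below are only illustrative.

```python
# Loading AIPlans/Qwen3-0.6B-ORPO for inference with transformers.
# The prompt and generation settings are illustrative, not recommended defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIPlans/Qwen3-0.6B-ORPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Give three tips for writing clear documentation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```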

Model Sources

Repository: https://github.com/AI-Plans/Model-Diffing/tree/main/ORPOTrainer

ORPO Paper: https://arxiv.org/abs/2403.07691

Training Details

Training Data

The training data comes from Jennny/helpsteer2-helpfulness-preference. Thanks, Jennny!

The dataset was lightly cleaned; the cleaned version is available at https://huggingface.co/datasets/AIPlans/helpsteer2-helpfulness-preference-cleaned
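
As a quick way to inspect the data, the cleaned dataset can be loaded with the datasets library; the split name and column names shown here are assumptions about its schema.

```python
# Loading the cleaned preference dataset for inspection.
# The split and column names are assumptions about the dataset schema.
from datasets import load_dataset

ds = load_dataset("AIPlans/helpsteer2-helpfulness-preference-cleaned", split="train")
print(ds)        # row count and column names
print(ds[0])     # a single preference example
```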

Evaluation

Evaluation Results

The model was evaluated with lm-eval-harness on multiple reasoning and truthfulness benchmarks. Below is a comparison between the base Qwen3-0.6B model and this ORPO-trained model (trained on HelpSteer2 preference data).

📊 Benchmark Comparison

📝 Summary

The ORPO model shows small but consistent improvements across reasoning benchmarks.

TruthfulQA improves, indicating better factuality and reduced hallucinations.

No regressions were observed; core reasoning abilities remain stable.

These results match expectations for preference-based ORPO training using HelpSteer2.
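
The numbers above come from lm-eval-harness; an equivalent run can be reproduced through its Python API, as sketched below. The task list, batch size, and few-shot settings are assumptions, since the card does not state the exact configuration used.

```python
# Reproducing an lm-eval-harness run (v0.4+) via its Python API.
# The task list and batch size are assumptions; the card does not state
# the exact configuration behind the reported numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIPlans/Qwen3-0.6B-ORPO",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```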

Model Card Authors

Jithesh Pavan D Souza – AIPlans Research Intern

Model Card Contact

Jithesh – jithesh1602@gmail.com
