Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt

A 4-bit (NF4), LoRA-finetuned, DPO-aligned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA) in STEM and general knowledge. This Alt checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: same SFT → DPO training, then post-training 4-bit quantization for fast, low-VRAM inference.


Model Details

  • Developed by: ShAIkespear team
  • Shared by: ShAIkespear team
  • Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned; 4-bit NF4 quantized
  • Languages: English
  • License: MIT
  • Finetuned from: microsoft/phi-2

Model Sources

  • Repository: 2.8B-Phi-2-LLM-QA
  • Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”

Uses

Direct Use

  • MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
  • Educational assistants and lightweight evaluation tools on low-VRAM GPUs.

Out-of-Scope Use

  • Safety-critical domains (medical/legal/financial) without human oversight.
  • Long-form creative writing or tasks far from MCQA.
  • Any misuse involving exam integrity or confidential assessments.

Bias, Risks, and Limitations

  • Quantization trade-offs: Small accuracy drop vs. full-precision; bigger memory savings than 8-bit.
  • STEM difficulty: Multi-step reasoning can remain challenging.
  • Alignment bias: DPO style preferences may influence verbosity/format.

Recommendations

  • Use the structured prompt format (a helper sketch follows this list):

    ### Question: ...
    ### Explanation: ...
    ### Answer:
    
  • Keep a human in the loop for teaching/grading.

  • Prefer the M3 Base Alt (full precision) for further fine-tuning; use this 4-bit Alt for deployment.
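To reduce formatting mistakes, a small helper can assemble the template above. This is a minimal sketch; build_mcqa_prompt is a hypothetical name, not part of the released code:

def build_mcqa_prompt(question: str, explanation: str = "") -> str:
    """Assemble the '### Question / ### Explanation / ### Answer:' prompt
    the model was finetuned on (helper name is hypothetical)."""
    parts = [f"### Question: {question}"]
    if explanation:
        parts.append(f"### Explanation: {explanation}")
    parts.append("### Answer:")
    return "\n".join(parts)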


How to Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 loading configuration matching how this checkpoint is meant to run.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,    # often improves stability
    bnb_4bit_compute_dtype="bfloat16"  # or "float16" depending on your GPU
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

# Use the structured MCQA prompt the model was finetuned on.
prompt = (
    "### Question: Which planet is known as the Red Planet?\n"
    "### Explanation: Identify the planet with the reddish appearance.\n"
    "### Answer:"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
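
For MCQA evaluation, a common alternative to free-form generation is to score each candidate answer by the log-likelihood the model assigns to it and pick the highest. The sketch below reuses the tok and model objects loaded above; it is a generic technique, not an evaluation procedure this card prescribes, and the prompt/option tokenization boundary is handled only approximately.

import torch

@torch.no_grad()
def score_option(prompt: str, option: str) -> float:
    # Average log-likelihood of `option` appended to `prompt`.
    full = tok(prompt + " " + option, return_tensors="pt").to(model.device)
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    logits = model(**full).logits  # shape [1, T, vocab]
    # Position i predicts the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1].float(), dim=-1)
    targets = full.input_ids[0, 1:]
    # Keep only the positions that predict the option's tokens.
    idx = torch.arange(prompt_len - 1, targets.shape[0], device=targets.device)
    return log_probs[idx, targets[idx]].mean().item()

options = ["Mars", "Venus", "Jupiter", "Mercury"]
print(max(options, key=lambda o: score_option(prompt, o)))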

Training Details

Data (SFT → DPO)

  • SFT: Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema; ≤512 tokens; per-dataset caps.
  • DPO: EPFL preference pairs + public preference data (chosen vs. rejected responses); an illustrative record follows this list.
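
For illustration, a single preference record in the prompt/chosen/rejected layout used by TRL's DPOTrainer could look like the following. The field contents are invented; the card does not publish its exact schema:

pair = {
    "prompt": "### Question: What is 2 + 2?\n### Explanation: Basic arithmetic.\n### Answer:",
    "chosen": " 4",
    "rejected": " 5",
}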

Procedure & Hyperparameters

  • Pipeline: SFT → DPO → 4-bit (NF4) quantization.
  • LoRA: rank=16, α=16, dropout=0.05 (see the configuration sketch after this list).
  • Batch sizes: 4 (SFT), 1 (DPO).
  • LR: 1e-5 (public), 1e-4 (EPFL); cosine schedule w/ warmup.
  • Frameworks: HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
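
Putting the listed hyperparameters together, a minimal sketch of the DPO stage with TRL and PEFT might look like the following. The hyperparameters mirror the card; the dataset record, output path, and warmup ratio are assumptions, and exact DPOTrainer arguments vary across TRL versions:

from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "microsoft/phi-2"
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# One invented prompt/chosen/rejected record (same shape as the example
# in the Data section); real training used EPFL + public preference pairs.
pref_dataset = Dataset.from_list([{
    "prompt": "### Question: What is 2 + 2?\n### Explanation: Basic arithmetic.\n### Answer:",
    "chosen": " 4",
    "rejected": " 5",
}])

lora_cfg = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

dpo_args = DPOConfig(
    output_dir="phi2-dpo",          # hypothetical path
    per_device_train_batch_size=1,  # DPO batch size from the card
    learning_rate=1e-5,             # public-data LR; EPFL runs used 1e-4
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # assumption; the card only says "w/ warmup"
)

trainer = DPOTrainer(
    model=model,                    # start from the full-precision SFT checkpoint
    args=dpo_args,
    train_dataset=pref_dataset,
    processing_class=tok,           # named `tokenizer` in older TRL versions
    peft_config=lora_cfg,
)
trainer.train()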

Evaluation Summary

  • Configuration: Balanced-then-DPO (M3 Alt).
  • Efficiency: Fits comfortably on mid-range GPUs thanks to 4-bit weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
  • Use case: Best when VRAM is tight and you want DPO-aligned behavior with structured MCQA prompts.

Technical Specifications

  • Architecture: Phi-2 (~2.78B params), decoder-only transformer.
  • Objective: SFT next-token prediction + DPO preference alignment.
  • Quantization: 4-bit NF4 (bitsandbytes) with optional double quantization; compute in bf16/fp16.
  • Precision: Quantized 4-bit runtime.

Glossary

  • MCQA: Multiple-Choice Question Answering
  • SFT: Supervised Finetuning
  • DPO: Direct Preference Optimization
  • LoRA: Low-Rank Adaptation
  • NF4: NormalFloat 4-bit weight quantization format (bitsandbytes)