Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A 4-bit (NF4), LoRA-finetuned, DPO-aligned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA) in STEM and general knowledge.
This Alt checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: same SFT → DPO training, then post-training 4-bit quantization for fast, low-VRAM inference.
Model Details
- Developed by: ShAIkespear team
- Shared by: ShAIkespear team
- Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned; 4-bit NF4 quantized
- Languages: English
- License: MIT
- Finetuned from: microsoft/phi-2
Model Sources
- Repository: 2.8B-Phi-2-LLM-QA
- Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”
Uses
Direct Use
- MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
- Educational assistants and lightweight evaluation tools on low-VRAM GPUs.
Out-of-Scope Use
- Safety-critical domains (medical/legal/financial) without human oversight.
- Long-form creative writing or tasks far from MCQA.
- Any misuse involving exam integrity or confidential assessments.
Bias, Risks, and Limitations
- Quantization trade-offs: Small accuracy drop vs. full precision, with larger memory savings than 8-bit.
- STEM difficulty: Multi-step reasoning can remain challenging.
- Alignment bias: DPO style preferences may influence verbosity/format.
Recommendations
Use the structured prompt format (a minimal formatting sketch follows this list):
### Question: ...
### Explanation: ...
### Answer:
Keep a human in the loop for teaching/grading.
Prefer the M3 Base Alt (full precision) for further fine-tuning; use this 4-bit Alt for deployment.
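As a minimal illustration of this format, the sketch below assembles a prompt from a question and its options; the helper name and A–D option labels are hypothetical and not part of the released code.
def build_mcqa_prompt(question, choices, explanation_hint=""):
    # Hypothetical helper: formats an MCQA item into the
    # "### Question / ### Explanation / ### Answer:" structure used in training.
    labeled = "\n".join(f"{label}. {text}" for label, text in zip("ABCD", choices))
    return (
        f"### Question: {question}\n{labeled}\n"
        f"### Explanation: {explanation_hint}\n"
        "### Answer:"
    )

prompt = build_mcqa_prompt(
    "Which planet is known as the Red Planet?",
    ["Venus", "Mars", "Jupiter", "Saturn"],
    "Identify the planet with the reddish appearance.",
)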
How to Get Started
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 quantization with double quantization; compute in bfloat16.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="bfloat16",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

# Structured MCQA prompt the model was tuned on.
prompt = "### Question: Which planet is known as the Red Planet?\n### Explanation: Identify the planet with the reddish appearance.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
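If only the selected option is needed, a simple post-processing step like the one below can pull the text that follows the final "### Answer:" marker; this parsing is an assumption about the decoded output, not part of the model card.
# Hedged post-processing sketch: keep only what follows the last "### Answer:" marker.
decoded = tok.decode(out[0], skip_special_tokens=True)
answer = decoded.split("### Answer:")[-1].strip().split("\n")[0]
print(answer)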
Training Details
Data (SFT → DPO)
- SFT: Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema (an illustrative record follows this list); ≤512 tokens; per-dataset caps.
- DPO: EPFL preference pairs + public preference data (chosen vs. rejected responses).
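For illustration only, a unified MCQA record might look like the dictionary below; the field names are assumptions, since the card does not publish the exact schema.
# Hypothetical unified MCQA record; field names are illustrative, not the released schema.
example_record = {
    "question": "Which planet is known as the Red Planet?",
    "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
    "answer": "B",
    "source": "ScienceQA",  # MathQA / OpenBookQA / ScienceQA / TAL-SCQ5K / EPFL
}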
Procedure & Hyperparameters
- Pipeline: SFT → DPO → 4-bit (NF4) quantization.
- LoRA: rank=16, α=16, dropout=0.05 (see the configuration sketch after this list).
- Batch sizes: 4 (SFT), 1 (DPO).
- LR: 1e-5 (public), 1e-4 (EPFL); cosine schedule w/ warmup.
- Frameworks: HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
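The snippet below sketches how these hyperparameters could be wired together with PEFT and TRL; the target modules, DPO beta, and warmup ratio are assumptions not stated in this card, and the exact DPOTrainer signature varies across TRL versions.
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer  # DPOTrainer arguments differ across TRL versions

# LoRA settings as listed above; target_modules is an assumption for Phi-2.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# DPO stage: batch size 1, LR 1e-5 (public data), cosine schedule with warmup.
# beta and warmup_ratio are illustrative defaults, not values from the card.
dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    beta=0.1,
)

# trainer = DPOTrainer(model=model, args=dpo_args, train_dataset=preference_dataset,
#                      processing_class=tok, peft_config=lora_cfg)
# trainer.train()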
Evaluation Summary
- Configuration: Balanced-then-DPO (M3 Alt).
- Efficiency: Fits comfortably on mid-range GPUs thanks to 4-bit weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
- Use case: Best when VRAM is tight and you want DPO-aligned behavior with structured MCQA prompts.
Technical Specifications
- Architecture: Phi-2 (~2.78B params), decoder-only transformer.
- Objective: SFT next-token prediction + DPO preference alignment.
- Quantization: 4-bit NF4 (bitsandbytes) with optional double quantization; compute in bf16/fp16 (see the rough footprint estimate below).
- Precision: 4-bit quantized weights at runtime.
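As a rough back-of-the-envelope estimate (not a measured figure), ~2.78B parameters at 4 bits per weight is about 1.4 GB of weight storage, versus roughly 5.6 GB in fp16, excluding activations, KV cache, and quantization metadata.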
Glossary
- MCQA: Multiple-Choice Question Answering
- SFT: Supervised Finetuning
- DPO: Direct Preference Optimization
- LoRA: Low-Rank Adaptation
- NF4: NormalFloat-4 quantization format (bnb) for 4-bit weight quantization