Model Card for ShAIkespear/Phi-2_DPO_M3_Quantized_Alt
A 4-bit (NF4), LoRA-finetuned, DPO-aligned variant of microsoft/phi-2 specialized for multiple-choice question answering (MCQA) in STEM and general knowledge.
This Alt checkpoint is the memory-efficient counterpart to the unquantized M3 Base Alt model: same SFT → DPO training, then post-training 4-bit quantization for fast, low-VRAM inference.
Model Details
- Developed by: ShAIkespear team
- Shared by: ShAIkespear team
- Model type: Causal LM (Phi-2) with LoRA adapters; DPO-aligned; 4-bit NF4 quantized
- Languages: English
- License: MIT
- Finetuned from: microsoft/phi-2
Model Sources
- Repository: 2.8B-Phi-2-LLM-QA
- Report: “ShAIkespear – How to replace TAs: A comprehensive study on letting LLMs answer your questions”
Uses
Direct Use
- MCQA inference for STEM & general knowledge (MMLU/ScienceQA style).
- Educational assistants and lightweight evaluation tools on low-VRAM GPUs.
Out-of-Scope Use
- Safety-critical domains (medical/legal/financial) without human oversight.
- Long-form creative writing or tasks far from MCQA.
- Any misuse involving exam integrity or confidential assessments.
Bias, Risks, and Limitations
- Quantization trade-offs: Small accuracy drop vs. full precision, with larger memory savings than 8-bit.
- STEM difficulty: Multi-step reasoning can remain challenging.
- Alignment bias: DPO style preferences may influence verbosity/format.
Recommendations
Use the structured prompt format (a minimal formatting sketch follows this list):
### Question: ...
### Explanation: ...
### Answer:
Keep a human in the loop for teaching/grading.
Prefer the M3 Base Alt (full precision) for further fine-tuning; use this 4-bit Alt for deployment.
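As a minimal illustration of this format, the sketch below assembles a prompt from a question and its options; the helper name and A–D option labels are hypothetical and not part of the released code.
def build_mcqa_prompt(question, choices, explanation_hint=""):
    # Hypothetical helper: formats an MCQA item into the
    # "### Question / ### Explanation / ### Answer:" structure used in training.
    labeled = "\n".join(f"{label}. {text}" for label, text in zip("ABCD", choices))
    return (
        f"### Question: {question}\n{labeled}\n"
        f"### Explanation: {explanation_hint}\n"
        "### Answer:"
    )

prompt = build_mcqa_prompt(
    "Which planet is known as the Red Planet?",
    ["Venus", "Mars", "Jupiter", "Saturn"],
    "Identify the planet with the reddish appearance.",
)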
How to Get Started
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ShAIkespear/Phi-2_DPO_M3_Quantized_Alt"

# 4-bit NF4 quantization with double quantization; compute in bfloat16.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="bfloat16",
)

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_cfg
)

# Structured MCQA prompt the model was tuned on.
prompt = "### Question: Which planet is known as the Red Planet?\n### Explanation: Identify the planet with the reddish appearance.\n### Answer:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=15)
print(tok.decode(out[0], skip_special_tokens=True))
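If only the selected option is needed, a simple post-processing step like the one below can pull the text that follows the final "### Answer:" marker; this parsing is an assumption about the decoded output, not part of the model card.
# Hedged post-processing sketch: keep only what follows the last "### Answer:" marker.
decoded = tok.decode(out[0], skip_special_tokens=True)
answer = decoded.split("### Answer:")[-1].strip().split("\n")[0]
print(answer)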
Training Details
Data (SFT → DPO)
- SFT: Mixed MCQA (MathQA, OpenBookQA, ScienceQA, TAL-SCQ5K) + EPFL MCQA; unified schema (an illustrative record follows this list); ≤512 tokens; per-dataset caps.
- DPO: EPFL preference pairs + public preference data (chosen vs. rejected responses).
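For illustration only, a unified MCQA record might look like the dictionary below; the field names are assumptions, since the card does not publish the exact schema.
# Hypothetical unified MCQA record; field names are illustrative, not the released schema.
example_record = {
    "question": "Which planet is known as the Red Planet?",
    "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
    "answer": "B",
    "source": "ScienceQA",  # MathQA / OpenBookQA / ScienceQA / TAL-SCQ5K / EPFL
}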
Procedure & Hyperparameters
- Pipeline: SFT → DPO → 4-bit (NF4) quantization.
- LoRA: rank=16, α=16, dropout=0.05 (see the configuration sketch after this list).
- Batch sizes: 4 (SFT), 1 (DPO).
- LR: 1e-5 (public), 1e-4 (EPFL); cosine schedule w/ warmup.
- Frameworks: HF Transformers, TRL, PEFT (LoRA), bitsandbytes.
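The snippet below sketches how these hyperparameters could be wired together with PEFT and TRL; the target modules, DPO beta, and warmup ratio are assumptions not stated in this card, and the exact DPOTrainer signature varies across TRL versions.
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer  # DPOTrainer arguments differ across TRL versions

# LoRA settings as listed above; target_modules is an assumption for Phi-2.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# DPO stage: batch size 1, LR 1e-5 (public data), cosine schedule with warmup.
# beta and warmup_ratio are illustrative defaults, not values from the card.
dpo_args = DPOConfig(
    output_dir="phi2-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    beta=0.1,
)

# trainer = DPOTrainer(model=model, args=dpo_args, train_dataset=preference_dataset,
#                      processing_class=tok, peft_config=lora_cfg)
# trainer.train()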
Evaluation Summary
- Configuration: Balanced-then-DPO (M3 Alt).
- Efficiency: Fits comfortably on mid-range GPUs thanks to 4-bit weights; faster/lighter than 8-bit with a modest accuracy trade-off vs. full precision.
- Use case: Best when VRAM is tight and you want DPO-aligned behavior with structured MCQA prompts.
Technical Specifications
- Architecture: Phi-2 (~2.78B params), decoder-only transformer.
- Objective: SFT next-token prediction + DPO preference alignment.
- Quantization: 4-bit NF4 (bitsandbytes) with optional double quantization; compute in bf16/fp16 (see the rough footprint estimate below).
- Precision: 4-bit quantized weights at runtime.
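As a rough back-of-the-envelope estimate (not a measured figure), ~2.78B parameters at 4 bits per weight is about 1.4 GB of weight storage, versus roughly 5.6 GB in fp16, excluding activations, KV cache, and quantization metadata.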
Glossary
- MCQA: Multiple-Choice Question Answering
- SFT: Supervised Finetuning
- DPO: Direct Preference Optimization
- LoRA: Low-Rank Adaptation
- NF4: NormalFloat-4 quantization format (bnb) for 4-bit weight quantization