# Quantized MCQA Model – W8A8

## Model Summary
This model is a quantized version of our MCQA model, produced with post-training quantization (PTQ) of both weights and activations (W8A8) via the [LLMCompressor](https://github.com/vllm-project/llm-compressor) framework.

## Technical Details
- **Base model:** [`hssawhney/mnlp-model`](https://huggingface.co/hssawhney/mnlp-model)
- **Quantization method:** SmoothQuant + GPTQ (a hedged recipe sketch follows this list)
- **Precision:** INT8 weights + INT8 activations (W8A8)
- **Calibration data:** 512 samples from [`zay25/quantization-dataset`](https://huggingface.co/datasets/zay25/quantization-dataset)
- **Excluded layers:** `lm_head` (to preserve output logits)
- **Final model size:** ~717 MB

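For reference, here is a minimal sketch of how such a SmoothQuant + GPTQ recipe looks with LLMCompressor's `oneshot` entry point. This is a hedged illustration, not the exact script used: the dataset split, smoothing strength, sequence length, and output directory are assumptions.

```python
from datasets import load_dataset
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot

# 512 calibration samples, as noted above (the "train" split is an assumption).
ds = load_dataset("zay25/quantization-dataset", split="train").select(range(512))

# Recipe: SmoothQuant folds activation outliers into the weights first, then
# GPTQ quantizes the Linear layers to W8A8, skipping lm_head to preserve logits.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # strength is an assumed value
    GPTQModifier(scheme="W8A8", targets="Linear", ignore=["lm_head"]),
]

oneshot(
    model="hssawhney/mnlp-model",
    dataset=ds,
    recipe=recipe,
    max_seq_length=2048,           # assumed; not stated in this card
    num_calibration_samples=512,
    output_dir="mnlp-model-W8A8",  # hypothetical output path
)
```

`oneshot` saves the result as a compressed-tensors checkpoint in `output_dir`, which vLLM can load directly.
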
## Evaluation
The quantized model was evaluated on the full MCQA demo dataset using the LightEval framework. Accuracy dropped by only **0.02** compared to the full-precision (FP32) version.

## Intended Use
This model is optimized for **efficient inference** in **multiple-choice question answering** tasks, particularly in the context of **STEM tutoring**. It is well-suited for low-resource deployment environments where latency and memory usage are critical.
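As a usage illustration, here is a minimal inference sketch with vLLM, which can load compressed-tensors W8A8 checkpoints directly. The repo id and prompt are hypothetical placeholders.

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id; substitute the actual quantized checkpoint.
llm = LLM(model="zay25/mnlp-model-W8A8")

# Greedy decoding of a single answer letter for an MCQA-style prompt.
prompt = (
    "Question: Which planet is known as the Red Planet?\n"
    "A) Venus\nB) Mars\nC) Jupiter\nD) Saturn\n"
    "Answer:"
)
params = SamplingParams(temperature=0.0, max_tokens=2)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text.strip())  # e.g. "B"
```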