# Quantized MCQA Model – W8A8

## Model Summary
This model is a quantized version of our MCQA model. It was produced with post-training quantization (PTQ) of both weights and activations (W8A8), using the [LLMCompressor](https://github.com/vllm-project/llm-compressor) framework.

## Technical Details
- **Base model:** [`hssawhney/mnlp-model`](https://huggingface.co/hssawhney/mnlp-model)
- **Quantization method:** SmoothQuant + GPTQ (see the sketch after this list)
- **Precision:** INT8 weights + INT8 activations (W8A8)
- **Calibration data:** 512 samples from [`zay25/quantization-dataset`](https://huggingface.co/datasets/zay25/quantization-dataset)
- **Excluded layers:** `lm_head` (to preserve output logits)
- **Final model size:** ~717 MB
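
The recipe below is a minimal sketch of how this kind of SmoothQuant + GPTQ W8A8 pass is typically expressed with LLMCompressor. The smoothing strength, sequence length, text column name, and output directory are assumptions for illustration, not values taken from this model's actual run.

```python
from datasets import load_dataset
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "hssawhney/mnlp-model"
NUM_CALIBRATION_SAMPLES = 512   # matches the card
MAX_SEQ_LENGTH = 2048           # assumed; not stated on the card

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 512 calibration samples, assuming the dataset exposes a "text" column.
ds = load_dataset("zay25/quantization-dataset", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))
ds = ds.map(
    lambda sample: tokenizer(
        sample["text"], max_length=MAX_SEQ_LENGTH, truncation=True
    ),
    remove_columns=ds.column_names,
)

# SmoothQuant shifts activation outliers into the weights, then GPTQ
# quantizes all Linear layers to W8A8; lm_head stays in full precision.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # assumed strength
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir="mnlp-model-W8A8",  # hypothetical output path
)
```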

## Evaluation
The quantized model was evaluated on the full MCQA demo dataset using the LightEval framework. Accuracy dropped by only **0.02** compared to the full-precision (FP32) version.

## Intended Use
This model is optimized for **efficient inference** in **multiple-choice question answering** tasks, particularly in the context of **STEM tutoring**. It is well-suited for low-resource deployment environments where latency and memory usage are critical.
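
Because LLMCompressor saves W8A8 models in the compressed-tensors format, the checkpoint can be served directly with vLLM. The sketch below is illustrative: the repo id is a placeholder for wherever this quantized model is published, and the prompt format is an assumption.

```python
from vllm import LLM, SamplingParams

# Placeholder id; point this at the published W8A8 checkpoint.
llm = LLM(model="path/to/quantized-mcqa-w8a8")

prompt = (
    "Question: Which planet is closest to the Sun?\n"
    "A. Venus\nB. Mercury\nC. Earth\nD. Mars\n"
    "Answer:"
)

# Greedy decoding; one or two tokens are enough for a letter answer.
params = SamplingParams(temperature=0.0, max_tokens=2)
print(llm.generate([prompt], params)[0].outputs[0].text)
```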