Added evaluation metrics
README.md CHANGED
@@ -19,6 +19,31 @@ Only the weights of the linear operators within `language_model` transformers blocks

The model checkpoint is saved in [compressed_tensors](https://github.com/neuralmagic/compressed-tensors) format.
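
A quick way to inspect that metadata locally, assuming the `huggingface_hub` CLI is installed, is to pull just the checkpoint's `config.json`, whose `quantization_config` block carries the compressed-tensors description of the scheme (4-bit weights, group size 128, per the model name):

```bash
# Download only config.json and print it; huggingface-cli echoes the local
# path of the downloaded file, which we pass to cat.
cat "$(huggingface-cli download ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g config.json)"
```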

## Evaluation

This model was evaluated on the OpenLLM v1 benchmarks. Model outputs were generated with the `vLLM` engine.

| Model                      | ArcC   | GSM8k  | Hellaswag | MMLU   | TruthfulQA-mc2 | Winogrande | Average | Recovery |
|----------------------------|:------:|:------:|:---------:|:------:|:--------------:|:----------:|:-------:|:--------:|
| gemma-3-27b-it             | 0.7491 | 0.9181 | 0.8582    | 0.7742 | 0.6222         | 0.7908     | 0.7854  | 1.0000   |
| gemma-3-27b-it-INT4 (this) | 0.7415 | 0.9174 | 0.8496    | 0.7662 | 0.6160         | 0.7956     | 0.7810  | 0.9944   |
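
Recovery here is the quantized model's average score as a fraction of the baseline's: 0.7810 / 0.7854 ≈ 0.9944, i.e. the INT4 checkpoint retains roughly 99.4% of the unquantized model's average accuracy.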

## Reproduction

The results were obtained using the following commands:

```bash
MODEL=ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g
MODEL_ARGS="pretrained=$MODEL,max_model_len=4096,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.80"

lm_eval \
  --model vllm \
  --model_args $MODEL_ARGS \
  --tasks openllm \
  --batch_size auto
```
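
These commands assume an environment with `lm-evaluation-harness` and its vLLM backend; the `openllm` task group bundles the six benchmarks reported in the table above. A minimal setup sketch (versions unpinned):

```bash
# Installing lm_eval with the vllm extra pulls in the vLLM engine used above.
pip install "lm_eval[vllm]"
```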

## Usage

* To use the model in `transformers`, update the package to a stable release with Gemma3 support:
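
  For example, upgrading to the latest stable release should be sufficient (a minimal sketch; pin a specific version if you need reproducibility):

  ```bash
  # Recent stable releases of transformers include the Gemma3 architecture.
  pip install -U transformers
  ```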