---
datasets:
- allenai/ai2_arc
- Rowan/hellaswag
- EleutherAI/logiqa
- google/boolq
- allenai/winogrande
base_model:
- TinyLlama/TinyLlama_v1.1
---
# TinyLlama_v1.1 MARS PEFT Benchmark
This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against established PEFT (Parameter-Efficient Fine-Tuning) methods at a range of adapter ranks.
## Overview
We evaluated multiple PEFT methods including:
- **LoRA** (Low-Rank Adaptation)
- **LoRA-XS** (Low-Rank Adaptation with eXtremely Small number of parameters)
- **LoHA** (Low-Rank Hadamard Product Adaptation)
- **VB LoRA** (Vector Bank LoRA)
- **QLoRA** (Quantized LoRA with NF4)
- **MARS OPT0 & OPT1** (our method with different optimization levels)
- **QMARS** (Quantized MARS with NF4)
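
As a rough illustration of how the rank hyperparameter enters these methods, the sketch below attaches a plain LoRA adapter to the base model with Hugging Face `peft`. The target modules and hyperparameters shown are illustrative assumptions, not the exact training configuration behind the checkpoints in this repository:

```python
# Minimal sketch: attaching a rank-r LoRA adapter to TinyLlama_v1.1 with peft.
# Target modules and hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama_v1.1")

config = LoraConfig(
    r=8,                 # adapter rank (r=2/8/32 in the tables below)
    lora_alpha=16,       # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
    lora_dropout=0.05,   # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # compare with the "Trainable Params" column below
```
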
Each method was tested at multiple ranks (r=2, 8, 16, 32, 64, 256 where applicable) on six common language understanding benchmarks:
- **ARC-E** (AI2 Reasoning Challenge - Easy)
- **ARC-C** (AI2 Reasoning Challenge - Challenge)
- **Winogrande** (Commonsense reasoning)
- **BoolQ** (Boolean question answering)
- **LogiQA** (Logical reasoning)
- **HellaSwag** (Commonsense inference)
Both non-quantized and quantized (nf4, fp4, int8) variants were evaluated to assess performance-efficiency trade-offs across different parameter budgets.
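
The quantized variants correspond to loading the frozen base model through `bitsandbytes` before attaching an adapter. A minimal sketch, assuming the standard `transformers` + `bitsandbytes` stack (the NF4 path shown is the usual QLoRA-style setup; the compute dtype is an assumption):

```python
# Minimal sketch: loading TinyLlama_v1.1 in 4-bit NF4 (as in the QLoRA/QMARS
# nf4 runs) before attaching an adapter. For the other variants, use
# bnb_4bit_quant_type="fp4" or load_in_8bit=True respectively.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama_v1.1",
    quantization_config=bnb_config,
    device_map="auto",
)
# An adapter (e.g. the LoRA config sketched above) is then trained on top of
# the frozen quantized weights.
```
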
## Results Summary
The first table reports each method's per-benchmark scores averaged over all evaluated ranks, together with an overall average; the subsequent tables break the results down by rank.
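
The card does not state the evaluation tooling. Assuming EleutherAI's lm-evaluation-harness, whose task names match the six benchmarks above, a comparable run might look like the sketch below; the adapter path is a hypothetical placeholder, since the repository's exact checkpoint layout is not specified here:

```python
# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness.
# "path/to/adapter" is a placeholder for one of the checkpoints in this repo.
import lm_eval
from lm_eval.models.huggingface import HFLM

lm = HFLM(
    pretrained="TinyLlama/TinyLlama_v1.1",
    peft="path/to/adapter",  # placeholder adapter checkpoint
)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["arc_easy", "arc_challenge", "winogrande",
           "boolq", "logiqa", "hellaswag"],
)
print(results["results"])
```
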
---
## Overall Averages
| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | **0.624** |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | **0.576** |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | **0.573** |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | **0.568** |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | **0.567** |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | **0.565** |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | **0.564** |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | **0.559** |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | **0.549** |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | **0.542** |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | **0.432** |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | **0.422** |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | **0.393** |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | **0.373** |
---
## Rank r=2
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | **0.368** |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | **0.494** |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | **0.502** |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | **0.479** |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | **0.490** |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | **0.367** |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | **0.463** |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | **0.447** |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | **0.445** |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | **0.540** |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | **0.463** |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | **0.532** |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | **0.575** |
## Rank r=8
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | **0.379** |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | **0.544** |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | **0.572** |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | **0.439** |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | **0.538** |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | **0.371** |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | **0.621** |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | **0.622** |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | **0.627** |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | **0.593** |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | **0.603** |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | **0.606** |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | **0.644** |
## Rank r=16
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | **0.396** |
## Rank r=32
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | **0.432** |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | **0.588** |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | **0.620** |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | **0.349** |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | **0.620** |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | **0.382** |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | **0.610** |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | **0.650** |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | **0.628** |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | **0.596** |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | **0.611** |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | **0.567** |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | **0.654** |
## Rank r=64
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | **0.518** |
## Rank r=256
| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | **0.381** |