---
datasets:
- allenai/ai2_arc
- Rowan/hellaswag
- EleutherAI/logiqa
- google/boolq
- allenai/winogrande
base_model:
- TinyLlama/TinyLlama_v1.1
---

# TinyLlama_v1.1 MARS PEFT Benchmark

This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against various PEFT (Parameter-Efficient Fine-Tuning) methods across different ranks.

## Overview

We evaluated multiple PEFT methods, including:

- **LoRA** (Low-Rank Adaptation)
- **LoRA-XS** (Extra Small LoRA)
- **LoHA** (Low-Rank Hadamard Adaptation)
- **VB LoRA** (Vector Bank LoRA)
- **QLoRA** (Quantized LoRA with NF4)
- **MARS OPT0 & OPT1** (our method with different optimization levels)
- **QMARS** (Quantized MARS with NF4)
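All of these methods add a small trainable update on top of frozen base weights. As a reference point, here is a minimal NumPy sketch of the basic LoRA update, h = Wx + (α/r)·BAx, with B zero-initialized so training starts exactly at the base model; the dimensions are illustrative, not TinyLlama's:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 2, 4  # illustrative sizes, not TinyLlama's

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # base path plus low-rank update, scaled by alpha / r
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# with B = 0 the adapter is a no-op: output equals the frozen base model's
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (r·(d_in + d_out) values per adapted matrix) are trained, which is why the trainable-parameter counts in the tables below grow roughly linearly with the rank r.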

Each method was tested at multiple ranks (r=2, 8, 16, 32, 64, 256 where applicable) on six common language understanding benchmarks:

- **ARC-E** (AI2 Reasoning Challenge - Easy)
- **ARC-C** (AI2 Reasoning Challenge - Challenge)
- **Winogrande** (commonsense reasoning)
- **BoolQ** (Boolean question answering)
- **LogiQA** (logical reasoning)
- **HellaSwag** (commonsense inference)

Both non-quantized and quantized (fp4, int8) variants were evaluated to assess performance-efficiency trade-offs across different parameter budgets.
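In the quantized variants the frozen base weights are stored in low precision while the adapter stays in higher precision. As a toy illustration of the idea only (an absmax int8 round-trip, not the actual bitsandbytes fp4/nf4 kernels used in our runs):

```python
import numpy as np

def quantize_absmax_int8(w):
    # scale so the largest magnitude maps to 127, then round to int8
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_absmax_int8(w)
w_hat = dequantize(q, scale)

# 4x smaller storage, at the cost of a bounded per-weight rounding error
assert q.nbytes * 4 == w.nbytes
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

The memory savings come with exactly this kind of reconstruction error, which is what the fp4/int8 columns in the tables below probe.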

## Results Summary

The tables below report, for each method, per-benchmark scores averaged over all ranks (the overall averages), followed by a detailed breakdown at each individual rank.
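Each overall-average cell is the unweighted mean of the six per-benchmark scores in its row; for example, for the QMARS (nf4) row:

```python
# Per-benchmark scores for QMARS (nf4):
# ARC-E, ARC-C, Winogrande, BoolQ, LogiQA, HellaSwag
scores = [0.669, 0.621, 0.530, 0.787, 0.341, 0.798]
overall = sum(scores) / len(scores)
print(round(overall, 3))  # 0.624, matching the "Overall average" column
```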

---

## Overall Averages

| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | **0.624** |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | **0.576** |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | **0.573** |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | **0.568** |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | **0.567** |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | **0.565** |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | **0.564** |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | **0.559** |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | **0.549** |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | **0.542** |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | **0.432** |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | **0.422** |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | **0.393** |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | **0.373** |

---

## Rank r=2

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | **0.368** |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | **0.494** |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | **0.502** |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | **0.479** |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | **0.490** |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | **0.367** |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | **0.463** |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | **0.447** |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | **0.445** |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | **0.540** |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | **0.463** |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | **0.532** |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | **0.575** |

## Rank r=8

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | **0.379** |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | **0.544** |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | **0.572** |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | **0.439** |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | **0.538** |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | **0.371** |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | **0.621** |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | **0.622** |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | **0.627** |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | **0.593** |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | **0.603** |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | **0.606** |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | **0.644** |

## Rank r=16

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | **0.396** |

## Rank r=32

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | **0.432** |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | **0.588** |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | **0.620** |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | **0.349** |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | **0.620** |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | **0.382** |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | **0.610** |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | **0.650** |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | **0.628** |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | **0.596** |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | **0.611** |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | **0.567** |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | **0.654** |

## Rank r=64

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | **0.518** |

## Rank r=256

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | **0.381** |