---
datasets:
- allenai/ai2_arc
- Rowan/hellaswag
- EleutherAI/logiqa
- google/boolq
- allenai/winogrande
base_model:
- TinyLlama/TinyLlama_v1.1
---

# TinyLlama_v1.1 MARS PEFT Benchmark

This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against various PEFT (Parameter-Efficient Fine-Tuning) methods across different ranks.

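Assuming the checkpoints are stored in the standard `peft` adapter format, a minimal loading sketch looks like the following. The repository id and adapter subfolder below are placeholders, not actual paths from this repo; substitute the checkpoint directory you want to evaluate.

```python
# Minimal sketch of attaching one adapter checkpoint to the TinyLlama base model.
# Assumes the checkpoint is a standard `peft` adapter; repo id and subfolder are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "TinyLlama/TinyLlama_v1.1"
ADAPTER_REPO = "<this-repo-id>"   # placeholder for this repository's id
ADAPTER_SUBFOLDER = "lora_r8"     # hypothetical checkpoint directory name

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO, subfolder=ADAPTER_SUBFOLDER)
model.eval()
```
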
## Overview

We evaluated multiple PEFT methods (a configuration sketch follows this list), including:
- **LoRA** (Low-Rank Adaptation)
- **LoRA-XS** (Extra Small LoRA)
- **LoHA** (Low-Rank Hadamard Adaptation)
- **VB LoRA** (Vector Bank LoRA)
- **QLoRA** (Quantized LoRA with NF4)
- **MARS OPT0 & OPT1** (our method with different optimization levels)
- **QMARS** (Quantized MARS with NF4)

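For reference, here is a minimal sketch of how a LoRA-style adapter at a given rank could be configured with the `peft` library. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the exact hyperparameters used to train these checkpoints.

```python
# Illustrative LoRA configuration for TinyLlama at rank r=8.
# Hyperparameters and target modules are assumptions for the sketch only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama_v1.1")

lora_config = LoraConfig(
    r=8,                                  # adapter rank (r=2, 8, 32, ... in the tables below)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # report the number of trainable adapter parameters
```
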
Each method was tested at multiple ranks (r=2, 8, 16, 32, 64, 256 where applicable) on six common language understanding benchmarks:
- **ARC-E** (AI2 Reasoning Challenge - Easy)
- **ARC-C** (AI2 Reasoning Challenge - Challenge)
- **Winogrande** (Commonsense reasoning)
- **BoolQ** (Boolean question answering)
- **LogiQA** (Logical reasoning)
- **HellaSwag** (Commonsense inference)

Both non-quantized and quantized (fp4, int8) variants were evaluated to assess performance-efficiency trade-offs across different parameter budgets.

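For the quantized variants, the base model is typically loaded through `bitsandbytes` before an adapter is attached. Below is a minimal sketch of the nf4/fp4/int8 loading options in `transformers`; the compute dtype and other settings are assumptions, not necessarily those used in these runs.

```python
# Illustrative quantized loading of the base model with bitsandbytes.
# Pick one configuration; the compute dtype here is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

BASE_ID = "TinyLlama/TinyLlama_v1.1"

# 4-bit: quant_type "nf4" (as in the QLoRA/QMARS rows) or "fp4" (as in the fp4 rows)
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 8-bit: corresponds to the int8 rows
bnb_int8 = BitsAndBytesConfig(load_in_8bit=True)

model_4bit = AutoModelForCausalLM.from_pretrained(BASE_ID, quantization_config=bnb_4bit)
model_int8 = AutoModelForCausalLM.from_pretrained(BASE_ID, quantization_config=bnb_int8)
```
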
## Results Summary

The tables below report detailed performance comparisons: for each method, per-benchmark scores averaged across all ranks together with an overall average, followed by per-rank breakdowns on each benchmark dataset.

---

## Overall Averages

| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | **0.624** |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | **0.576** |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | **0.573** |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | **0.568** |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | **0.567** |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | **0.565** |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | **0.564** |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | **0.559** |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | **0.549** |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | **0.542** |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | **0.432** |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | **0.422** |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | **0.393** |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | **0.373** |

---

## Rank r=2

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | **0.368** |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | **0.494** |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | **0.502** |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | **0.479** |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | **0.490** |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | **0.367** |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | **0.463** |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | **0.447** |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | **0.445** |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | **0.540** |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | **0.463** |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | **0.532** |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | **0.575** |

## Rank r=8

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | **0.379** |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | **0.544** |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | **0.572** |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | **0.439** |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | **0.538** |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | **0.371** |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | **0.621** |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | **0.622** |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | **0.627** |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | **0.593** |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | **0.603** |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | **0.606** |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | **0.644** |

## Rank r=16

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | **0.396** |

## Rank r=32

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | **0.432** |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | **0.588** |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | **0.620** |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | **0.349** |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | **0.620** |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | **0.382** |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | **0.610** |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | **0.650** |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | **0.628** |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | **0.596** |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | **0.611** |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | **0.567** |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | **0.654** |

## Rank r=64

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | **0.518** |

## Rank r=256

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | **0.381** |