---
datasets:
- allenai/ai2_arc
- Rowan/hellaswag
- EleutherAI/logiqa
- google/boolq
- allenai/winogrande
base_model:
- TinyLlama/TinyLlama_v1.1
---

# TinyLlama_v1.1 MARS PEFT Benchmark

This repository contains adapter checkpoints from a comprehensive evaluation comparing MARS (our method) against various PEFT (Parameter-Efficient Fine-Tuning) methods across different ranks.

## Overview

We evaluated multiple PEFT methods, including:
- **LoRA** (Low-Rank Adaptation)
- **LoRA-XS** (Extra Small LoRA)
- **LoHA** (Low-Rank Hadamard Adaptation)
- **VB LoRA** (Vector Bank LoRA)
- **QLoRA** (Quantized LoRA with NF4)
- **MARS OPT0 & OPT1** (our method with different optimization levels)
- **QMARS** (Quantized MARS with NF4)

Each method was tested at multiple ranks (r = 2, 8, 16, 32, 64, 256, where applicable) on six common language understanding benchmarks:
- **ARC-E** (AI2 Reasoning Challenge - Easy)
- **ARC-C** (AI2 Reasoning Challenge - Challenge)
- **Winogrande** (commonsense reasoning)
- **BoolQ** (Boolean question answering)
- **LogiQA** (logical reasoning)
- **HellaSwag** (commonsense inference)

Both non-quantized and quantized (fp4, int8) variants were evaluated to assess performance-efficiency trade-offs across different parameter budgets.

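To make the "Trainable Params" columns below easier to interpret: in a LoRA-style adapter, a frozen weight matrix `W` of shape `(d_out, d_in)` is augmented as `W + B @ A`, with `A` of shape `(r, d_in)` and `B` of shape `(d_out, r)`, so each adapted matrix contributes `r * (d_in + d_out)` trainable parameters. A minimal sketch, using an illustrative 2048-by-2048 projection rather than the exact TinyLlama_v1.1 layer shapes:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one rank-r LoRA adapter.

    The frozen weight W (d_out x d_in) is augmented as W + B @ A,
    where A is (r x d_in) and B is (d_out x r), so the adapter
    trains r * (d_in + d_out) parameters instead of d_in * d_out.
    """
    return r * (d_in + d_out)

# Illustrative numbers only: a single square 2048x2048 projection.
full = 2048 * 2048  # parameters in the frozen matrix
for r in (2, 8, 16, 32, 64, 256):
    adapter = lora_param_count(2048, 2048, r)
    print(f"r={r:>3}: {adapter:>9,} trainable ({adapter / full:.2%} of full)")
```

The budgets reported in the per-rank tables sum such contributions over every adapted matrix in the model, which is why the totals differ between methods even at the same rank.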
## Results Summary

The tables below report each method's average score on every benchmark across all ranks, followed by per-rank breakdowns with trainable parameter counts.

---

## Overall Averages

| Method | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Overall average |
|---|---|---|---|---|---|---|---|
| QMARS (nf4) | 0.669 | 0.621 | 0.530 | 0.787 | 0.341 | 0.798 | **0.624** |
| MARS OPT1 | 0.523 | 0.447 | 0.513 | 0.795 | 0.374 | 0.807 | **0.576** |
| MARS OPT0 (fp4) | 0.577 | 0.562 | 0.515 | 0.767 | 0.378 | 0.639 | **0.573** |
| MARS OPT1 (int8) | 0.450 | 0.450 | 0.530 | 0.795 | 0.384 | 0.802 | **0.568** |
| MARS OPT0 (int8) | 0.594 | 0.515 | 0.526 | 0.743 | 0.382 | 0.641 | **0.567** |
| MARS OPT0 | 0.599 | 0.511 | 0.508 | 0.742 | 0.382 | 0.645 | **0.565** |
| LoRA (fp4) | 0.447 | 0.435 | 0.524 | 0.795 | 0.367 | 0.818 | **0.564** |
| MARS OPT1 (fp4) | 0.613 | 0.466 | 0.502 | 0.793 | 0.354 | 0.626 | **0.559** |
| QLoRA (nf4) | 0.451 | 0.349 | 0.520 | 0.787 | 0.361 | 0.828 | **0.549** |
| LoRA | 0.392 | 0.344 | 0.522 | 0.793 | 0.367 | 0.835 | **0.542** |
| LoRA-XS | 0.322 | 0.302 | 0.516 | 0.700 | 0.283 | 0.468 | **0.432** |
| LoRA (int8) | 0.468 | 0.391 | 0.498 | 0.667 | 0.262 | 0.249 | **0.422** |
| LoHA | 0.257 | 0.261 | 0.514 | 0.674 | 0.282 | 0.369 | **0.393** |
| VB LoRA | 0.248 | 0.266 | 0.522 | 0.666 | 0.277 | 0.261 | **0.373** |

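The last column appears to be the unweighted mean of the six per-benchmark scores in each row (our reading of the table layout); a quick check against the top row:

```python
# QMARS (nf4) row of the Overall Averages table:
# ARC-E, ARC-C, Winogrande, BoolQ, LogiQA, HellaSwag
scores = [0.669, 0.621, 0.530, 0.787, 0.341, 0.798]

overall = sum(scores) / len(scores)  # unweighted mean across benchmarks
print(round(overall, 3))             # agrees with the tabulated 0.624
```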
---

## Rank r=2

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 3.2M | 0.250 | 0.244 | 0.503 | 0.657 | 0.291 | 0.263 | **0.368** |
| LoRA | 1.6M | 0.286 | 0.320 | 0.510 | 0.760 | 0.295 | 0.794 | **0.494** |
| LoRA (fp4) | 1.6M | 0.334 | 0.317 | 0.512 | 0.760 | 0.307 | 0.780 | **0.502** |
| LoRA (int8) | 1.6M | 0.606 | 0.493 | 0.504 | 0.757 | 0.271 | 0.245 | **0.479** |
| QLoRA (nf4) | 1.6M | 0.308 | 0.277 | 0.516 | 0.754 | 0.297 | 0.788 | **0.490** |
| VB LoRA | 1.6M | 0.233 | 0.260 | 0.525 | 0.650 | 0.279 | 0.257 | **0.367** |
| MARS OPT0 | 1.3M | 0.566 | 0.567 | 0.504 | 0.622 | 0.271 | 0.249 | **0.463** |
| MARS OPT0 (fp4) | 1.3M | 0.407 | 0.569 | 0.504 | 0.679 | 0.271 | 0.251 | **0.447** |
| MARS OPT0 (int8) | 1.3M | 0.454 | 0.574 | 0.504 | 0.621 | 0.271 | 0.247 | **0.445** |
| MARS OPT1 | 0.79M | 0.424 | 0.485 | 0.514 | 0.780 | 0.270 | 0.766 | **0.540** |
| MARS OPT1 (fp4) | 0.79M | 0.486 | 0.498 | 0.504 | 0.775 | 0.271 | 0.246 | **0.463** |
| MARS OPT1 (int8) | 0.79M | 0.388 | 0.468 | 0.534 | 0.769 | 0.271 | 0.763 | **0.532** |
| QMARS (nf4) | 1.3M | 0.567 | 0.632 | 0.505 | 0.749 | 0.271 | 0.728 | **0.575** |

## Rank r=8

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 12.6M | 0.255 | 0.260 | 0.516 | 0.681 | 0.274 | 0.288 | **0.379** |
| LoRA | 6.3M | 0.385 | 0.335 | 0.530 | 0.800 | 0.365 | 0.851 | **0.544** |
| LoRA (fp4) | 6.3M | 0.500 | 0.414 | 0.511 | 0.810 | 0.362 | 0.833 | **0.572** |
| LoRA (int8) | 6.3M | 0.578 | 0.418 | 0.496 | 0.622 | 0.271 | 0.251 | **0.439** |
| QLoRA (nf4) | 6.3M | 0.404 | 0.291 | 0.540 | 0.799 | 0.351 | 0.845 | **0.538** |
| VB LoRA | 6.4M | 0.239 | 0.265 | 0.523 | 0.668 | 0.275 | 0.259 | **0.371** |
| MARS OPT0 | 5.2M | 0.618 | 0.502 | 0.529 | 0.802 | 0.446 | 0.830 | **0.621** |
| MARS OPT0 (fp4) | 5.2M | 0.611 | 0.567 | 0.524 | 0.805 | 0.409 | 0.813 | **0.622** |
| MARS OPT0 (int8) | 5.2M | 0.684 | 0.493 | 0.540 | 0.798 | 0.429 | 0.819 | **0.627** |
| MARS OPT1 | 3.2M | 0.585 | 0.462 | 0.494 | 0.788 | 0.410 | 0.820 | **0.593** |
| MARS OPT1 (fp4) | 3.2M | 0.680 | 0.438 | 0.504 | 0.798 | 0.391 | 0.806 | **0.603** |
| MARS OPT1 (int8) | 3.2M | 0.579 | 0.514 | 0.512 | 0.802 | 0.417 | 0.813 | **0.606** |
| QMARS (nf4) | 5.2M | 0.739 | 0.620 | 0.548 | 0.793 | 0.344 | 0.822 | **0.644** |

## Rank r=16

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.04M | 0.274 | 0.261 | 0.530 | 0.694 | 0.281 | 0.338 | **0.396** |

## Rank r=32

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoHA | 50.5M | 0.268 | 0.277 | 0.524 | 0.684 | 0.281 | 0.556 | **0.432** |
| LoRA | 25.2M | 0.505 | 0.375 | 0.526 | 0.818 | 0.440 | 0.860 | **0.588** |
| LoRA (fp4) | 25.2M | 0.508 | 0.574 | 0.548 | 0.816 | 0.433 | 0.840 | **0.620** |
| LoRA (int8) | 25.2M | 0.220 | 0.263 | 0.494 | 0.622 | 0.244 | 0.250 | **0.349** |
| QLoRA (nf4) | 25.2M | 0.641 | 0.480 | 0.504 | 0.809 | 0.434 | 0.852 | **0.620** |
| VB LoRA | 25.3M | 0.274 | 0.274 | 0.519 | 0.679 | 0.277 | 0.268 | **0.382** |
| MARS OPT0 | 21.0M | 0.614 | 0.464 | 0.490 | 0.804 | 0.430 | 0.856 | **0.610** |
| MARS OPT0 (fp4) | 21.0M | 0.712 | 0.550 | 0.516 | 0.817 | 0.454 | 0.852 | **0.650** |
| MARS OPT0 (int8) | 21.0M | 0.645 | 0.479 | 0.533 | 0.808 | 0.446 | 0.858 | **0.628** |
| MARS OPT1 | 12.6M | 0.561 | 0.393 | 0.532 | 0.815 | 0.441 | 0.834 | **0.596** |
| MARS OPT1 (fp4) | 12.6M | 0.675 | 0.462 | 0.499 | 0.806 | 0.401 | 0.826 | **0.611** |
| MARS OPT1 (int8) | 12.6M | 0.384 | 0.368 | 0.543 | 0.814 | 0.463 | 0.828 | **0.567** |
| QMARS (nf4) | 21.0M | 0.703 | 0.611 | 0.538 | 0.819 | 0.408 | 0.843 | **0.654** |

## Rank r=64

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 0.63M | 0.334 | 0.351 | 0.515 | 0.784 | 0.310 | 0.817 | **0.518** |

## Rank r=256

| Method | Trainable Params | ARC-E | ARC-C | Winogrande | BoolQ | LogiQA | HellaSwag | Average |
|---|---|---|---|---|---|---|---|---|
| LoRA-XS | 10.1M | 0.359 | 0.294 | 0.504 | 0.622 | 0.260 | 0.249 | **0.381** |