laineyyy committed on
Commit 488aaa2 · verified · 1 Parent(s): ba7a467

Update README.md

Files changed (1)
  1. README.md +24 -1
README.md CHANGED
@@ -108,7 +108,30 @@ Poro 2 70B Instruct shows substantial improvements in Finnish instruction-follow
  | AlpacaEval 2 | **49.77** | 43.87 | 45.12 |
 
 
- ### Pairwise Comparisons (MTBench)
+ ### MTBench scores per-category
+
+ | | Finnish | English |
+ |-----------------|---------|-----------|
+ | Coding | 6.75 | 7.25 |
+ | Extraction | 9.00 | 8.85 |
+ | Humanities | 8.75 | 8.95 |
+ | Math | 6.70 | 8.20 |
+ | Reasoning | 6.10 | 8.15 |
+ | Roleplay | 8.20 | 8.60 |
+ | STEM | 8.60 | 8.50 |
+ | Writing | 8.10 | 8.85 |
+
+
+ ### MTBench scores per-turn
+
+ | | Finnish | English |
+ |-----------------|---------|-----------|
+ | first turn | 8.0 | 8.66 |
+ | second turn | 7.55 | 8.17 |
+
+
+ ### Pairwise Comparisons on MTBench
+
  - **Finnish**: 66% win rate vs Llama 3.3 70B Instruct
  - **English**: 57% win rate vs Llama 3.3 70B Instruct
 
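The two MTBench tables added in this commit can be cross-checked against each other. The sketch below is a sanity check, not the authors' evaluation code: it assumes every category and both turns are weighted equally (the README does not say how the figures were aggregated), and the variable names are illustrative. Under that assumption, the mean of the per-category scores should roughly match the mean of the per-turn scores for each language.

```python
from statistics import mean

# Scores copied from the tables added in this commit.
# Category order: Coding, Extraction, Humanities, Math,
# Reasoning, Roleplay, STEM, Writing.
per_category = {
    "Finnish": [6.75, 9.00, 8.75, 6.70, 6.10, 8.20, 8.60, 8.10],
    "English": [7.25, 8.85, 8.95, 8.20, 8.15, 8.60, 8.50, 8.85],
}

# Turn order: first turn, second turn.
per_turn = {
    "Finnish": [8.0, 7.55],
    "English": [8.66, 8.17],
}

# Assumption: unweighted means (equal weight per category and per turn).
category_mean = {lang: mean(scores) for lang, scores in per_category.items()}
turn_mean = {lang: mean(scores) for lang, scores in per_turn.items()}

for lang in per_category:
    gap = abs(category_mean[lang] - turn_mean[lang])
    print(f"{lang}: per-category mean {category_mean[lang]:.3f}, "
          f"per-turn mean {turn_mean[lang]:.3f}, gap {gap:.3f}")
```

With the figures above, both means land near 7.78 for Finnish and near 8.42 for English, agreeing to within 0.01, which is what rounding in the published tables would explain.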