Update README.md
README.md
CHANGED
|
@@ -108,7 +108,30 @@ Poro 2 70B Instruct shows substantial improvements in Finnish instruction-follow
|
|
| 108 |
| AlpacaEval 2 | **49.77** | 43.87 | 45.12 |
|
| 109 |
|
| 110 |
|
| 111 |
-
###
|
| 112 |
- **Finnish**: 66% win rate vs Llama 3.3 70B Instruct
|
| 113 |
- **English**: 57% win rate vs Llama 3.3 70B Instruct
|
| 114 |
|
|
|
|
| 108 |
| AlpacaEval 2 | **49.77** | 43.87 | 45.12 |
|
| 109 |
|
| 110 |
|
| 111 |
+
### MTBench scores per-category
|
| 112 |
+
|
| 113 |
+
| | Finnish | English |
|
| 114 |
+
|-----------------|---------|-----------|
|
| 115 |
+
| Coding | 6.75 | 7.25 |
|
| 116 |
+
| Extraction | 9.00 | 8.85 |
|
| 117 |
+
| Humanities | 8.75 | 8.95 |
|
| 118 |
+
| Math | 6.70 | 8.20 |
|
| 119 |
+
| Reasoning | 6.10 | 8.15 |
|
| 120 |
+
| Roleplay | 8.20 | 8.60 |
|
| 121 |
+
| STEM | 8.60 | 8.50 |
|
| 122 |
+
| Writing | 8.10 | 8.85 |
|
| 123 |
+
|
| 124 |
+
|
| 125 |
+
### MTBench scores per-turn
|
| 126 |
+
|
| 127 |
+
| | Finnish | English |
|
| 128 |
+
|-----------------|---------|-----------|
|
| 129 |
+
| First turn | 8.00 | 8.66 |
|
| 130 |
+
| Second turn | 7.55 | 8.17 |
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
### Pairwise Comparisons on MTBench
|
| 134 |
+
|
| 135 |
- **Finnish**: 66% win rate vs Llama 3.3 70B Instruct
|
| 136 |
- **English**: 57% win rate vs Llama 3.3 70B Instruct
|
| 137 |
|
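
The per-category and per-turn tables added in this change can be cross-checked against each other. The following is a minimal sketch, assuming the overall MTBench score for a language is the unweighted mean of its category scores (and, equivalently, of its two turn scores); the README does not state how the scores were aggregated, so treat that as an assumption. The names `category_scores`, `turn_scores`, and `language_means` are illustrative and not part of the model repository.

```python
# Scores copied from the tables in this diff, as (Finnish, English) pairs.
category_scores = {
    "Coding": (6.75, 7.25),
    "Extraction": (9.00, 8.85),
    "Humanities": (8.75, 8.95),
    "Math": (6.70, 8.20),
    "Reasoning": (6.10, 8.15),
    "Roleplay": (8.20, 8.60),
    "STEM": (8.60, 8.50),
    "Writing": (8.10, 8.85),
}
turn_scores = {
    "first turn": (8.0, 8.66),
    "second turn": (7.55, 8.17),
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def language_means(table):
    """Return (Finnish mean, English mean) over all rows of a score table."""
    finnish = mean(fi for fi, _ in table.values())
    english = mean(en for _, en in table.values())
    return finnish, english


# Assumption: every category and every turn carries equal weight.
category_fi, category_en = language_means(category_scores)
turn_fi, turn_en = language_means(turn_scores)

print(f"Finnish: categories {category_fi:.3f}, turns {turn_fi:.3f}")
print(f"English: categories {category_en:.3f}, turns {turn_en:.3f}")
```

Under that assumption the two tables agree to within rounding for both languages, which suggests they summarize the same set of judged responses.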