Update README.md
README.md
CHANGED
@@ -65,7 +65,7 @@ On general reasoning tasks including **code**, **math**, **science**, **alignmen
 | Multi-Challenge | 41.14 | 36.30 | 36.97 | 38.72 | <u>49.40<u> | 41.20 | **52.21** |
 | **Tool Use** | | | | | | | |
 | BFCL-V4 | 44.87 | 42.20 | 45.14 | 47.90 | 48.6 | <u>53.8<u> | **56.50** |
-| Tau2-Bench |
+| Tau2-Bench | 45.9 | 42.06 | 44.96 | 45.26 | <u> 47.70<u> | 41.77 | **48.57** |
 
 
 