aloobun committed f2de106 (verified), parent ad4e772

Update README.md

Files changed (1): README.md +25 -0

New content of README.md for hunk `@@ -5,6 +5,31 @@ license: apache-2.0`:
 
WIP

| Tasks | **HuggingFaceTB/SmolLM2-360M** value | **aloobun/d-SmolLM2-360M** value |
|:----------------------------------------------------|-------:|-------:|
| leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |
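
To read the table at a glance, the per-task difference between aloobun/d-SmolLM2-360M and the base model can be computed directly from the values above. This is a minimal illustrative sketch: the helper names `deltas` and `mean_delta` are hypothetical, the table does not name its metric, and the mean reported here is a plain unweighted average over the ten listed tasks, not an official aggregate or leaderboard score.

```python
# Scores copied from the table above: (HuggingFaceTB/SmolLM2-360M, aloobun/d-SmolLM2-360M).
scores = {
    "leaderboard_bbh_causal_judgement": (0.4545, 0.4652),
    "leaderboard_bbh_geometric_shapes": (0.1680, 0.2040),
    "leaderboard_bbh_movie_recommendation": (0.2120, 0.2440),
    "leaderboard_bbh_penguins_in_a_table": (0.2055, 0.2123),
    "leaderboard_bbh_reasoning_about_colored_objects": (0.1160, 0.1320),
    "leaderboard_bbh_ruin_names": (0.2360, 0.2480),
    "leaderboard_bbh_salient_translation_error_detection": (0.1480, 0.2120),
    "leaderboard_bbh_snarks": (0.5169, 0.5281),
    "leaderboard_bbh_temporal_sequences": (0.2720, 0.2800),
    "leaderboard_musr_murder_mysteries": (0.5040, 0.5160),
}


def deltas(table):
    """Hypothetical helper: {task: d-SmolLM2 value minus base value}, rounded to 4 decimals."""
    return {task: round(new - base, 4) for task, (base, new) in table.items()}


def mean_delta(table):
    """Hypothetical helper: unweighted mean of the per-task differences (each task counts once)."""
    d = deltas(table)
    return sum(d.values()) / len(d)


if __name__ == "__main__":
    # Print tasks from largest to smallest difference, then the unweighted mean.
    for task, diff in sorted(deltas(scores).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{task:55s} {diff:+.4f}")
    print(f"mean delta: {mean_delta(scores):+.4f}")
```

On these ten tasks every difference is positive; the largest is on leaderboard_bbh_salient_translation_error_detection (+0.0640), and the unweighted mean difference is about +0.0209.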
# Eval Results aloobun/d-SmolLM2-360M

Todo:

- ifeval (0-shot, generative)
- Math-lvl-5 (4-shot, generative, Minerva version)

## GPQA