---
library_name: transformers
license: apache-2.0
language:
- en
tags:
- smollm2
- smollm2-360m
- distillation
---
This is a distillation experiment with SmolLM2-1.7B as the teacher and SmolLM2-360M as the student model (a rough sketch of the objective is given below the table).
It slightly improves on the base model's performance on the following tasks (WIP):
| Task | HuggingFaceTB/SmolLM2-360M (acc_norm) | aloobun/d-SmolLM2-360M (acc_norm) |
|---|---|---|
| - leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| - leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| - leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| - leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| - leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| - leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| - leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| - leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| - leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| - leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |
Well, it didn't work as well as I had hoped; I'll try again.
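The exact recipe and training data are not documented here, so the following is only a minimal sketch of the kind of logit distillation described above: a forward KL between teacher and student token distributions mixed with the student's own language-modeling loss. Model ids are the SmolLM2 checkpoints named in this card; temperature, mixing weight, and the loss shape are illustrative assumptions, not the recipe actually used.

```python
# Minimal logit-distillation sketch (illustrative; not the exact recipe used for this checkpoint).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B", torch_dtype=torch.bfloat16
).eval()
student = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")

def distill_loss(batch, temperature=2.0, alpha=0.5):
    # Hard-label LM loss on the student plus a soft-label KL term against the frozen teacher.
    labels = batch["input_ids"].clone()
    student_out = student(**batch, labels=labels)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_out.logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return alpha * student_out.loss + (1.0 - alpha) * kl
```

Both models share the SmolLM2 tokenizer and vocabulary, which is what makes a direct KL over the full logit distributions possible without any projection layer.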
## Eval Results: aloobun/d-SmolLM2-360M (WIP)
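The tables below use the Open LLM Leaderboard task groups from lm-evaluation-harness. The exact harness version and command used are not recorded here, so this is only a reproduction sketch assuming the harness's Python API with default leaderboard few-shot settings:

```python
# Reproduction sketch with lm-evaluation-harness (assumed setup; the exact command used is not recorded here).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aloobun/d-SmolLM2-360M,dtype=bfloat16",
    tasks=[
        "leaderboard_gpqa", "leaderboard_musr", "leaderboard_bbh",
        "leaderboard_mmlu_pro", "leaderboard_ifeval", "leaderboard_math_hard",
    ],
    batch_size=8,
)
print(results["results"])
```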
### GPQA
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_gpqa | N/A | | | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | ↑ | 0.2071 | ± | 0.0289 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | ↑ | 0.2308 | ± | 0.0180 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | ↑ | 0.2679 | ± | 0.0209 |
### MUSR
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_musr | N/A | | | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | ↑ | 0.5160 | ± | 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | ↑ | 0.2383 | ± | 0.0267 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | ↑ | 0.4400 | ± | 0.0315 |
### BBH
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_bbh | N/A | | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | ↑ | 0.5480 | ± | 0.0315 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | ↑ | 0.4652 | ± | 0.0366 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | ↑ | 0.1560 | ± | 0.0230 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | ↑ | 0.3120 | ± | 0.0294 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | ↑ | 0.5240 | ± | 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | ↑ | 0.2040 | ± | 0.0255 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | ↑ | 0.5000 | ± | 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.2240 | ± | 0.0264 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.1440 | ± | 0.0222 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.3320 | ± | 0.0298 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | ↑ | 0.2440 | ± | 0.0272 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | ↑ | 0.5800 | ± | 0.0313 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | ↑ | 0.2080 | ± | 0.0257 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | ↑ | 0.2123 | ± | 0.0340 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | ↑ | 0.1320 | ± | 0.0215 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | ↑ | 0.2480 | ± | 0.0274 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | ↑ | 0.2120 | ± | 0.0259 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | ↑ | 0.5281 | ± | 0.0375 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | ↑ | 0.4600 | ± | 0.0316 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | ↑ | 0.2800 | ± | 0.0285 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | ↑ | 0.1720 | ± | 0.0239 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | ↑ | 0.1440 | ± | 0.0222 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | ↑ | 0.3000 | ± | 0.0290 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | ↑ | 0.5480 | ± | 0.0315 |
### MMLU-Pro
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | ↑ | 0.1173 | ± | 0.0029 |
### IFEval
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | ↑ | 0.2866 | ± | N/A |
| | | none | 0 | inst_level_strict_acc | ↑ | 0.2770 | ± | N/A |
| | | none | 0 | prompt_level_loose_acc | ↑ | 0.1497 | ± | 0.0154 |
| | | none | 0 | prompt_level_strict_acc | ↑ | 0.1423 | ± | 0.0150 |
### MATH-Hard
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| leaderboard_math_hard | N/A | | | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0033 | ± | 0.0033 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | ↑ | 0.0081 | ± | 0.0081 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | ↑ | 0.0000 | ± | 0.0000 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0000 | ± | 0.0000 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | ↑ | 0.0065 | ± | 0.0065 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | ↑ | 0.0104 | ± | 0.0073 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | ↑ | 0.0000 | ± | 0.0000 |
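For completeness, the checkpoint loads like any other transformers causal LM; a minimal generation sketch (the prompt and generation settings are illustrative):

```python
# Minimal generation sketch with transformers (prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aloobun/d-SmolLM2-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Gravity is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```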