---
library_name: transformers
license: apache-2.0
language:
  - en
tags:
  - smollm2
  - smollm2-360m
  - distillation
---

This is a distillation experiment with SmolLM2-1.7B as the teacher and SmolLM2-360M as the student model.
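The card doesn't include the training code, so as a rough sketch only: logit distillation is usually done with a temperature-softened KL objective between teacher and student output distributions (Hinton-style). The snippet below is a minimal, illustrative implementation in plain Python; the function names and the temperature value are assumptions, not the actual training script.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# Identical logits give zero loss; mismatched logits give a positive loss.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this soft-target term is typically mixed with the ordinary cross-entropy on hard labels; the mixing weight and temperature are hyperparameters.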

It slightly improves on the base model's performance on the following tasks (WIP):

| Task | HuggingFaceTB/SmolLM2-360M | aloobun/d-SmolLM2-360M |
|---|---|---|
| leaderboard_bbh_causal_judgement | 0.4545 | 0.4652 |
| leaderboard_bbh_geometric_shapes | 0.1680 | 0.2040 |
| leaderboard_bbh_movie_recommendation | 0.2120 | 0.2440 |
| leaderboard_bbh_penguins_in_a_table | 0.2055 | 0.2123 |
| leaderboard_bbh_reasoning_about_colored_objects | 0.1160 | 0.1320 |
| leaderboard_bbh_ruin_names | 0.2360 | 0.2480 |
| leaderboard_bbh_salient_translation_error_detection | 0.1480 | 0.2120 |
| leaderboard_bbh_snarks | 0.5169 | 0.5281 |
| leaderboard_bbh_temporal_sequences | 0.2720 | 0.2800 |
| leaderboard_musr_murder_mysteries | 0.5040 | 0.5160 |
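To put a single number on the gains, the per-task deltas from the comparison above can be recomputed directly (nothing here beyond the reported values; the mean is an unweighted average over just these ten tasks):

```python
# Base vs distilled scores from the comparison above: task -> (base, distilled).
scores = {
    "bbh_causal_judgement": (0.4545, 0.4652),
    "bbh_geometric_shapes": (0.1680, 0.2040),
    "bbh_movie_recommendation": (0.2120, 0.2440),
    "bbh_penguins_in_a_table": (0.2055, 0.2123),
    "bbh_reasoning_about_colored_objects": (0.1160, 0.1320),
    "bbh_ruin_names": (0.2360, 0.2480),
    "bbh_salient_translation_error_detection": (0.1480, 0.2120),
    "bbh_snarks": (0.5169, 0.5281),
    "bbh_temporal_sequences": (0.2720, 0.2800),
    "musr_murder_mysteries": (0.5040, 0.5160),
}

# Per-task improvement of the distilled model over the base model.
for task, (base, distilled) in scores.items():
    print(f"{task}: {distilled - base:+.4f}")

# Unweighted mean gain over these ten tasks (~+0.021 absolute).
avg_delta = sum(d - b for b, d in scores.values()) / len(scores)
print(f"mean gain: {avg_delta:+.4f}")
```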

Well, it didn’t work as well as I hoped; I’ll try again.

## Eval Results: aloobun/d-SmolLM2-360M (WIP)

### GPQA

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2071 | ± 0.0289 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.2308 | ± 0.0180 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.2679 | ± 0.0209 |

### MUSR

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5160 | ± 0.0317 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.2383 | ± 0.0267 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.4400 | ± 0.0315 |

### BBH

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.5480 | ± 0.0315 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.4652 | ± 0.0366 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.1560 | ± 0.0230 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.3120 | ± 0.0294 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5240 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.2040 | ± 0.0255 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.5000 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.2240 | ± 0.0264 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.3320 | ± 0.0298 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.2440 | ± 0.0272 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.5800 | ± 0.0313 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.2080 | ± 0.0257 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.2123 | ± 0.0340 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.1320 | ± 0.0215 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.2480 | ± 0.0274 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.2120 | ± 0.0259 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.5281 | ± 0.0375 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.4600 | ± 0.0316 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.2800 | ± 0.0285 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.1720 | ± 0.0239 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1440 | ± 0.0222 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3000 | ± 0.0290 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.5480 | ± 0.0315 |
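As a quick summary of the BBH block, the unweighted macro-average over the 24 subtask acc_norm values above can be computed as below. Note this is a plain mean of the reported numbers, not necessarily the Open LLM Leaderboard's official aggregation (which normalizes scores against random baselines):

```python
# acc_norm for each of the 24 BBH subtasks, in table order.
bbh_acc_norm = [
    0.5480, 0.4652, 0.1560, 0.3120, 0.5240, 0.2040, 0.5000, 0.2240,
    0.1440, 0.3320, 0.2440, 0.5800, 0.2080, 0.2123, 0.1320, 0.2480,
    0.2120, 0.5281, 0.4600, 0.2800, 0.1720, 0.1440, 0.3000, 0.5480,
]

# Unweighted mean over subtasks (each subtask counts equally,
# regardless of how many examples it contains).
macro_avg = sum(bbh_acc_norm) / len(bbh_acc_norm)
print(f"BBH macro-average (unweighted): {macro_avg:.4f}")  # ≈ 0.3199
```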

### MMLU_PRO

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.1173 | ± 0.0029 |

### IFEVAL

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.2866 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.2770 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.1497 | ± 0.0154 |
| | | none | 0 | prompt_level_strict_acc | 0.1423 | ± 0.0150 |

### MATH HARD

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.0033 | ± 0.0033 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.0081 | ± 0.0081 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.0065 | ± 0.0065 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.0104 | ± 0.0073 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.0000 | ± 0.0000 |