MTSAIR/ROCKET-Qwen-8b

ammarali32 committed 4 days ago

Commit a7aba1e · verified · 1 parent: e5b7d17

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -34,7 +34,7 @@ The approach operates in whitened activation space, applies importance-weighted
 | **Qwen3-14B** (dense) | – | baseline | 79.86 | 78.85 | 67.88 | 82.82 | 60.23 | 96.50 | 43.25 | 77.20 | **73.32** | 1.1E+01 |
 | **Qwen3-8B** (dense) | – | baseline | 77.70 | 74.90 | 64.10 | 80.70 | 56.70 | 95.70 | 40.90 | 73.00 | **70.46** | 1.2E+01 |
 | **ROCKET-Qwen3-8B** | 40% (14B→8B) | training-free | 72.68 | 62.63 | 70.26 | 67.76 | 44.19 | 91.20 | 39.80 | 59.99 | 63.56 | 2.5E+01 |
-| **ROCKET-Qwen3-8B** (healed) ✨ | 40% + 30M tokens | light fine-tune | **78.51** 🏆 | **74.67** 🏆 | 65.55 | **75.29** 🏆 | **53.07** 🏆 | **93.50** 🏆 | 37.89 | **65.23** 🏆 | **67.96** 🏆 | 1.3E+01 🏆 |
+| **ROCKET-Qwen3-8B** (healed) ✨ | 40% + 30M tokens | light fine-tune | **78.51** 🏆 | 74.67  | **65.55** 🏆 | 75.29  | 53.07 | 93.50 | 37.89 | 65.23 | 67.96  | 1.3E+01 🏆 |
 ### Key Takeaways:
 - ✅ **Training-free ROCKET** retains ~90% of the native 8B model's accuracy (63.56 vs 70.46) with **zero fine-tuning**
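This commit changes only the bold and 🏆 markers on the healed row; every number is identical on the `-` and `+` lines. That means the scores can be checked on their own. The sketch below transcribes the rows from the diff. It recomputes the ninth score column as the unweighted mean of the first eight, reproduces the ~90% retention figure quoted under Key Takeaways, and lists the benchmark columns where the healed model strictly beats dense Qwen3-8B. Column names are not visible in this hunk, so columns are addressed by position. Treating the ninth column as a plain mean is an assumption, though it matches every reported value to within rounding. The trailing E+01 column is left out because its meaning and direction are not stated here. The helper names (`mean_of_benchmarks`, `retention`, `columns_beating`) are hypothetical.

```python
# Rows transcribed from the diff (numbers are the same on both sides).
# Assumption: positions 0-7 are the eight benchmark scores and position 8
# is their unweighted mean. The trailing E+01 column is omitted because its
# meaning and direction (lower vs. higher is better) are not stated.
ROWS = {
    "Qwen3-14B": [79.86, 78.85, 67.88, 82.82, 60.23, 96.50, 43.25, 77.20, 73.32],
    "Qwen3-8B": [77.70, 74.90, 64.10, 80.70, 56.70, 95.70, 40.90, 73.00, 70.46],
    "ROCKET-Qwen3-8B (training-free)": [72.68, 62.63, 70.26, 67.76, 44.19, 91.20, 39.80, 59.99, 63.56],
    "ROCKET-Qwen3-8B (healed)": [78.51, 74.67, 65.55, 75.29, 53.07, 93.50, 37.89, 65.23, 67.96],
}


def mean_of_benchmarks(row):
    """Unweighted mean of the eight benchmark columns (hypothetical helper)."""
    scores = row[:8]
    return sum(scores) / len(scores)


def retention(model, reference="Qwen3-8B"):
    """Ratio of a model's reported average (position 8) to the reference's."""
    return ROWS[model][8] / ROWS[reference][8]


def columns_beating(model, reference="Qwen3-8B"):
    """0-based benchmark columns where `model` scores strictly above `reference`."""
    return [i for i in range(8) if ROWS[model][i] > ROWS[reference][i]]


if __name__ == "__main__":
    for name, row in ROWS.items():
        # Reported averages are rounded to two decimals, so compare loosely.
        print(f"{name}: recomputed {mean_of_benchmarks(row):.3f}, reported {row[8]:.2f}")
    print(f"training-free retention: {retention('ROCKET-Qwen3-8B (training-free)'):.1%}")
    print(f"healed retention: {retention('ROCKET-Qwen3-8B (healed)'):.1%}")
    print("healed > Qwen3-8B in columns:", columns_beating("ROCKET-Qwen3-8B (healed)"))
```

On these numbers the training-free ratio is about 90.2% and the healed ratio about 96.5%. The healed model beats Qwen3-8B only in the first and third benchmark columns. Those are the same two score cells that keep 🏆 on the `+` line. This is one plausible reading of what the commit corrects, not a rule stated in the README.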