Update README.md
README.md
CHANGED
@@ -34,7 +34,7 @@ The approach operates in whitened activation space, applies importance-weighted
 | **Qwen3-14B** (dense) | – | baseline | 79.86 | 78.85 | 67.88 | 82.82 | 60.23 | 96.50 | 43.25 | 77.20 | **73.32** | 1.1E+01 |
 | **Qwen3-8B** (dense) | – | baseline | 77.70 | 74.90 | 64.10 | 80.70 | 56.70 | 95.70 | 40.90 | 73.00 | **70.46** | 1.2E+01 |
 | **ROCKET-Qwen3-8B** | 40% (14B→8B) | training-free | 72.68 | 62.63 | 70.26 | 67.76 | 44.19 | 91.20 | 39.80 | 59.99 | 63.56 | 2.5E+01 |
-| **ROCKET-Qwen3-8B** (healed) ✨ | 40% + 30M tokens | light fine-tune | **78.51** |
+| **ROCKET-Qwen3-8B** (healed) ✨ | 40% + 30M tokens | light fine-tune | **78.51** | 74.67 | **65.55** | 75.29 | 53.07 | 93.50 | 37.89 | 65.23 | 67.96 | 1.3E+01 |
 
 ### Key Takeaways:
 - **Training-free ROCKET** retains ~90% of the native 8B model's accuracy (63.56 vs 70.46) with **zero fine-tuning**