tfrere (HF Staff) committed
Commit 68c1675 · 1 Parent(s): b612b7f
Files changed (1):
  1. app/src/content/article.mdx +11 -7
app/src/content/article.mdx CHANGED
 
@@ -289,11 +289,11 @@ import activations_magnitude from './assets/image/activations_magnitude.png'
 
 As we can see, activation norms roughly grow linearly across layers, with a norm being approximately equal to the layer index.
 If we want to look for a steering coefficient that is typically less than the original activation vector norm at layer $l$,
-<<<<<<< HEAD
-we can define a reduced coefficient $\hat{\alpha}_l = (\alpha_l / l)$, and restrict our search to
-$$
-\hat{\alpha}_l \in [0,1]
-$$
+we can define a reduced coefficient and restrict our search to:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}, \quad \hat{\alpha}_l \in [0,1]
+```
 
 
 ### 3.3 Results of a 1D grid search sweep
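
A minimal sketch of the reduced parameterization this hunk introduces (illustration only, not from the commit; it assumes the raw coefficient is recovered by simply rescaling with $l$):

```python
# Reduced parameterization sketch: activation norms grow roughly linearly
# with layer index, so a raw steering coefficient alpha below the typical
# activation norm at layer l corresponds to alpha_hat = alpha / l in [0, 1].

def to_reduced(alpha: float, layer: int) -> float:
    """Reduced coefficient: alpha_hat = alpha / l."""
    return alpha / layer

def to_raw(alpha_hat: float, layer: int) -> float:
    """Raw steering coefficient: alpha = alpha_hat * l."""
    assert 0.0 <= alpha_hat <= 1.0, "the search is restricted to [0, 1]"
    return alpha_hat * layer

# The same reduced value maps to layer-appropriate raw coefficients:
print(to_raw(0.5, layer=4))   # 2.0
print(to_raw(0.5, layer=20))  # 10.0
```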
 
@@ -309,7 +309,7 @@ First of all, **for low values of the steering coefficient $\alpha < 5$, the ste
 the concept inclusion metric is zero, instruction following and fluency are close to 2.0, equivalent to the reference model.
 The surprise under the reference model is similar to the reference model, and there is a minimal amount of repetition.
 
-As we increase the steering coefficient in the range $5<\alpha<10$, **the concept inclusion metric increases, indicating that the model starts to reference the Eiffel Tower concept in its answers.
+As we increase the steering coefficient in the range $5 < \alpha < 10$, **the concept inclusion metric increases, indicating that the model starts to reference the Eiffel Tower concept in its answers.
 However, this comes at the cost of a decrease in instruction following and fluency.**
 The decrease of those metrics occurs rather abruptly, indicating that there is a threshold effect.
 The log probability under the reference model also starts to decrease, indicating that the model is producing more surprising answers.
 
@@ -552,7 +552,11 @@ At each step, the hyperparameters of the GP model were optimized by maximizing t
 At each step, we select a promising candidate using the `qNoisyExpectedImprovement` acquisition function, which balances exploration and exploitation. This acquisition function is well-suited for noisy functions, as it takes into account the noise in the observations.
 
 For domain search, as we know that activation magnitude grows roughly linearly with layer index, we expect that the optimal steering coefficient for a feature in layer $l$ should scale with $l$.
-We used the reduced parameterization presented earlier, searching for an optimal $\hat{\alpha_l} = \frac{\alpha_l}{l}$ in the range $[0,1]$.
+We used the reduced parameterization presented earlier, searching for an optimal value in the range $[0,1]$:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}
+```
 
 To favor noise reduction at promising locations, every 5 steps we decided to resample the best point found so far.
 In that case, by *best* we mean the point with the lowest GP posterior $\mu(x)$. (Note that this is different from the point with the lowest observed value which might be a lucky noisy outlier).
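
A minimal sketch of the optimization loop these lines describe (illustration only, not from the commit): a `SingleTaskGP` refit at each step, candidates chosen with BoTorch's `qNoisyExpectedImprovement` over $\hat{\alpha}_l \in [0,1]$, and the posterior-mean incumbent resampled every 5 steps. The toy objective `evaluate_steering` and the sign convention (a loss minimized by maximizing its negative) are assumptions, not the article's actual metric:

```python
# Bayesian-optimization loop sketch with BoTorch (assumptions flagged below).
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)  # alpha_hat domain


def evaluate_steering(alpha_hat: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the article's noisy steering metric
    # (lower is better); a toy noisy quadratic keeps the sketch runnable.
    return (alpha_hat - 0.3).pow(2) + 0.05 * torch.randn_like(alpha_hat)


train_X = torch.rand(8, 1, dtype=torch.double)  # initial design in [0, 1]
train_Y = -evaluate_steering(train_X)           # negate: BoTorch maximizes

for step in range(1, 41):
    # Refit GP hyperparameters by maximizing the marginal log-likelihood.
    gp = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

    if step % 5 == 0:
        # Every 5 steps, resample the incumbent: the point whose GP
        # posterior mean is best (not the best noisy observation).
        mu = gp.posterior(train_X).mean
        x_next = train_X[mu.argmax()].unsqueeze(0)
    else:
        # Otherwise, pick a candidate by noisy expected improvement,
        # which accounts for observation noise at the baseline points.
        acqf = qNoisyExpectedImprovement(model=gp, X_baseline=train_X)
        x_next, _ = optimize_acqf(
            acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=128
        )

    train_X = torch.cat([train_X, x_next])
    train_Y = torch.cat([train_Y, -evaluate_steering(x_next)])
```

Negating the toy loss keeps BoTorch's maximize-by-default convention while matching the incumbent definition above: the point with the lowest posterior mean of the loss.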