merge

app/src/content/article.mdx  CHANGED  (+11 -7)
@@ -289,11 +289,11 @@ import activations_magnitude from './assets/image/activations_magnitude.png'

As we can see, activation norms roughly grow linearly across layers, with the norm approximately equal to the layer index.
If we want to look for a steering coefficient that is typically less than the original activation vector norm at layer $l$,
-
-
-
-\hat{\alpha}_l \in [0,1]
-
+we can define a reduced coefficient and restrict our search to:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}, \quad \hat{\alpha}_l \in [0,1]
+```


### 3.3 Results of a 1D grid search sweep
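
To make the reparameterization concrete, here is a minimal sketch; the helper names (`to_raw_coefficient`, `to_reduced_coefficient`) are illustrative, not taken from the article's codebase. The search runs in the reduced space $\hat{\alpha}_l \in [0,1]$ and a raw coefficient is recovered by scaling with the layer index.

```python
# Minimal sketch of the reduced parameterization. Helper names are
# illustrative; they do not come from the article's codebase.

def to_raw_coefficient(alpha_hat: float, layer: int) -> float:
    """Map a reduced coefficient in [0, 1] back to a raw steering
    coefficient, using the observation that activation norms grow
    roughly linearly with layer index (norm at layer l is ~ l)."""
    assert 0.0 <= alpha_hat <= 1.0, "search space is the unit interval"
    return alpha_hat * layer

def to_reduced_coefficient(alpha: float, layer: int) -> float:
    """Inverse map: alpha_hat_l = alpha_l / l."""
    return alpha / layer

# A reduced coefficient of 0.5 at layer 20 is a raw coefficient of 10.
print(to_raw_coefficient(0.5, layer=20))  # 10.0
```

Searching in this normalized space lets the same bounds $[0,1]$ be reused for every layer, which is what the Bayesian optimization hunk further down relies on.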

@@ -309,7 +309,7 @@ First of all, **for low values of the steering coefficient $\alpha < 5$, the ste
the concept inclusion metric is zero, while instruction following and fluency are close to 2.0, equivalent to the reference model.
The surprise under the reference model is similar to that of the reference model's own generations, and there is a minimal amount of repetition.

-As we increase the steering coefficient in the range $5
+As we increase the steering coefficient in the range $5 < \alpha < 10$, **the concept inclusion metric increases, indicating that the model starts to reference the Eiffel Tower concept in its answers.
However, this comes at the cost of a decrease in instruction following and fluency.**
The decrease in those metrics occurs rather abruptly, indicating a threshold effect.
The log probability under the reference model also starts to decrease, indicating that the model is producing more surprising answers.
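
As a hypothetical harness for the sweep this hunk describes: the generation and scoring functions below are caller-supplied placeholders (the diff does not show the article's actual evaluation code), and only the grid-loop structure is the point.

```python
# Hypothetical harness for a 1D grid sweep over the steering coefficient.
# `generate` and the entries of `metrics` are caller-supplied placeholders
# (e.g. concept inclusion, instruction following, fluency, reference
# log-probability); the article's evaluation code is not shown in this diff.
from typing import Callable, Dict, List, Sequence

def sweep_alpha(
    alphas: Sequence[float],
    generate: Callable[[float], List[str]],
    metrics: Dict[str, Callable[[List[str]], float]],
) -> List[dict]:
    """Generate steered completions for each alpha and score them."""
    rows = []
    for alpha in alphas:
        completions = generate(alpha)
        row = {"alpha": alpha}
        for name, score in metrics.items():
            row[name] = score(completions)
        rows.append(row)
    return rows

# Usage sketch: sweep_alpha([0.0, 2.5, 5.0, 7.5, 10.0], generate, metrics)
```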

@@ -552,7 +552,11 @@ At each step, the hyperparameters of the GP model were optimized by maximizing t
At each step, we select a promising candidate using the `qNoisyExpectedImprovement` acquisition function, which balances exploration and exploitation. This acquisition function is well suited to noisy objectives, as it takes the noise in the observations into account.

For the search domain, since activation magnitude grows roughly linearly with layer index, we expect the optimal steering coefficient for a feature at layer $l$ to scale with $l$.
-We used the reduced parameterization presented earlier, searching for an optimal
+We used the reduced parameterization presented earlier, searching for an optimal value in the range $[0,1]$:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}
+```

To favor noise reduction at promising locations, every 5 steps we resample the best point found so far.
Here, by *best* we mean the point with the lowest GP posterior mean $\mu(x)$. (Note that this is different from the point with the lowest observed value, which might be a lucky noisy outlier.)
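
Since the hunk names BoTorch's `qNoisyExpectedImprovement`, here is a condensed sketch of how such a loop could be assembled in BoTorch. The toy objective, initial design, budget, and single-candidate batches are assumptions, not the article's code; note also that BoTorch maximizes by convention, so the lowest-posterior-mean incumbent described above corresponds to the highest posterior mean of a negated objective.

```python
# Sketch of the Bayesian-optimization loop described above, using BoTorch.
# The objective, initial design, and budget are illustrative assumptions.
import torch
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def objective(x: torch.Tensor) -> torch.Tensor:
    """Toy noisy objective standing in for the steering-quality score.
    BoTorch maximizes, so a score where lower is better must be negated."""
    return -(x - 0.35) ** 2 + 0.05 * torch.randn_like(x)

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)  # reduced coefficient in [0, 1]
train_X = torch.rand(8, 1, dtype=torch.double)             # initial design
train_Y = objective(train_X)

for step in range(30):  # illustrative budget
    model = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll(mll)  # GP hyperparameters by maximum marginal likelihood

    if step % 5 == 4:
        # Every 5 steps, resample the incumbent: the observed point with the
        # best GP posterior mean (not the best raw, possibly lucky, observation).
        with torch.no_grad():
            post_mean = model.posterior(train_X).mean
        candidate = train_X[post_mean.argmax()].unsqueeze(0)
    else:
        # Otherwise, pick a new candidate with noisy expected improvement.
        acq = qNoisyExpectedImprovement(model=model, X_baseline=train_X)
        candidate, _ = optimize_acqf(
            acq, bounds=bounds, q=1, num_restarts=10, raw_samples=128
        )

    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, objective(candidate)])
```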