merge

app/src/content/article.mdx  CHANGED  (+11 -7)
@@ -289,11 +289,11 @@ import activations_magnitude from './assets/image/activations_magnitude.png'

As we can see, activation norms roughly grow linearly across layers, with the norm approximately equal to the layer index.
If we want to look for a steering coefficient that is typically less than the original activation vector norm at layer $l$,
-
-
-
-\hat{\alpha}_l \in [0,1]
-
+we can define a reduced coefficient and restrict our search to:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}, \quad \hat{\alpha}_l \in [0,1]
+```


### 3.3 Results of a 1D grid search sweep
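
To make the reparameterization concrete, here is a minimal sketch; the helper names (`to_raw_coefficient`, `to_reduced_coefficient`) are illustrative, not taken from the article's codebase. The search runs in the reduced space $\hat{\alpha}_l \in [0,1]$ and a raw coefficient is recovered by scaling with the layer index.

```python
# Minimal sketch of the reduced parameterization. Helper names are
# illustrative; they do not come from the article's codebase.

def to_raw_coefficient(alpha_hat: float, layer: int) -> float:
    """Map a reduced coefficient in [0, 1] back to a raw steering
    coefficient, using the observation that activation norms grow
    roughly linearly with layer index (norm at layer l is ~ l)."""
    assert 0.0 <= alpha_hat <= 1.0, "search space is the unit interval"
    return alpha_hat * layer

def to_reduced_coefficient(alpha: float, layer: int) -> float:
    """Inverse map: alpha_hat_l = alpha_l / l."""
    return alpha / layer

# A reduced coefficient of 0.5 at layer 20 is a raw coefficient of 10.
print(to_raw_coefficient(0.5, layer=20))  # 10.0
```

Searching in this normalized space lets the same bounds $[0,1]$ be reused for every layer, which is what the Bayesian optimization hunk further down relies on.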

@@ -309,7 +309,7 @@ First of all, **for low values of the steering coefficient $\alpha < 5$, the ste
the concept inclusion metric is zero, while instruction following and fluency are close to 2.0, equivalent to the reference model.
The surprise under the reference model is similar to that of the reference model's own generations, and there is a minimal amount of repetition.

-As we increase the steering coefficient in the range $5
+As we increase the steering coefficient in the range $5 < \alpha < 10$, **the concept inclusion metric increases, indicating that the model starts to reference the Eiffel Tower concept in its answers.
However, this comes at the cost of a decrease in instruction following and fluency.**
The decrease in those metrics occurs rather abruptly, indicating a threshold effect.
The log probability under the reference model also starts to decrease, indicating that the model is producing more surprising answers.
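
As a hypothetical harness for the sweep this hunk describes: the generation and scoring functions below are caller-supplied placeholders (the diff does not show the article's actual evaluation code), and only the grid-loop structure is the point.

```python
# Hypothetical harness for a 1D grid sweep over the steering coefficient.
# `generate` and the entries of `metrics` are caller-supplied placeholders
# (e.g. concept inclusion, instruction following, fluency, reference
# log-probability); the article's evaluation code is not shown in this diff.
from typing import Callable, Dict, List, Sequence

def sweep_alpha(
    alphas: Sequence[float],
    generate: Callable[[float], List[str]],
    metrics: Dict[str, Callable[[List[str]], float]],
) -> List[dict]:
    """Generate steered completions for each alpha and score them."""
    rows = []
    for alpha in alphas:
        completions = generate(alpha)
        row = {"alpha": alpha}
        for name, score in metrics.items():
            row[name] = score(completions)
        rows.append(row)
    return rows

# Usage sketch: sweep_alpha([0.0, 2.5, 5.0, 7.5, 10.0], generate, metrics)
```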

@@ -552,7 +552,11 @@ At each step, the hyperparameters of the GP model were optimized by maximizing t
At each step, we select a promising candidate using the `qNoisyExpectedImprovement` acquisition function, which balances exploration and exploitation. This acquisition function is well suited to noisy objectives, as it takes the noise in the observations into account.

For the search domain, since activation magnitude grows roughly linearly with layer index, we expect the optimal steering coefficient for a feature at layer $l$ to scale with $l$.
-We used the reduced parameterization presented earlier, searching for an optimal
+We used the reduced parameterization presented earlier, searching for an optimal value in the range $[0,1]$:
+
+```math
+\hat{\alpha}_l = \frac{\alpha_l}{l}
+```

To favor noise reduction at promising locations, every 5 steps we resample the best point found so far.
Here, by *best* we mean the point with the lowest GP posterior mean $\mu(x)$. (Note that this is different from the point with the lowest observed value, which might be a lucky noisy outlier.)
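
Since the hunk names BoTorch's `qNoisyExpectedImprovement`, here is a condensed sketch of how such a loop could be assembled in BoTorch. The toy objective, initial design, budget, and single-candidate batches are assumptions, not the article's code; note also that BoTorch maximizes by convention, so the lowest-posterior-mean incumbent described above corresponds to the highest posterior mean of a negated objective.

```python
# Sketch of the Bayesian-optimization loop described above, using BoTorch.
# The objective, initial design, and budget are illustrative assumptions.
import torch
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def objective(x: torch.Tensor) -> torch.Tensor:
    """Toy noisy objective standing in for the steering-quality score.
    BoTorch maximizes, so a score where lower is better must be negated."""
    return -(x - 0.35) ** 2 + 0.05 * torch.randn_like(x)

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)  # reduced coefficient in [0, 1]
train_X = torch.rand(8, 1, dtype=torch.double)             # initial design
train_Y = objective(train_X)

for step in range(30):  # illustrative budget
    model = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_mll(mll)  # GP hyperparameters by maximum marginal likelihood

    if step % 5 == 4:
        # Every 5 steps, resample the incumbent: the observed point with the
        # best GP posterior mean (not the best raw, possibly lucky, observation).
        with torch.no_grad():
            post_mean = model.posterior(train_X).mean
        candidate = train_X[post_mean.argmax()].unsqueeze(0)
    else:
        # Otherwise, pick a new candidate with noisy expected improvement.
        acq = qNoisyExpectedImprovement(model=model, X_baseline=train_X)
        candidate, _ = optimize_acqf(
            acq, bounds=bounds, q=1, num_restarts=10, raw_samples=128
        )

    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, objective(candidate)])
```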