Part of the Meta-UAT collection: weight space learning experiments (interpreting behavior through activation signatures).

This model was trained to classify which patterns a subject model was trained on, based on neuron activation signatures.
The model predicts which of the following 14 patterns the subject model was trained on:

- palindrome
- sorted_ascending
- sorted_descending
- alternating
- contains_abc
- starts_with
- ends_with
- no_repeats
- has_majority
- increasing_pairs
- decreasing_pairs
- vowel_consonant
- first_last_match
- mountain_pattern

| Pattern | Precision | Recall | F1 Score |
|---|---|---|---|
| palindrome | 14.1% | 90.0% | 24.4% |
| sorted_ascending | 49.1% | 62.1% | 54.8% |
| sorted_descending | 12.9% | 89.7% | 22.5% |
| alternating | 18.6% | 69.9% | 29.4% |
| contains_abc | 26.4% | 73.7% | 38.8% |
| starts_with | 9.6% | 84.7% | 17.3% |
| ends_with | 15.1% | 82.2% | 25.5% |
| no_repeats | 12.7% | 59.3% | 21.0% |
| has_majority | 56.2% | 34.6% | 42.9% |
| increasing_pairs | 27.6% | 66.7% | 39.0% |
| decreasing_pairs | 16.5% | 80.8% | 27.4% |
| vowel_consonant | 12.1% | 50.0% | 19.5% |
| first_last_match | 21.1% | 78.1% | 33.2% |
| mountain_pattern | 13.8% | 54.3% | 22.0% |
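
The F1 column can be cross-checked against the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall, which the card does not state explicitly; the helper name `f1_score` is illustrative only. Recomputed values agree with the table for the two rows used here, and other rows can differ in the last digit because the published precision and recall are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (percent in, percent out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) copied from two rows of the table above.
rows = {
    "palindrome": (14.1, 90.0),
    "sorted_ascending": (49.1, 62.1),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
```

For most patterns, recall is far higher than precision, so the classifier predicts those patterns much more often than they actually occur; `has_majority` is the one row where precision exceeds recall.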
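
To download the checkpoint from the Hugging Face Hub and load it:

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint file (cached locally after the first call)
checkpoint_path = hf_hub_download(
    repo_id='maximuspowers/muat-fourier-3-classifier',
    filename='best_model.pt',
)

# map_location='cpu' lets the checkpoint load on machines without a GPU
checkpoint = torch.load(checkpoint_path, map_location='cpu')
```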
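
The card does not document what `best_model.pt` contains, so it can help to inspect the loaded object before trying to rebuild the model. The following is a minimal sketch, not part of the original card: `summarize_checkpoint` is a hypothetical helper, and the keys in `example` (`epoch`, `model_state_dict`) are placeholders rather than the checkpoint's real layout.

```python
def summarize_checkpoint(obj, max_items=20):
    """Return one description line per top-level entry of a loaded checkpoint."""
    if not isinstance(obj, dict):
        return [f"<root>: {type(obj).__name__}"]
    lines = []
    for key, value in list(obj.items())[:max_items]:
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Tensors and arrays expose .shape; report it alongside the type.
            lines.append(f"{key}: {type(value).__name__} shape={tuple(shape)}")
        elif isinstance(value, dict):
            lines.append(f"{key}: dict with {len(value)} entries")
        else:
            lines.append(f"{key}: {type(value).__name__}")
    return lines


# Stand-in object with made-up keys; pass the real `checkpoint` instead.
example = {"epoch": 3, "model_state_dict": {"layer.weight": None}}
for line in summarize_checkpoint(example):
    print(line)
```

Calling `summarize_checkpoint(checkpoint)` on the object returned by `torch.load` lists its top-level keys, which shows whether the file holds a bare state dict or a dictionary wrapping one together with training metadata.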