Pattern Classifier

This model is a multi-label classifier: given the neuron activation signature of a subject model, it predicts which patterns that subject model was trained on.

Dataset

Training Dataset: maximuspowers/muat-separate-pca-10
Input Mode: signature
Number of Patterns: 14

Patterns

The model predicts which of the following 14 patterns the subject model was trained on:

palindrome
sorted_ascending
sorted_descending
alternating
contains_abc
starts_with
ends_with
no_repeats
has_majority
increasing_pairs
decreasing_pairs
vowel_consonant
first_last_match
mountain_pattern
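
Because the task is multi-label, a training target can be represented as a 14-dimensional multi-hot vector. The following is a minimal sketch of such an encoding; the helper name `encode_patterns` is hypothetical, and the index order is assumed to follow the list above, which the card does not confirm.

```python
# Pattern names in the order listed above (assumed to match the label order).
PATTERNS = [
    "palindrome", "sorted_ascending", "sorted_descending", "alternating",
    "contains_abc", "starts_with", "ends_with", "no_repeats",
    "has_majority", "increasing_pairs", "decreasing_pairs",
    "vowel_consonant", "first_last_match", "mountain_pattern",
]


def encode_patterns(active):
    """Return a multi-hot list with 1.0 for each pattern in `active`."""
    unknown = set(active) - set(PATTERNS)
    if unknown:
        raise ValueError(f"unknown patterns: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in PATTERNS]


# A subject model trained on two patterns gets two active positions.
target = encode_patterns({"palindrome", "no_repeats"})
```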

Model Architecture

Signature Encoder: [512, 256, 256, 128]
Activation: relu
Dropout: 0.2
Batch Normalization: True
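
The configuration above could correspond to an MLP along the following lines. This is a sketch under stated assumptions, not the released implementation: the input dimension, the ordering of batch norm, activation, and dropout within each block, and the final 14-logit linear head are all guesses, and `build_signature_classifier` is a hypothetical name.

```python
import torch
from torch import nn


def build_signature_classifier(input_dim: int, num_patterns: int = 14) -> nn.Sequential:
    """MLP with hidden widths [512, 256, 256, 128], batch norm, ReLU, dropout 0.2."""
    hidden_dims = [512, 256, 256, 128]
    layers = []
    prev_dim = input_dim
    for width in hidden_dims:
        layers += [
            nn.Linear(prev_dim, width),
            nn.BatchNorm1d(width),  # assumed placement: before the activation
            nn.ReLU(),
            nn.Dropout(p=0.2),
        ]
        prev_dim = width
    # Assumed head: one logit per pattern (multi-label output).
    layers.append(nn.Linear(prev_dim, num_patterns))
    return nn.Sequential(*layers)


# 64 is a placeholder; the real signature dimension is not stated in this card.
model = build_signature_classifier(input_dim=64)
model.eval()  # use batch-norm running statistics and disable dropout
with torch.no_grad():
    logits = model(torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 14])
```

The layer names and ordering in the actual checkpoint may differ, so loading the released weights into this sketch is not guaranteed to work.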

Training Configuration

Optimizer: adam
Learning Rate: 0.001
Batch Size: 16
Loss Function: BCE with Logits (with pos_weight for training, unweighted for validation)
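
To illustrate the weighted versus unweighted loss, the sketch below builds both criteria with `torch.nn.BCEWithLogitsLoss`. How `pos_weight` was actually computed is not stated in this card; the negatives-to-positives ratio per pattern used here is a common choice and is only an assumption, as is the toy data.

```python
import torch
from torch import nn

# Toy multi-hot labels: 4 subject models x 3 patterns (illustrative only).
labels = torch.tensor([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
logits = torch.zeros_like(labels)  # stand-in for classifier outputs

# Assumed weighting: number of negatives divided by number of positives, per pattern.
pos_counts = labels.sum(dim=0)
neg_counts = labels.shape[0] - pos_counts
pos_weight = neg_counts / pos_counts.clamp(min=1.0)  # avoid division by zero

train_criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)  # weighted, for training
val_criterion = nn.BCEWithLogitsLoss()                         # unweighted, for validation

train_loss = train_criterion(logits, labels)
val_loss = val_criterion(logits, labels)
```

Since the two criteria weight positive labels differently, the training and validation loss values are not directly comparable, even on identical data.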

Test Set Performance

F1 Macro: 0.1601
F1 Micro: 0.1771
Hamming Accuracy: 0.8305
Exact Match Accuracy: 0.0843
BCE Loss: 0.5038
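
For readers unfamiliar with these metrics, here is a dependency-free sketch of how they are commonly defined for multi-label predictions. The card does not give its exact definitions, so this assumes Hamming accuracy is the fraction of individual label decisions that are correct and exact match is the fraction of samples with every label correct; `multilabel_metrics` is a hypothetical helper and the data is made up.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute summary metrics from equally shaped lists of 0/1 label rows."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    correct_cells = 0
    exact_rows = 0
    for true_row, pred_row in zip(y_true, y_pred):
        exact_rows += int(true_row == pred_row)
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            correct_cells += int(t == p)
            tp[j] += int(t == 1 and p == 1)
            fp[j] += int(t == 0 and p == 1)
            fn[j] += int(t == 1 and p == 0)

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    return {
        "f1_macro": sum(f1(*counts) for counts in zip(tp, fp, fn)) / n_labels,
        "f1_micro": f1(sum(tp), sum(fp), sum(fn)),
        "hamming_accuracy": correct_cells / (n_samples * n_labels),
        "exact_match": exact_rows / n_samples,
    }


# Toy data: 4 samples, 4 labels; the predictor finds only one of four positives.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
# f1_macro = 0.25, f1_micro = 0.4, hamming_accuracy = 0.8125, exact_match = 0.25
```

In the toy data most label positions are negative, so Hamming accuracy stays high even though most positives are missed. This is one way a high Hamming accuracy can coexist with low F1 scores.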

Per-Pattern Recall (Test Set)

For each pattern, the percentage of subject models trained on that pattern for which the classifier detects it:

Pattern	Recall (Detection Rate)
palindrome	25.4%
sorted_ascending	24.4%
sorted_descending	28.4%
alternating	36.2%
contains_abc	26.6%
starts_with	13.7%
ends_with	36.8%
no_repeats	22.2%
has_majority	6.5%
increasing_pairs	22.7%
decreasing_pairs	39.1%
vowel_consonant	0.0%
first_last_match	14.0%
mountain_pattern	26.2%
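
The detection rates in the table correspond to per-label recall, TP / (TP + FN). A minimal sketch of that computation follows, using a hypothetical helper `per_pattern_recall` and made-up labels for two of the patterns.

```python
def per_pattern_recall(y_true, y_pred, pattern_names):
    """Recall per pattern; None when a pattern has no positive examples."""
    recalls = {}
    for j, name in enumerate(pattern_names):
        # Predictions for the samples where pattern j is truly present.
        hits = [pred[j] for true, pred in zip(y_true, y_pred) if true[j] == 1]
        recalls[name] = sum(hits) / len(hits) if hits else None
    return recalls


names = ["palindrome", "alternating"]
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [0, 1], [0, 0], [0, 0]]
recall = per_pattern_recall(y_true, y_pred, names)
# palindrome: 1 of 3 positives detected; alternating: 1 of 2 detected
```

Under this definition, a recall of 0.0% means the classifier never predicted that pattern for any test subject model that was trained on it.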

Usage

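import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id='maximuspowers/muat-separate-pca-10-classifier',
    filename='best_model.pt',
)
# Load onto the CPU so this also works on machines without a GPU
checkpoint = torch.load(checkpoint_path, map_location='cpu')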
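
The structure of the checkpoint is not documented here, so the classifier must be reconstructed before it can produce logits. Once it does, the outputs could be turned into pattern names roughly as sketched below. The 0.5 threshold, the helper `decode_logits`, and the assumption that output indices follow the pattern list order are all illustrative and not confirmed by this card.

```python
import math

# Pattern names in the order listed in this card (assumed to match output indices).
PATTERNS = [
    "palindrome", "sorted_ascending", "sorted_descending", "alternating",
    "contains_abc", "starts_with", "ends_with", "no_repeats",
    "has_majority", "increasing_pairs", "decreasing_pairs",
    "vowel_consonant", "first_last_match", "mountain_pattern",
]


def decode_logits(logits, threshold=0.5):
    """Apply a sigmoid to each logit and keep patterns at or above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(PATTERNS, probs) if p >= threshold]


# Made-up logits for a single subject model (14 values, one per pattern).
example_logits = [2.0, -1.5, -3.0, 0.3] + [-2.0] * 10
predicted = decode_logits(example_logits)
print(predicted)  # ['palindrome', 'alternating']
```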
