arxiv:2601.13288

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Published on Jan 19 · Submitted by Luciano Del Corro on Jan 21

Abstract

Production LLM systems often rely on separate models for safety and other classification-heavy steps, increasing latency, VRAM footprint, and operational complexity. We instead reuse computation already paid for by the serving LLM: we train lightweight probes on its hidden states and predict labels in the same forward pass used for generation. We frame classification as representation selection over the full token-layer hidden-state tensor, rather than committing to a fixed token or fixed layer (e.g., first-token logits or final-layer pooling). To implement this, we introduce a two-stage aggregator that (i) summarizes tokens within each layer and (ii) aggregates across layer summaries to form a single representation for classification. We instantiate this template with direct pooling, a 100K-parameter scoring-attention gate, and a downcast multi-head self-attention (MHA) probe with up to 35M trainable parameters. Across safety and sentiment benchmarks, our probes improve over logit-only reuse (e.g., MULI) and are competitive with substantially larger task-specific baselines, while preserving near-serving latency and avoiding the VRAM and latency costs of a separate guard-model pipeline.

AI-generated summary

Lightweight probes trained on the hidden states of a serving LLM enable efficient classification in the same forward pass used for generation, at negligible added cost, improving safety and sentiment classification performance.
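To make the two-stage template concrete, below is a minimal PyTorch sketch of an aggregator in the spirit of the scoring-attention-gate variant: a learned scalar gate pools tokens within each layer, a second gate pools the per-layer summaries, and a linear head classifies the result. The module names, gating form, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): two-stage aggregation over the
# token-by-layer hidden-state tensor, then a linear classification head.
import torch
import torch.nn as nn


class TwoStageProbe(nn.Module):
    """Stage 1: pool tokens within each layer; Stage 2: pool across layers."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # Learned scoring gates: one scalar score per token (stage 1)
        # and one scalar score per layer summary (stage 2).
        self.token_scorer = nn.Linear(hidden_size, 1)
        self.layer_scorer = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, seq_len, hidden_size)
        # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
        mask = attention_mask.bool().unsqueeze(0).unsqueeze(-1)       # (1, B, T, 1)

        # Stage 1: attention-style pooling over tokens within each layer.
        token_scores = self.token_scorer(hidden_states)               # (L, B, T, 1)
        token_scores = token_scores.masked_fill(~mask, float("-inf"))
        token_weights = token_scores.softmax(dim=2)
        layer_summaries = (token_weights * hidden_states).sum(dim=2)  # (L, B, H)

        # Stage 2: attention-style pooling over the per-layer summaries.
        layer_scores = self.layer_scorer(layer_summaries)             # (L, B, 1)
        layer_weights = layer_scores.softmax(dim=0)
        pooled = (layer_weights * layer_summaries).sum(dim=0)         # (B, H)

        return self.classifier(pooled)                                # (B, num_labels)
```

With roughly two scoring vectors plus a classifier head, a gate of this shape stays in the lightweight-probe regime the abstract describes; the paper's direct-pooling and downcast-MHA variants trade capacity against parameter count within the same two-stage template.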

Community

Paper author · Paper submitter

Rather than adding another model to the stack, this work reuses computation already paid for in the serving LLM’s forward pass by training compact probes on hidden states. It frames the problem as principled selection across tokens and layers (not just “final layer” or “first token”), implemented with a two-stage aggregation template and lightweight variants that stay close to serving-time cost.
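As a rough sketch of that reuse, the snippet below pulls the full hidden-state stack out of a single forward pass of a Hugging Face transformers causal LM and feeds it to the probe sketched above. The model name is a placeholder, the probe is untrained here, and this wiring is an assumption about how one might reproduce the setup, not the authors' pipeline.

```python
# Hedged sketch: classify from the same forward pass that serves generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM that exposes hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Please summarize this support ticket ...", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of (num_layers + 1) tensors, each (batch, seq_len, hidden_size);
# stacking yields the token-by-layer tensor the probe selects over.
hidden = torch.stack(outputs.hidden_states)                     # (L+1, B, T, H)

# TwoStageProbe is the sketch from the earlier block; weights are untrained
# here, so the logits only illustrate the wiring, not actual predictions.
probe = TwoStageProbe(hidden_size=model.config.hidden_size, num_labels=2)
logits = probe(hidden, inputs["attention_mask"])                # (B, 2), e.g. safe / unsafe
```

In a deployment the same hidden states would already be materialized while the LLM generates its response, so the only extra work is the probe's small matrix multiplications, which is what keeps latency near serving cost.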

