Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets • Paper • 2506.05346 • Published Jun 5, 2025
Spectral Insights into Data-Oblivious Critical Layers in Large Language Models • Paper • 2506.00382 • Published May 31, 2025
NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration • Paper • 2211.16274 • Published Nov 29, 2022
NCTV: Neural Clamping Toolkit and Visualization • Model-agnostic Toolkit for Neural Network Calibration
allenai/llama-3.1-tulu-3-8b-preference-mixture • Updated Feb 4, 2025 • 273k • 1.93k • 26