When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift • Paper • 2602.14161 • Published 10 days ago