Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues Paper • 2410.10700 • Published Oct 14, 2024
Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking Paper • 2502.01667 • Published Feb 1, 2025
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning Paper • 2410.06664 • Published Oct 9, 2024
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models Paper • 2505.19509 • Published May 26, 2025
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents Paper • 2506.00618 • Published May 31, 2025
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning Paper • 2506.02867 • Published Jun 3, 2025
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks Paper • 2506.16402 • Published Jun 19, 2025
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15, 2025
X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability Paper • 2502.09990 • Published Feb 14, 2025
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law Paper • 2507.18576 • Published Jul 24, 2025
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22, 2025
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28, 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint Paper • 2502.16770 • Published Feb 24, 2025
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey Paper • 2412.02104 • Published Dec 3, 2024
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28, 2025
Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring Paper • 2502.05242 • Published Feb 7, 2025
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents Paper • 2509.26354 • Published Sep 30, 2025