Dongrui Liu

shenqiorient

https://shenqildr.github.io/

AI & ML interests

Trustworthy AI

authored 20 papers 1 day ago

Trap of Feature Diversity in the Learning of MLPs

Paper • 2112.00980 • Published Dec 2, 2021 • 2

MLP Can Be A Good Transformer Learner

Paper • 2404.05657 • Published Apr 8, 2024 • 1

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

Paper • 2410.10700 • Published Oct 14, 2024 • 3

Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking

Paper • 2502.01667 • Published Feb 1, 2025

Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning

Paper • 2410.06664 • Published Oct 9, 2024 • 1

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

Paper • 2505.19509 • Published May 26, 2025 • 7

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Paper • 2506.00618 • Published May 31, 2025 • 1

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Paper • 2506.02867 • Published Jun 3, 2025 • 2

IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks

Paper • 2506.16402 • Published Jun 19, 2025 • 1

The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

Paper • 2507.11097 • Published Jul 15, 2025 • 64

X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability

Paper • 2502.09990 • Published Feb 14, 2025 • 1

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Paper • 2507.18576 • Published Jul 24, 2025 • 10

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report

Paper • 2507.16534 • Published Jul 22, 2025 • 9

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025 • 84

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Paper • 2502.16770 • Published Feb 24, 2025

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

Paper • 2412.02104 • Published Dec 3, 2024

Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models

Paper • 2509.23962 • Published Sep 28, 2025 • 5

Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring

Paper • 2502.05242 • Published Feb 7, 2025

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

Paper • 2509.26354 • Published Sep 30, 2025 • 18