ReIn: Conversational Error Recovery with Reasoning Inception
Abstract
Conversational agents with tool integration face challenges from user-induced errors, but a test-time intervention method called Reasoning Inception (ReIn) enables error recovery by injecting external reasoning into the agent's decision-making process without modifying model parameters or prompts.
Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue datasets but remain vulnerable to unanticipated, user-induced errors. Rather than focusing on error prevention, this work focuses on error recovery, which requires accurately diagnosing erroneous dialogue contexts and executing proper recovery plans. Under realistic constraints that preclude model fine-tuning or prompt modification due to significant cost and time requirements, we explore whether agents can recover from contextually flawed interactions and how their behavior can be adapted without altering model parameters or prompts. To this end, we propose Reasoning Inception (ReIn), a test-time intervention method that plants an initial reasoning step in the agent's decision-making process. Specifically, an external inception module identifies predefined errors within the dialogue context and generates recovery plans, which are subsequently integrated into the agent's internal reasoning process to guide corrective actions, without modifying its parameters or system prompts. We evaluate ReIn by systematically simulating conversational failure scenarios that directly hinder successful completion of user goals: users' ambiguous and unsupported requests. Across diverse combinations of agent models and inception modules, ReIn substantially improves task success and generalizes to unseen error types. Moreover, it consistently outperforms explicit prompt-modification approaches, underscoring its utility as an efficient, on-the-fly method. In-depth analysis of its operational mechanism, particularly in relation to the instruction hierarchy, indicates that jointly defining recovery tools with ReIn can serve as a safe and effective strategy for improving the resilience of conversational agents without modifying the backbone models or system prompts.
Community
Here is a brief summary:
- Constraint: "You can't touch the agent."
We explicitly assume deployment settings where changing model parameters or the system prompt is not allowed (policy, safety validation, product constraints, etc.). So the question becomes: how do you improve reliability without retraining or prompt surgery?
- ReIn's surprising lever: tool definitions can override the instruction hierarchy.
ReIn is a lightweight test-time recovery layer: an external module identifies the failure type and injects a "reasoning inception" plan through the tool definition. Empirically, we observe that this can override the conventional instruction hierarchy, letting the agent recover even when the original system instructions would have locked it into a bad trajectory.
If you're building tool-using agents in production and need a knob that's compatible with "frozen prompts + frozen weights," ReIn is designed for exactly that.
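The mechanism described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the ReIn idea (detect a predefined error type, generate a recovery plan, and prepend it to the agent's reasoning), not the paper's actual implementation; all names, error cues, and plans below are illustrative assumptions.

```python
# Hypothetical sketch of a ReIn-style test-time intervention layer.
# All names and patterns are illustrative, not the paper's API.
from typing import Optional

# Predefined error types with simple surface cues (an assumption; the
# paper's inception module uses an external model for detection).
ERROR_CUES = {
    "ambiguous_request": ["whatever", "something", "anything"],
    "unsupported_request": ["cancel my subscription"],  # no matching tool
}

# Recovery plans to be "incepted" into the agent's reasoning.
RECOVERY_PLANS = {
    "ambiguous_request": "Ask a clarifying question before calling any tool.",
    "unsupported_request": "Explain the limitation and offer a supported alternative.",
}

def detect_error(user_turn: str) -> Optional[str]:
    """Match the latest user turn against predefined error types."""
    lowered = user_turn.lower()
    for error_type, cues in ERROR_CUES.items():
        if any(cue in lowered for cue in cues):
            return error_type
    return None

def incept(reasoning_prefix: str, user_turn: str) -> str:
    """Plant a recovery plan at the start of the agent's reasoning,
    leaving the system prompt and model weights untouched."""
    error_type = detect_error(user_turn)
    if error_type is None:
        return reasoning_prefix  # no intervention needed
    plan = RECOVERY_PLANS[error_type]
    return f"[Recovery plan: {plan}]\n{reasoning_prefix}"
```

The key design point is that `incept` operates purely on the agent's reasoning input at test time: nothing upstream (weights, system prompt) changes, which is what makes the approach compatible with frozen deployments.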
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants (2026)
- CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty (2026)
- User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale (2026)
- ATOD: An Evaluation Framework and Benchmark for Agentic Task-Oriented Dialogue Systems (2026)
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (2026)
- U-Fold: Dynamic Intent-Aware Context Folding for User-Centric Agents (2026)
- Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents (2026)