ARCHITECT: Uncertainty-Aware Dynamic Tool Learning via Causal Intervention for Open-World Agents

Zhangyi Wang; Jiexiang Xu; Bingnan Yu; Zongze Li

ACL 2026

Abstract

Dynamic tool generation empowers Large Language Model (LLM) agents to synthesize tools on demand, yet a critical challenge remains: 32.4% of generated tools fail on first invocation. We present Causal Tool Diagnosis (CTD), a principled framework that moves beyond black-box reliability prediction to interpretable failure attribution. CTD constructs a Structural Causal Model (SCM) capturing how specification quality, code characteristics, and execution environment jointly determine tool outcomes. Because generated code, unlike free-form text, can be intervened on, we run controlled sandbox experiments to estimate causal effects. CTD jointly predicts confidence (Spearman rank correlation coefficient ρ = 0.90) and root cause attribution (78% accuracy), and its attributions directly guide targeted repairs (+9.6% success rate over error-type classification). Our ARCHITECT framework, which integrates CTD throughout the tool lifecycle, achieves state-of-the-art results on four benchmarks: StableToolBench (+3.8%), MINT (+4.6%), T-Eval (+3.7%), and SWE-bench Lite (+2.4%), with consistent improvements across all settings.
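The idea of estimating a causal effect by intervening in a sandbox can be illustrated with a toy simulation. The sketch below is not the paper's method or code: the function names, the two binary factors, and all probabilities are hypothetical values chosen only for illustration. It forces one factor (specification clarity) to each value, leaves the environment to vary on its own, and reports the difference in success rates as an estimate of the average causal effect.

```python
import random


def run_tool(spec_clear: bool, env_ok: bool, rng: random.Random) -> bool:
    """Hypothetical stand-in for one sandboxed tool invocation.

    The success probabilities are invented for this sketch; a real
    system would execute the generated tool and observe the outcome.
    """
    p_success = 0.35 + 0.40 * spec_clear + 0.20 * env_ok
    return rng.random() < p_success


def success_rate_under_do(spec_clear: bool, trials: int, seed: int) -> float:
    """Estimate P(success | do(spec_clear)) by forcing spec_clear."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # The environment is sampled independently of the intervention.
        env_ok = rng.random() < 0.7
        successes += run_tool(spec_clear, env_ok, rng)
    return successes / trials


def average_causal_effect(trials: int = 20000, seed: int = 0) -> float:
    """Difference in success rate between do(clear) and do(unclear)."""
    treated = success_rate_under_do(True, trials, seed)
    control = success_rate_under_do(False, trials, seed)
    return treated - control


if __name__ == "__main__":
    print(f"estimated effect of a clear spec: {average_causal_effect():.3f}")
```

In this toy model the true effect is 0.40 by construction, so the estimate should land close to that value. Reusing the same seed for both arms pairs the random draws, which reduces the variance of the difference.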

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Causal Inference
Artificial Intelligence > Core AI > Large Language Models

Keywords

structural causal model; causal intervention; open-world agent; dynamic tool learning; root cause attribution
