CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades

Raeyoung Chang; Dongwook Kwon; Jisoo Lee; Nikhil Verma

2026 ACL ACL 2026

CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades

Abstract

AbstractCascaded LLM systems coordinate models of varying sizes with human experts to balance accuracy, cost, and abstention under uncertainty. However, single-model tiers at each stage falter on ambiguous queries, triggering premature escalations to costlier models or experts due to under-confidence and inefficient compute scaling. **CascadeDebate** addresses this critical gap by inserting multi-agent deliberation directly at each tier’s escalation boundary. Confidence-based routers activate lightweight agent ensembles only for uncertain cases, enabling consensus-driven resolution of ambiguities internally, without invoking higher-cost upgrades. Our unified architecture alternates single-model inference with selective multi-agent deliberation across model scales, culminating in human experts as final fallback. This design scales test-time compute dynamically to query difficulty. Across five benchmarks spanning science, medicine, and general knowledge, CascadeDebate outperforms strong single-model cascades and standalone multi-agent systems by up to 26.75%.An online threshold optimizer proves essential, boosting accuracy 20.98–52.33% relative improvement over fixed policies and enabling elastic adaptation to real-world distributions.

Authors

Raeyoung Chang , Dongwook Kwon , Jisoo Lee , Nikhil Verma

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Decision Making

Keywords

query difficulty large language model multi-agent system test-time compute confidence-based routing cost-aware cascade

Download PDF

Related papers

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand 2026

One-step Nonautoregressive Natural Language Generation with Shortcut Flow Matching Models 2026

Optimizing Retrieval-Augmented Generation for E-Commerce How-To Assistance 2026

Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing 2026

MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation 2026