Agentic Oversight via Dialectic Reasoning
Abstract
Debate has emerged as a promising oversight mechanism for Large Language Models (LLMs) as systems grow more complex, particularly where models outperform human evaluators. Yet Debate provides little verifiable evidence for its final judgments, and its scalability remains largely unexplored. To ground oversight and let it scale as capabilities extend, we introduce an Agentic Oversight framework. Using Dialectic Argumentation as the reasoning function, we extend this paradigm to multilingual and multimodal settings. We adopt a weak-to-strong oversight approach in which two expert models evaluate and defend competing answers, while a third, blind judge determines the winner using Dialectic Argumentation. Experts argue only for answers consistent with their own beliefs, so the Debate is founded on disagreements. We evaluate the framework on six tasks in both multilingual and multimodal scenarios, where dialectic argumentation consistently outperforms single-expert baselines. Moreover, we show that dialectic judgments from a weaker model deliver argument-mediated supervision that, via fine-tuning, instils unsupervised reasoning signals in expert models.
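As a rough illustration of the protocol summarised above, the following minimal Python sketch shows one oversight round. It is not the authors' implementation. The names `Argument`, `Expert`, `Judge`, and `oversee` are hypothetical, "blind" is assumed to mean that the judge sees only the question and the two arguments, and returning an agreed answer without a debate is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Argument:
    """An answer an expert believes, plus its defence (hypothetical structure)."""
    answer: str
    defence: str


# Hypothetical interfaces: an expert maps a question to the answer it believes
# and an argument for it; a judge returns the index (0 or 1) of the winner.
Expert = Callable[[str], Argument]
Judge = Callable[[str, Argument, Argument], int]


def oversee(question: str, expert_a: Expert, expert_b: Expert,
            judge: Judge) -> Tuple[str, bool]:
    """Run one oversight round; return (final answer, whether a debate occurred)."""
    arg_a = expert_a(question)
    arg_b = expert_b(question)

    # Assumption: debate is triggered only when the experts disagree,
    # since each expert argues solely for the answer it believes.
    if arg_a.answer == arg_b.answer:
        return arg_a.answer, False

    # Assumption: the judge is "blind" in that it receives only the question
    # and the two arguments, with no ground truth or expert identity.
    winner = judge(question, arg_a, arg_b)
    if winner not in (0, 1):
        raise ValueError("judge must return 0 or 1")
    return (arg_a, arg_b)[winner].answer, True
```

In practice each callable would wrap a model call. The sketch fixes only the control flow: disagreement triggers a debate, and the judge's verdict selects the final answer.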