All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Yuechen Jiang; Zhiwei Liu; Yupeng Cao; Yueru He; Ziyang Xu; Chen Xu; Zhiyang Deng; Prayag Tiwari; Xi Chen; Alejandro Lopez-Lira; Jimin Huang; Junichi Tsujii; Sophia Ananiadou

ACL 2026

Abstract

We introduce RFC-Bench, a benchmark for evaluating large language models on financial misinformation in realistic news settings. RFC-Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection and comparison-based diagnosis using paired original–perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated rates of invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC-Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.

Topics

Natural Language Processing > Applications > Fact-Checking
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Evaluation

Keywords

misinformation detection; financial news; large language model; reference-free detection
