ACL 2026

Boosting Self-Consistency with Ranking

Abstract

Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority vote often fails to recover correct answers that are already present among the samples. In this work, we reformulate answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, we train a lightweight reranker to score candidate answers using five carefully designed features that capture answer-level frequency, semantic centrality, and reasoning-trace consistency. We instantiate this approach with a LambdaRank model and evaluate it on three datasets under a range of test-time budgets. Across datasets, the proposed method consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.
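To make the contrast between majority vote and feature-based reranking concrete, the sketch below scores each distinct sampled answer with a small feature vector and picks the highest-scoring one. It is a minimal illustration under stated assumptions, not the method described above. It uses only two hypothetical features: answer frequency, and a crude "centrality" feature that measures token-overlap (Jaccard) similarity to differently worded samples, standing in for semantic similarity. It also replaces the learned LambdaRank model with a hand-set linear scorer. The names `candidate_features`, `rerank`, and `jaccard` and the weights are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def majority_vote(answers: Sequence[str]) -> str:
    # Standard self-consistency: return the most frequent sampled answer.
    # Counter preserves first-occurrence order, so ties go to the earliest answer.
    return Counter(answers).most_common(1)[0][0]


def jaccard(a: str, b: str) -> float:
    # Hypothetical stand-in for semantic similarity: token-set overlap.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def candidate_features(answers: Sequence[str]) -> Dict[str, List[float]]:
    # Build one feature vector per distinct candidate answer.
    n = len(answers)
    feats: Dict[str, List[float]] = {}
    for cand, count in Counter(answers).items():
        freq = count / n  # answer-level frequency
        # Hypothetical centrality: support from differently worded samples
        # (exact copies are already captured by the frequency feature).
        centrality = sum(jaccard(cand, a) for a in answers if a != cand) / n
        feats[cand] = [freq, centrality]
    return feats


def rerank(answers: Sequence[str], weights: Tuple[float, ...]) -> str:
    # Score each candidate with a linear model over its features. Here the
    # weights are fixed by hand; a trained ranker would learn them instead.
    feats = candidate_features(answers)
    scores = {
        cand: sum(w * x for w, x in zip(weights, f)) for cand, f in feats.items()
    }
    return max(scores, key=scores.get)


samples = ["boston", "boston", "new york", "new york city", "the city of new york"]
print(majority_vote(samples))        # boston
print(rerank(samples, (1.0, 1.0)))   # new york city
```

In this toy example, majority vote picks "boston" (2 of 5 samples), while the reranker lets the three overlapping "new york" variants pool their support, so "new york city" scores highest (0.2 + 1.27/5 ≈ 0.45 versus 0.40). In a learned setting, the weights would be fit on labeled candidate lists. A LambdaRank implementation such as LightGBM's `LGBMRanker` is one option, though the text above does not state which implementation or feature definitions the authors used.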