Think Better, Not Longer: Token-Level Marginal Utility for Efficient Reasoning in Large Reasoning Models

Jiawei Li; Yang Gao; Huashan Sun; Chong Feng

ACL 2026

Abstract

While Large Reasoning Models (LRMs) have demonstrated remarkable capabilities through explicit Chain-of-Thought (CoT) generation, they frequently suffer from “overthinking”. In this work, we address this problem by introducing Token-level Marginal Utility, which quantifies the per-token log-probability gain of the ground-truth answer. Leveraging this dense supervision signal, we propose MUTO (Marginal Utility Guided Thinking Optimization), a unified training framework designed to synthesize concise reasoning chains. Rather than relying only on coarse trajectory-level length control, MUTO identifies tokens that reduce the model’s likelihood of the correct answer and penalizes such negative-utility reasoning, yielding concise yet effective CoT trajectories. Experiments on DeepSeek-R1-Distill-Qwen backbones (1.5B and 7B) across six math reasoning benchmarks show that MUTO yields a markedly better efficiency-accuracy Pareto frontier. It reduces average token usage by 87.1% for the 1.5B model while improving accuracy by 2.3%, and cuts tokens by 80.2% for the 7B model with only a -0.1% change in accuracy, achieving the best length-normalized accuracy among all compared methods.
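To make the central quantity concrete, here is a minimal sketch of one way to read "per-token log-probability gain of the ground-truth answer". It assumes we already have, for each reasoning prefix, the model's log-probability of the correct answer, and it takes the utility of a token to be the difference between consecutive values. The function names, the zero threshold, and the toy numbers are illustrative assumptions, not taken from the paper, whose exact definition and training objective may differ.

```python
from typing import List


def marginal_utilities(answer_logprobs: List[float]) -> List[float]:
    """Per-token gain in the log-probability of the ground-truth answer.

    answer_logprobs[t] is assumed to hold log p(answer | question, first t
    reasoning tokens) for t = 0..T, so the list has T + 1 entries.
    The returned list has T entries; entry i is the gain from adding
    reasoning token i (0-indexed).
    """
    return [cur - prev for prev, cur in zip(answer_logprobs, answer_logprobs[1:])]


def negative_utility_positions(utilities: List[float], threshold: float = 0.0) -> List[int]:
    """Indices of reasoning tokens whose utility falls below the threshold.

    The threshold of 0.0 is an assumption made for this sketch.
    """
    return [i for i, u in enumerate(utilities) if u < threshold]


# Toy values (hypothetical): answer log-probability after 0, 1, 2, 3, 4 tokens.
logps = [-4.0, -3.0, -3.5, -1.0, -1.25]
utils = marginal_utilities(logps)
flagged = negative_utility_positions(utils)

print(utils)    # gains per reasoning token
print(flagged)  # tokens that lowered the answer's likelihood
```

Under this reading the utilities telescope: their sum equals the total change in the answer's log-probability over the whole chain, so a token-level signal of this kind stays consistent with a trajectory-level view while localizing which tokens helped or hurt.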

Authors

Jiawei Li, Yang Gao, Huashan Sun, Chong Feng

Topics

Artificial Intelligence > Core AI > Reasoning
Artificial Intelligence > Core AI > Efficient Computing
Deep Learning > Learning Types > Chain-of-Thought Reasoning

Keywords

chain-of-thought reasoning; large reasoning model; efficient reasoning; marginal utility
