Think Faster Than Words: Efficient LLM Chain-of-Thought Reasoning via Dynamic Shortcut Decoding

Fan LIU; Yanhao Wang; Min Zhang; Zhikang Chen; Zeyuan Li; Lewei He; Jiahui Pan

ACL 2026

Abstract

This paper proposes shortcut decoding, an efficient framework for accelerating Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs). Existing methods that rely on pruning or early stopping to reduce latency often compromise reasoning reliability. Motivated by the observation that LLMs frequently converge to correct solutions internally before completing their explicit textual reasoning, we propose a dual-signal adaptive controller that integrates lightweight probes over internal hidden states with step-level entropy. The controller detects reasoning convergence during generation and adaptively selects between a fast-exit path and a stability-verified path, removing redundant steps while preserving answer correctness. Experiments on multiple mathematical reasoning benchmarks show that shortcut decoding reduces token usage by approximately 35% while maintaining final-answer accuracy comparable to the full CoT baseline, and that it outperforms existing early-stopping methods without updating the base model. Our code is available at https://github.com/kuromi9527/shortcut_decoding.
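To make the dual-signal idea concrete, the following is a minimal sketch of how a controller of this kind might combine a probe score with step-level entropy. It is not the authors' implementation: the linear probe, the names `step_entropy`, `probe_confidence`, and `ShortcutController`, the threshold values, and the "consecutive confident steps" reading of the stability-verified path are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def step_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) over the token distributions of one reasoning step."""
    total = 0.0
    for dist in token_dists:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_dists)


def probe_confidence(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear probe: sigmoid(w . h + b), read as P(reasoning has converged)."""
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


@dataclass
class ShortcutController:
    # All thresholds below are illustrative assumptions, not values from the paper.
    fast_conf: float = 0.95    # probe confidence required for the fast-exit path
    verify_conf: float = 0.80  # probe confidence that counts toward stability
    max_entropy: float = 0.30  # step entropy ceiling for the fast-exit path
    patience: int = 2          # consecutive confident steps needed to exit as "verified"
    streak: int = 0            # running count of consecutive confident steps

    def decide(self, confidence: float, entropy: float) -> str:
        """Return 'fast_exit', 'verified_exit', or 'continue' for the current step."""
        # Fast path: both signals agree strongly that reasoning has converged.
        if confidence >= self.fast_conf and entropy <= self.max_entropy:
            return "fast_exit"
        # Stability path: moderately confident, so require agreement across steps.
        if confidence >= self.verify_conf:
            self.streak += 1
            if self.streak >= self.patience:
                return "verified_exit"
        else:
            self.streak = 0
        return "continue"
```

In a decoding loop, `decide` would be called once per completed reasoning step; any result other than `"continue"` would stop further reasoning and prompt the model for its final answer. Requiring several consecutive confident steps before a verified exit is one simple way to avoid stopping on a single noisy probe reading.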

Topics

Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Efficient Computing
Deep Learning > Learning Types > Chain-of-Thought Reasoning

Keywords

chain-of-thought reasoning, adaptive controller, shortcut decoding
