Learning from Cognition: Enhancing RL Efficiency for LLM Reasoning via Hierarchical Metacognitive Decomposition and Refinement

Zexu Sun; Yongcheng Zeng; Erxue Min; Heyang Gao; Bokai Ji; Dugang Liu; Xing Tang; Xiuqiang He; Xu Chen

2026 ACL ACL 2026

Learning from Cognition: Enhancing RL Efficiency for LLM Reasoning via Hierarchical Metacognitive Decomposition and Refinement

Abstract

AbstractContemporary progress in Large Language Models (LLMs) has revealed notable inferential capacities via reinforcement learning (RL) employing verifiable rewards. However, “zero-RL” approaches relying on fixed prompt templates introduce substantial sampling inefficiencies for weak LLMs, as most problems generate invalid outputs during accuracy-driven filtration. To solve this, we propose Cog-Rethinker, a novel hierarchical metacognitive RL framework. Cog-Rethinker enhances the rollout procedure by improving sample utilization through a two-stage framework leveraging human cognition. First, it prompts the policy to decompose zero-accuracy problems into subproblems. Second, it prompts the policy to refine answers by referencing previous wrong solutions. Moreover, to enable cold-starts and maintain train-test consistency, Cog-Rethinker applies supervised fine-tuning using correct samples from these stages. Experimental results demonstrate Cog-Rethinker’s superior performance on mathematical reasoning benchmarks and its improved sample efficiency that accelerates convergence compared to baselines.

Authors

Zexu Sun , Yongcheng Zeng , Erxue Min , Heyang Gao , Bokai Ji , Dugang Liu , Xing Tang , Xiuqiang He , Xu Chen

Topics

Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Reasoning Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning sample efficiency mathematical reasoning hierarchical decomposition large language model

Download PDF

Related papers

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand 2026

One-step Nonautoregressive Natural Language Generation with Shortcut Flow Matching Models 2026

Optimizing Retrieval-Augmented Generation for E-Commerce How-To Assistance 2026

Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing 2026

MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation 2026