ACL 2026

ReActR: Reasoning through Error-Activated Reflection for LLM Post-Training

Abstract

Although Large Language Models (LLMs) have demonstrated substantial proficiency in reasoning, current approaches focus disproportionately on scaling correct training samples and underexplore the value of incorrect reasoning trajectories. Motivated by how humans learn from mistakes, we propose ReActR (Reasoning through Error-Activated Reflection), a framework that enhances reasoning by learning reflective behaviors from erroneous trajectories. ReActR comprises two stages: data construction and training. First, we synthesize a multi-turn erroneous reasoning dataset spanning diverse error types and difficulty levels via self-generation and targeted error generation. Second, we strengthen the model’s capabilities through Supervised Fine-Tuning (SFT) on the synthesized data and then apply Group Relative Policy Optimization (GRPO) with multiple reward signals to further refine reasoning performance. Extensive experiments across five benchmarks and three LLMs demonstrate that ReActR effectively enhances reasoning performance. Notably, on Llama-3-8B, ReActR achieves an average improvement of 3.5% across the five datasets.

Authors