Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

Zhikun Xu; Xiaodong Yu; Ben Zhou; Jiang Liu; Jialian Wu; Ze Wang; Ximeng Sun; Hao Chen; Zicheng Liu

2026 ACL ACL 2026

Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

Abstract

AbstractRecent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural-language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent in-domain gains over both a vanilla model and a single-label RL baseline, larger improvements on applicability-breaking perturbations, and parity or modest gains on end-to-end tasks; ablations indicate that the two-section outputs and section-aware reinforcement are both necessary for robustness.

Authors

Zhikun Xu , Xiaodong Yu , Ben Zhou , Jiang Liu , Jialian Wu , Ze Wang , Ximeng Sun , Hao Chen , Zicheng Liu

Topics

Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Core AI > Reasoning Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

reinforcement learning structured prediction mathematical reasoning lemma verification section-aware loss masking

Download PDF

Related papers

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand 2026

One-step Nonautoregressive Natural Language Generation with Shortcut Flow Matching Models 2026

Optimizing Retrieval-Augmented Generation for E-Commerce How-To Assistance 2026

Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing 2026

MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation 2026