conftrace_
2026 ACL ACL 2026

Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning

Abstract

AbstractRecent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging as a structured prediction task: given a statement and a candidate lemma, the model must output a precondition check and a conclusion-utility check, from which a usefulness decision is derived. We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning plus section-aware loss masking to assign penalty to the section responsible for errors. Training and evaluation draw on diverse natural-language and formal proof corpora; robustness is assessed with a held-out perturbation suite; and end-to-end evaluation spans competition-style, perturbation-aligned, and theorem-based problems across various LLMs. Results show consistent in-domain gains over both a vanilla model and a single-label RL baseline, larger improvements on applicability-breaking perturbations, and parity or modest gains on end-to-end tasks; ablations indicate that the two-section outputs and section-aware reinforcement are both necessary for robustness.