KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

Baochang Ren; Shuofei Qiao; Ningyu Zhang; Da Zheng; Huajun Chen

ACL 2026

Abstract

Slow-thinking Large Language Models (LLMs) have demonstrated strong reasoning capabilities but often suffer from severe hallucinations due to an inability to recognize their knowledge boundaries. Existing Reinforcement Learning (RL) approaches typically rely on outcome-oriented rewards, which can inadvertently reinforce fabricated reasoning paths when the final answer is correct. To address this, we propose **Know**ledge-enhanced **RL**, **KnowRL**, a framework that integrates factual supervision directly into the reasoning process. By decomposing the chain of thought into atomic facts and verifying them against the corresponding ground-truth knowledge, KnowRL performs fine-grained checks to encourage models to reason faithfully. Crucially, this process-oriented supervision teaches the model to identify its knowledge boundaries, learning to say "I don’t know" instead of fabricating answers when information is missing. Experimental results demonstrate that KnowRL effectively mitigates hallucinations, reducing the Incorrect Rate on SimpleQA by 20.3% for distillation-based slow-thinking models while maintaining strong performance on complex reasoning benchmarks like GPQA and AIME 2025. Furthermore, our method shows robust transferability to out-of-distribution tasks, indicating that the model learns a generalizable verification behavior.
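
The idea of checking the reasoning process as well as the final answer can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sentence-level splitting that stands in for atomic-fact decomposition, the word-overlap check that stands in for verification against ground-truth knowledge, the function names, the 0.6 threshold, the outcome values, and the `fact_weight` of 0.5 are all hypothetical.

```python
import re

REFUSAL = "i don't know"


def split_atomic_facts(reasoning: str) -> list[str]:
    """Hypothetical stand-in for fact decomposition: one sentence = one fact."""
    parts = re.split(r"(?<=[.!?])\s+", reasoning.strip())
    return [p for p in parts if p]


def is_supported(fact: str, knowledge: str, threshold: float = 0.6) -> bool:
    """Toy verifier: share of the fact's words that occur in the knowledge text."""
    fact_words = set(re.findall(r"\w+", fact.lower()))
    known_words = set(re.findall(r"\w+", knowledge.lower()))
    if not fact_words:
        return False
    return len(fact_words & known_words) / len(fact_words) >= threshold


def factuality_score(reasoning: str, knowledge: str) -> float:
    """Fraction of extracted facts that the toy verifier accepts."""
    facts = split_atomic_facts(reasoning)
    if not facts:
        return 0.0
    return sum(is_supported(f, knowledge) for f in facts) / len(facts)


def combined_reward(reasoning: str, answer: str, gold: str,
                    knowledge: str, fact_weight: float = 0.5) -> float:
    """Outcome reward plus a weighted process-level factuality term.

    Assumed outcome values: +1 correct, 0 for an explicit refusal, -1 wrong.
    """
    normalized = answer.strip().lower().replace("\u2019", "'")
    if normalized == REFUSAL:
        outcome = 0.0
    elif normalized == gold.strip().lower():
        outcome = 1.0
    else:
        outcome = -1.0
    return outcome + fact_weight * factuality_score(reasoning, knowledge)


knowledge = "Paris is the capital of France. The Eiffel Tower is in Paris."
faithful = "Paris is the capital of France. The Eiffel Tower is in Paris."
fabricated = "Paris was founded by Napoleon in 1900. The Louvre is on Mars."

# Same correct answer, but the faithful trace earns the higher reward.
print(combined_reward(faithful, "Paris", "Paris", knowledge))
print(combined_reward(fabricated, "Paris", "Paris", knowledge))
```

Under these assumptions, a correct answer reached through unsupported statements scores lower than the same answer reached through supported ones, and an explicit refusal scores above a wrong answer, which is the kind of incentive the abstract describes.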

Topics

Artificial Intelligence > Core AI > Large Language Models; Deep Learning > Learning Types > Reinforcement Learning; Deep Learning > Learning Types > Reasoning

Keywords

reinforcement learning; hallucination mitigation; knowledge boundary; factual reasoning; chain-of-thought decomposition
