Analyzing and Internalizing Complex Policy Documents for LLM Agents

Jiateng Liu; Zhenhailong Wang; Xiaojiang Huang; Yingjie Li; Xiang Li; Chenlei Guo; Xing Fan; Ruhi Sarikaya; Heng Ji

ACL 2026

Abstract

Large language model agents rely on in-context policy documents encoding diverse business rules. As businesses scale, these documents grow, creating substantial computational overhead and motivating internalization methods that embed policy into model priors. Prior work focuses on generic prompts, but we find agentic policies span multiple complexity levels and demand heavier reasoning, posing greater challenges. We introduce an agentic benchmark generator with Controllable Complexity in agent policy across four levels, enabling systematic evaluation of agents under increasing complexity and providing a testbed for policy internalization. Our analysis shows that workflow-governing policy specifications are the hardest to reason over, and that SFT on gold trajectories with chain-of-thought is data-hungry and struggles at high complexity. We propose Category-Aware Policy Continued Pretraining, an automated pipeline that analyzes policies, extracts key specifications, categorizes them into factual, behavioral, and conditional types, and isolates those driving workflow complexity. This enables targeted “therapy” by synthesizing specialized training data for each type and improving internalization via an autoregressive pretraining loss. Extensive experiments show our synthetic data and objective consistently improve performance. Combined with SFT, our method outperforms the baseline across different settings, especially in data-sparse and high-complexity regimes, with gains up to 41% and 22% on Qwen-3-32B. Overall, we achieve 97.3% prompt reduction on our benchmark, and on 𝜏-Bench we further improve performance while reducing prompt requirements with very limited SFT data.

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Reasoning

Keywords

continued pretraining; supervised fine-tuning; agent system; large language model; policy internalization
