GRAD: Generalizing RAG Adaptation with Decoding

Youngwon Lee; Seung-won Hwang; Zhewei Yao; Yuxiong He

ACL 2026

Abstract

Retrieval-augmented generation (RAG) requires generation to follow retrieved evidence across shifting domains and prompt layouts, but training a new, stronger model for each task is costly. To address this, we propose GRAD, an adaptive decoding-time framework that keeps the base generator fixed and composes small, objective-specific guidance at inference time. A key advantage of this design is that it lets diverse RAG objectives be mixed and matched: model scaling (MS), domain adaptation (DA), and positional debiasing (DB) can be integrated as token-level guidance terms, and new objectives can easily be plugged in. Across public benchmarks and private settings with no in-domain labels, GRAD improves accuracy with favorable latency, offering a strong accuracy-latency trade-off compared with scaling the model, while adapting to each task by reliably activating helpful objectives and suppressing harmful ones.
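The abstract says each objective becomes a token-level guidance term composed on top of a frozen generator, but it does not give the combination rule. The sketch below assumes the simplest reading: each objective contributes an additive per-token logit adjustment scaled by a weight, and a zero weight stands in for a suppressed objective. The function names (`compose_guided_logits`, `contrastive_term`, `greedy_next_token`), the additive form, the proxy-tuning-style contrastive construction of a guidance term, and the toy numbers are all hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Mapping


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary-sized score list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_term(expert_logits: List[float], amateur_logits: List[float]) -> List[float]:
    """Hypothetical guidance term: per-token log-prob difference between two small
    models (e.g. domain-tuned vs. untuned), in the style of proxy-tuning.
    This construction is an assumption, not taken from the GRAD abstract."""
    if len(expert_logits) != len(amateur_logits):
        raise ValueError("vocabulary sizes differ")
    e, a = log_softmax(expert_logits), log_softmax(amateur_logits)
    return [x - y for x, y in zip(e, a)]


def compose_guided_logits(
    base_logits: List[float],
    guidance: Mapping[str, List[float]],
    weights: Mapping[str, float],
) -> List[float]:
    """Add weighted guidance terms to the frozen base model's next-token logits.
    A missing or zero weight leaves that objective switched off (assumed rule)."""
    out = list(base_logits)
    for name, term in guidance.items():
        if len(term) != len(out):
            raise ValueError(f"guidance '{name}' has wrong vocabulary size")
        w = weights.get(name, 0.0)
        for i, g in enumerate(term):
            out[i] += w * g
    return out


def greedy_next_token(base_logits, guidance, weights) -> int:
    """Pick the argmax token after guidance; the lowest index wins ties."""
    guided = compose_guided_logits(base_logits, guidance, weights)
    return max(range(len(guided)), key=guided.__getitem__)


if __name__ == "__main__":
    base = [2.0, 1.5, 0.0]  # toy 3-token vocabulary; made-up numbers
    guidance = {
        "MS": [-1.0, 1.0, 0.0],                                # hypothetical scaling term
        "DA": contrastive_term([0.0, 2.0, 0.0], [0.0, 0.0, 0.0]),  # hypothetical domain term
        "DB": [0.0, 0.0, 3.0],                                 # hypothetical debiasing term
    }
    print(greedy_next_token(base, guidance, {}))                       # 0: base model alone
    print(greedy_next_token(base, guidance, {"MS": 1.0, "DA": 1.0}))  # 1: MS and DA active
```

Keeping the weights as an explicit argument is what would let a per-task controller switch objectives on or off without touching the base generator; how GRAD actually decides which objectives to activate is not described in the abstract.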

Topics

Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Learning Types > Domain Adaptation
Artificial Intelligence > Core AI > Retrieval-Augmented Generation

Keywords

domain adaptation, retrieval-augmented generation, decoding-time framework
