Ada-RS: Adaptive Rejection Sampling for Selective Thinking

Yirou Ge; Yixi Li; Alec M. Chiu; Shivani Shekhar; Zijie Pan; Avinash Thangali; Yun-Shiuan Chuang; Chaitanya Kulkarni; Uma Kona; Linsey Pang; Prakhar Mehrotra

ACL 2026

Abstract

Large language models (LLMs) are increasingly deployed in cost- and latency-sensitive settings. While chain-of-thought improves reasoning, it can waste tokens on simple requests. We study selective thinking for tool-using LLMs and introduce Adaptive Rejection Sampling (Ada-RS), an algorithm-agnostic sample-filtering framework for learning selective and efficient reasoning. For each context, Ada-RS scores multiple sampled completions with an adaptive length-penalized reward, then applies stochastic rejection sampling to retain only high-reward candidates (or preference pairs) for downstream optimization. We demonstrate how Ada-RS plugs into both preference-pair (e.g., DPO) and grouped policy optimization (e.g., DAPO) strategies. Using Qwen3-8B with LoRA on a synthetic, tool-call-oriented e-commerce benchmark, Ada-RS improves the accuracy-efficiency frontier over standard algorithms, reducing average output tokens by up to ∼80% and thinking rate by up to ∼95% while maintaining or improving tool call accuracy. We further demonstrate that these gains generalize across model scales (Qwen3-1.7B, 8B, 14B) and domains (τ²-Bench airline and telecom). These results highlight that training signal selection is a powerful lever for efficient reasoning in latency-sensitive deployments.
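The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formula or the acceptance rule, so everything specific below is an assumption made for illustration only: the `Completion` type, the min-max length penalty with weight `lam`, the acceptance probability `exp(beta * (r - best))`, and the choice to skip groups with no correct sample are hypothetical, not the paper's definitions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    correct: bool    # whether the tool call matched the reference
    num_tokens: int  # output length, including any thinking tokens


def length_penalized_reward(c, group, lam=0.5):
    """Hypothetical reward: 1 for a correct answer, minus a length penalty
    scaled to the range of correct lengths in this group (the 'adaptive' part
    of this sketch). Incorrect completions score 0."""
    if not c.correct:
        return 0.0
    correct_lengths = [g.num_tokens for g in group if g.correct]
    shortest, longest = min(correct_lengths), max(correct_lengths)
    if longest == shortest:
        return 1.0
    relative_length = (c.num_tokens - shortest) / (longest - shortest)
    return 1.0 - lam * relative_length


def rejection_sample(group, beta=5.0, rng=None):
    """Keep each completion with probability exp(beta * (r - best)).

    The best-scoring completion is always kept; lower-reward ones survive
    less often as beta grows. Groups with no positive reward are skipped
    (an assumption of this sketch)."""
    rng = rng or random.Random()
    rewards = [length_penalized_reward(c, group) for c in group]
    best = max(rewards)
    if best <= 0.0:
        return []
    kept = []
    for c, r in zip(group, rewards):
        accept_prob = math.exp(beta * (r - best))
        if rng.random() < accept_prob:
            kept.append((c, r))
    return kept


if __name__ == "__main__":
    group = [
        Completion("direct tool call", True, 20),
        Completion("long reasoning, then tool call", True, 400),
        Completion("wrong tool call", False, 50),
    ]
    for completion, reward in rejection_sample(group, rng=random.Random(0)):
        print(f"{reward:.2f}  {completion.text}")
```

Under these assumptions, the retained completions (or pairs built from a retained and a rejected completion) would then be handed to whichever optimizer is in use, which is one way to read the abstract's description of the framework as algorithm-agnostic.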

Topics

Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Reasoning
Artificial Intelligence > Core AI > Efficient Computing

Keywords

efficient reasoning, selective thinking, adaptive rejection sampling, tool call accuracy
