Shuang Qiu

20 papers · 2019–2026 · 7 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (12) 🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (7) 🌈 Renaissance Researcher (7) 🏆 Grand Slam 🔥 Unstoppable (7) 💎 Century Club (19) 🚀 Conference Pioneer 🗃️ Keyword Collector (76)

Conferences

ICML (7) AAAI (3) ACL (3) ICLR (3) NIPS (2) CVPR (1) JMLR (1)

Top co-authors

Zhuoran Yang (8) Zhaoran Wang (7) Jieping Ye (5) Xiaohan Wei (4) Tong Zhang (3) Rui Yang (3) Chenjia Bai (3) Lingxiao Wang (2) Feng Luo (2) Mladen Kolar (2)

Keywords

regret bound (4) reinforcement learning (3) upper confidence bound (3) large language model (2) constraint violation (2) markov decision process (2) neural network (2) zero-sum markov game (2) function approximation (2) non-convex optimization (1) self-supervised learning (1) optimal transport (1) preference optimization (1) sample complexity (1) online learning (1) style transfer (1) posterior sampling (1) offline reinforcement learning (1) neural rendering (1) machine reading comprehension (1)

Papers

Self-Reflective Generation at Test Time · ACL 2026
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling · ICLR 2025
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies · AAAI 2025
Online Preference Alignment for Language Models via Count-based Exploration · ICLR 2025
ROPO: Robust Preference Optimization for Large Language Models · ICML 2025
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning · ICML 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment · ICML 2024
Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach · JMLR 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards · ACL 2024
Gradient-Variation Bound for Online Convex Optimization with Constraints · AAAI 2023
Posterior Sampling for Competitive RL: Function Approximation and Partial Observation · NIPS 2023
Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics · ICLR 2023
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning · ICML 2022
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions · ICML 2021
Stylized Neural Painting · CVPR 2021
On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game · ICML 2021
Low-Resource Generation of Multi-hop Reasoning Questions · ACL 2020
Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis · ICML 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss · NIPS 2020
Which Factorization Machine Modeling Is Better: A Theoretical Answer with Optimal Guarantee · AAAI 2019