Zeyu Zheng

21 papers · 2013–2026 · 5 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5) 🐝 Cross-Pollinator (13) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (81) 💎 Century Club (20) 🔥 Unstoppable (6) ❓ The Questioner

Conferences

NIPS (9) ICML (7) ACL (2) AISTATS (2) AAAI (1)

Top co-authors

Feng Zhu (4) David Simchi-Levi (4) Satinder Singh (3) Bernardo Avila Pires (3) Junhyuk Oh (2) Yunhao Tang (2) Tianyi Lin (2) Michal Valko (2) Clare Lyle (2) Rémi Munos (2)

Research topics

Reinforcement Learning (1)

Keywords

regret bound (6) multi-armed bandit (4) reinforcement learning (3) online learning (3) intrinsic reward (2) neural network (2) worst-case optimality (2) deep reinforcement learning (2) expected regret (2) stochastic optimization (2) non-stationary environment (2) optimal transport (1) continual learning (1) self-supervised learning (1) uncertainty quantification (1) policy gradient (1) causal inference (1) exploration exploitation (1) model misspecification (1) gaussian process (1)

Papers

A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage ACL 2026
Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation ICML 2025
Generalized Preference Optimization: A Unified Approach to Offline Alignment ICML 2024
Normalization and effective learning rates in reinforcement learning NIPS 2024
Stochastic Multi-Armed Bandits with Strongly Reward-Dependent Delays AISTATS 2024
Human Alignment of Large Language Models through Online Preference Optimisation ICML 2024
Non-stationary Experimental Design under Linear Trends NIPS 2023
Stochastic Multi-armed Bandits: Optimal Trade-off among Optimality, Consistency, and Tail Risk NIPS 2023
Understanding Plasticity in Neural Networks ICML 2023
Contextual Gaussian Process Bandits with Neural Networks NIPS 2023
A Simple and Optimal Policy Design for Online Learning with Safety against Heavy-tailed Risk NIPS 2022
Adaptive Pairwise Weights for Temporal Credit Assignment AAAI 2022
Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization NIPS 2022
Dynamic Planning and Learning under Recovering Rewards ICML 2021
Learning State Representations from Random Deep Action-conditional Predictions NIPS 2021
On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification AISTATS 2021
Stochastic $L^\natural$-convex Function Minimization NIPS 2021
When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment ICML 2020
What Can Learned Intrinsic Rewards Capture? ICML 2020
On Learning Intrinsic Rewards for Policy Gradient Methods NIPS 2018
Extracting Events with Informal Temporal References in Personal Histories in Online Communities ACL 2013