Haipeng Luo

88 papers · 2014–2025 · 9 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (17) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (7) 🏠 Conference Loyalist (33) 🏆 Keyword Champion (32) 👑 Triple Crown 🧬 Topic Evolution 🔬 Deep Specialist (36) 🤝 Dynamic Duo (24) 🔥 Unstoppable (12) ⚡ Prolific Year (11) ❓ The Questioner (2) 💎 Century Club (88) 🗃️ Keyword Collector (66)

Conferences

NIPS (33) COLT (22) ICML (18) AISTATS (5) ICLR (3) ALT (2) CVPR (2) UAI (2) IJCAI (1)

Top co-authors

Chen-Yu Wei (24) Mengxiao Zhang (18) Chung-Wei Lee (15) Liyu Chen (10) Christian Kroer (7) Rahul Jain (7) Alekh Agarwal (7) Tiancheng Jin (6) Gabriele Farina (6) Satyen Kale (5)

Keywords

regret bound (48) online learning (32) contextual bandit (14) multi-armed bandit (13) markov decision process (8) game theory (8) bandit feedback (7) online mirror descent (7) stochastic shortest path (7) dynamic regret (7) adversarial learning (6) multi-agent system (6) stochastic optimization (5) nash equilibrium (5) adversarial bandit (5) reinforcement learning (5) online algorithm (4) no-regret learning (4) linear bandit (4) policy optimization (4)

Papers

Corrupted Learning Dynamics in Games COLT 2025
Contextual Linear Bandits with Delay as Payoff ICML 2025
Alternating Regret for Online Convex Optimization COLT 2025
Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback COLT 2025
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games ICLR 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct ICLR 2025
Provably Efficient Interactive-Grounded Learning with Personalized Reward NIPS 2024
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena NIPS 2024
Efficient Contextual Bandits with Uninformed Feedback Graphs ICML 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback ICML 2024
No-Regret Learning for Fair Multi-Agent Social Welfare Optimization NIPS 2024
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms NIPS 2024
Contextual Multinomial Logit Bandits with General Value Functions NIPS 2024
Optimal Multiclass U-Calibration Error and Beyond NIPS 2024
Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games AISTATS 2024
Online Learning in Contextual Second-Price Pay-Per-Click Auctions AISTATS 2024
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints ICML 2024
On Tractable $\Phi$-Equilibria in Non-Concave Games NIPS 2024
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? CVPR 2023
Posterior sampling-based online learning for the stochastic shortest path model UAI 2023
Practical Contextual Bandits with Feedback Graphs NIPS 2023
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms NIPS 2023
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback NIPS 2023
No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions NIPS 2023
Regret Matching+: (In)Stability and Fast Convergence in Games NIPS 2023
No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution AISTATS 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation ICML 2023
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs ALT 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models CVPR 2023
Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games NIPS 2022
Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP ICML 2022
Learning Infinite-horizon Average-reward Markov Decision Process with Constraints ICML 2022
Near-Optimal No-Regret Learning Dynamics for General Convex Games NIPS 2022
Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games ICML 2022
No-Regret Learning in Time-Varying Zero-Sum Games ICML 2022
Policy Optimization for Stochastic Shortest Path COLT 2022
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits COLT 2022
Adaptive Bandit Convex Optimization with Heterogeneous Curvature COLT 2022
Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments NIPS 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback NIPS 2022
Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback NIPS 2022
Active Online Learning with Hidden Shifting Domains AISTATS 2021
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path NIPS 2021
Last-iterate Convergence in Extensive-Form Games NIPS 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition NIPS 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses NIPS 2021
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation AISTATS 2021
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds ALT 2021
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition COLT 2021
Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications COLT 2021
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games COLT 2021
Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach COLT 2021
Linear Last-iterate Convergence in Constrained Saddle-point Optimization ICLR 2021
Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case ICML 2021
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously ICML 2021
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs NIPS 2020
Open Problem: Model Selection for Contextual Bandits COLT 2020
Comparator-Adaptive Convex Bandits NIPS 2020
Taking a hint: How to leverage loss predictors in contextual bandits? COLT 2020
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback COLT 2020
Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition ICML 2020
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes ICML 2020
Fair Contextual Multi-Armed Bandits: Theory and Experiments UAI 2020
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition NIPS 2020
Improved Path-length Regret Bounds for Bandits COLT 2019
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously ICML 2019
Model Selection for Contextual Bandits NIPS 2019
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal and Parameter-free COLT 2019
Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information COLT 2019
Equipping Experts/Bandits with Long-term Memory NIPS 2019
Hypothesis Set Stability and Generalization NIPS 2019
Efficient Online Portfolio with Logarithmic Regret NIPS 2018
Practical Contextual Bandits with Regression Oracles ICML 2018
Efficient Contextual Bandits in Non-stationary Worlds COLT 2018
More Adaptive Algorithms for Adversarial Bandits COLT 2018
Logistic Regression: The Importance of Being Improper COLT 2018
Corralling a Band of Bandit Algorithms COLT 2017
Open Problem: First-Order Regret Bounds for Contextual Bandits COLT 2017
Efficient Second Order Online Learning by Sketching NIPS 2016
Optimal and Adaptive Algorithms for Online Boosting IJCAI 2016
Variance-Reduced and Projection-Free Stochastic Optimization ICML 2016
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits NIPS 2016
Optimal and Adaptive Algorithms for Online Boosting ICML 2015
Achieving All with No Parameters: AdaNormalHedge COLT 2015
Fast Convergence of Regularized Learning in Games NIPS 2015
Online Gradient Boosting NIPS 2015
Towards Minimax Online Learning with Unknown Time Horizon ICML 2014
A Drifting-Games Analysis for Online Learning and Applications to Boosting NIPS 2014