Han Zhong
29 papers · 2021–2025 · 5 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (5)
Cross-Pollinator (13)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Taxonomy Completionist (31)
Dynamic Duo (15)
Triple Crown
Keyword Collector (76)
Unstoppable (5)
Century Club (29)
Prolific Year (9)
The Questioner
Conferences
ICML (12)
NIPS (10)
ICLR (5)
AISTATS (1)
JMLR (1)
Keywords
regret bound (9)
markov game (4)
function approximation (4)
reinforcement learning (3)
posterior sampling (2)
offline reinforcement learning (2)
sample efficiency (2)
linear bandit (2)
model-based reinforcement learning (2)
sample complexity (2)
multi-armed bandit (2)
minimax optimization (1)
computational complexity (1)
equilibrium learning (1)
policy optimization (1)
robust statistics (1)
adversarial robustness (1)
nash equilibrium (1)
linear function approximation (1)
vc dimension (1)
Papers
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
ICML 2025
DPO Meets PPO: Reinforced Token Optimization for RLHF
ICML 2025
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
ICML 2025
Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
ICML 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
ICML 2024
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
NIPS 2024
A3S: A General Active Clustering Method with Pairwise Constraints
ICML 2024
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
ICML 2024
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithms
NIPS 2024
Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation
AISTATS 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
ICML 2024
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
ICLR 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
ICLR 2024
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
JMLR 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NIPS 2023
Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
NIPS 2023
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
NIPS 2023
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
NIPS 2023
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
NIPS 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
NIPS 2023
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
ICLR 2023
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
ICLR 2023
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
ICML 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
ICML 2022
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
ICML 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
ICML 2022
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
ICLR 2022
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
NIPS 2022
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
NIPS 2021