Lihong Li
54 papers · 2008–2026 · 10 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (23)
Renaissance Researcher (5)
Interdisciplinary Bridge
Hot Topic Early Bird
Academic Marathon (17)
Keyword Trendsetter Combo (3)
Triple Crown
Topic Pioneer
Deep Specialist (12)
Topic Evolution
Keyword Champion (5)
Conference Pioneer
Prolific Year (6)
Century Club (52)
Keyword Collector (63)
Trend Setter
Unstoppable (9)
Conferences
NIPS (13)
ICML (12)
ICLR (8)
AISTATS (6)
ACL (4)
EMNLP (4)
JMLR (3)
COLT (2)
EACL (1)
IJCNLP (1)
Keywords
reinforcement learning (13)
regret bound (10)
contextual bandit (8)
online learning (7)
off-policy evaluation (7)
stationary distribution (5)
markov decision process (5)
sample complexity (4)
value function (4)
dialogue system (4)
neural network (3)
policy learning (3)
importance sampling (3)
upper confidence bound (3)
end-to-end learning (3)
truncated gradient (2)
value function approximation (2)
convex loss (2)
bellman equation (2)
l1 regularization (2)
Papers
Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
ACL 2026
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
EACL 2026
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
EMNLP 2025
Understanding Domain Randomization for Sim-to-real Transfer
ICLR 2022
On the Optimality of Batch Policy Optimization Algorithms
ICML 2021
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
AISTATS 2021
Neural Thompson Sampling
ICLR 2021
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL
ICLR 2021
Near-Optimal Representation Learning for Linear Bandits and Linear RL
ICML 2021
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
ICLR 2020
Batch Stationary Distribution Estimation
ICML 2020
GenDICE: Generalized Offline Estimation of Stationary Values
ICLR 2020
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
ICLR 2020
Randomized Exploration in Generalized Linear Bandits
AISTATS 2020
CoinDICE: Off-Policy Confidence Interval Estimation
NIPS 2020
Escaping the Gravitational Pull of Softmax
NIPS 2020
Neural Contextual Bandits with UCB-based Exploration
ICML 2020
Off-Policy Evaluation via the Regularized Lagrangian
NIPS 2020
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
NIPS 2019
Neural Logic Machines
ICLR 2019
Policy Certificates: Towards Accountable Reinforcement Learning
ICML 2019
A Kernel Loss for Solving the Bellman Equation
NIPS 2019
Subgoal Discovery for Hierarchical Dialogue Policy Learning
EMNLP 2018
Neural Approaches to Conversational AI
ACL 2018
Boosting the Actor with Dual Critic
ICLR 2018
Scalable Bilinear Pi Learning Using State and Action Features
ICML 2018
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
ICML 2018
Adversarial Attacks on Stochastic Bandits
NIPS 2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
NIPS 2018
End-to-End Task-Completion Neural Dialogue Systems
IJCNLP 2017
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes
NIPS 2017
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
ICML 2017
Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning
EMNLP 2017
Stochastic Variance Reduction Methods for Policy Evaluation
ICML 2017
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access
ACL 2017
Deep Reinforcement Learning with a Natural Language Action Space
ACL 2016
Active Learning with Oracle Epiphany
NIPS 2016
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
COLT 2016
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
EMNLP 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
ICML 2016
Toward Minimax Off-policy Value Estimation
AISTATS 2015
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
ICML 2014
PAC-inspired Option Discovery in Lifelong Reinforcement Learning
ICML 2014
Open Problem: Regret Bounds for Thompson Sampling
COLT 2012
An Empirical Evaluation of Thompson Sampling
NIPS 2011
Contextual Bandit Algorithms with Supervised Learning Guarantees
AISTATS 2011
Linear-Time Estimators for Propensity Scores
AISTATS 2011
Contextual Bandits with Linear Payoff Functions
AISTATS 2011
Learning from Logged Implicit Exploration Data
NIPS 2010
Parallelized Stochastic Gradient Descent
NIPS 2010
Reinforcement Learning in Finite MDPs: PAC Analysis
JMLR 2009
Provably Efficient Learning with Typed Parametric Models
JMLR 2009
Sparse Online Learning via Truncated Gradient
JMLR 2009
Sparse Online Learning via Truncated Gradient
NIPS 2008