conftrace_

Lihong Li

54 papers · 2008–2026 · 10 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (23) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🏃 Academic Marathon (17) 🌟 Keyword Trendsetter Combo (3) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (12) 🧬 Topic Evolution 🏆 Keyword Champion (5) 🚀 Conference Pioneer ⚡ Prolific Year (6) 💎 Century Club (52) 🗃️ Keyword Collector (63) 📈 Trend Setter 🔥 Unstoppable (9)

Conferences

NIPS (13) ICML (12) ICLR (8) AISTATS (6) ACL (4) EMNLP (4) JMLR (3) COLT (2) EACL (1) IJCNLP (1)

Papers

Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards · ACL 2026
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs · EACL 2026
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning · EMNLP 2025
Understanding Domain Randomization for Sim-to-real Transfer · ICLR 2022
On the Optimality of Batch Policy Optimization Algorithms · ICML 2021
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders · AISTATS 2021
Neural Thompson Sampling · ICLR 2021
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL · ICLR 2021
Near-Optimal Representation Learning for Linear Bandits and Linear RL · ICML 2021
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning · ICLR 2020
Batch Stationary Distribution Estimation · ICML 2020
GenDICE: Generalized Offline Estimation of Stationary Values · ICLR 2020
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation · ICLR 2020
Randomized Exploration in Generalized Linear Bandits · AISTATS 2020
CoinDICE: Off-Policy Confidence Interval Estimation · NIPS 2020
Escaping the Gravitational Pull of Softmax · NIPS 2020
Neural Contextual Bandits with UCB-based Exploration · ICML 2020
Off-Policy Evaluation via the Regularized Lagrangian · NIPS 2020
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections · NIPS 2019
Neural Logic Machines · ICLR 2019
Policy Certificates: Towards Accountable Reinforcement Learning · ICML 2019
A Kernel Loss for Solving the Bellman Equation · NIPS 2019
Subgoal Discovery for Hierarchical Dialogue Policy Learning · EMNLP 2018
Neural Approaches to Conversational AI · ACL 2018
Boosting the Actor with Dual Critic · ICLR 2018
Scalable Bilinear Pi Learning Using State and Action Features · ICML 2018
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation · ICML 2018
Adversarial Attacks on Stochastic Bandits · NIPS 2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation · NIPS 2018
End-to-End Task-Completion Neural Dialogue Systems · IJCNLP 2017
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes · NIPS 2017
Provably Optimal Algorithms for Generalized Linear Contextual Bandits · ICML 2017
Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning · EMNLP 2017
Stochastic Variance Reduction Methods for Policy Evaluation · ICML 2017
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access · ACL 2017
Deep Reinforcement Learning with a Natural Language Action Space · ACL 2016
Active Learning with Oracle Epiphany · NIPS 2016
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives · COLT 2016
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads · EMNLP 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning · ICML 2016
Toward Minimax Off-policy Value Estimation · AISTATS 2015
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits · ICML 2014
PAC-inspired Option Discovery in Lifelong Reinforcement Learning · ICML 2014
Open Problem: Regret Bounds for Thompson Sampling · COLT 2012
An Empirical Evaluation of Thompson Sampling · NIPS 2011
Contextual Bandit Algorithms with Supervised Learning Guarantees · AISTATS 2011
Linear-Time Estimators for Propensity Scores · AISTATS 2011
Contextual Bandits with Linear Payoff Functions · AISTATS 2011
Learning from Logged Implicit Exploration Data · NIPS 2010
Parallelized Stochastic Gradient Descent · NIPS 2010
Reinforcement Learning in Finite MDPs: PAC Analysis · JMLR 2009
Provably Efficient Learning with Typed Parametric Models · JMLR 2009
Sparse Online Learning via Truncated Gradient · JMLR 2009
Sparse Online Learning via Truncated Gradient · NIPS 2008