Joey Hong

17 papers · 2019–2025 · 5 top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (5) 🐝 Cross-Pollinator (14) 🏆 Keyword Champion (2) 👑 Triple Crown 🗃️ Keyword Collector (51) 💎 Century Club (17) 🔥 Unstoppable (7) ❓ The Questioner

Conferences

ICLR (6) ICML (5) AISTATS (3) NIPS (2) CVPR (1)

Top co-authors

Manzil Zaheer (8) Sergey Levine (7) Branislav Kveton (6) Anca Dragan (4) Mohammad Ghavamzadeh (4) Amr Ahmed (2) Charles Sutton (2) Aviral Kumar (2) Sumeet Katariya (2) Yinlam Chow (2)

Keywords

thompson sampling (4) regret bound (4) multi-armed bandit (3) bayesian inference (2) off-policy learning (2) hierarchical bayesian model (2) latent state (2) contextual bandit (2) autonomous driving (1) human behavior (1) offline reinforcement learning (1) bandit feedback (1) behavioral adaptation (1) hierarchical model (1) policy optimization (1) upper confidence bound (1) program synthesis (1) trajectory prediction (1) online learning (1) multi-task learning (1)

Papers

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models · ICML 2025
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning · ICLR 2025
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis · ICLR 2024
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity · ICLR 2024
Learning to Explore in POMDPs with Informational Rewards · ICML 2024
On the Sensitivity of Reward Inference to Misspecified Human Models · ICLR 2023
Learning to Influence Human Behavior with Offline Reinforcement Learning · NIPS 2023
Confidence-Conditioned Value Functions for Offline Reinforcement Learning · ICLR 2023
Multi-Task Off-Policy Learning from Bandit Feedback · ICML 2023
Should I Run Offline Reinforcement Learning or Behavioral Cloning? · ICLR 2022
Hierarchical Bayesian Bandits · AISTATS 2022
Deep Hierarchy in Bandits · ICML 2022
Thompson Sampling with a Mixture Prior · AISTATS 2022
Non-Stationary Off-Policy Optimization · AISTATS 2021
Latent Programmer: Discrete Latent Codes for Program Synthesis · ICML 2021
Latent Bandits Revisited · NIPS 2020
Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions · CVPR 2019