Mohammad Ghavamzadeh
88 papers · 2006–2026 · across 11 top CS/AI conferences
Achievements
Taxonomy Completionist (32)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Hot Topic Early Bird
Academic Marathon (19)
Conference Loyalist (26)
Keyword Trendsetter Combo (4)
Topic Pioneer
Triple Crown
Deep Specialist (10)
Topic Evolution
Keyword Champion
Grand Slam
Dynamic Duo (18)
Keyword Collector (115)
Trend Setter
Unstoppable (16)
Conference Pioneer
Century Club (87)
Prolific Year (10)
Conferences
NIPS (26)
ICML (18)
AISTATS (13)
JMLR (8)
ICLR (7)
IJCAI (6)
AAAI (4)
L4DC (2)
UAI (2)
ACML (1)
CORL (1)
Keywords
reinforcement learning (16)
regret bound (15)
multi-armed bandit (14)
markov decision process (13)
policy gradient (12)
dynamic programming (7)
policy iteration (7)
policy learning (6)
sample complexity (5)
contextual bandit (5)
regret minimization (5)
model-based reinforcement learning (5)
online algorithm (5)
temporal difference learning (4)
value iteration (4)
sequential decision making (4)
sample efficiency (4)
stochastic optimization (4)
thompson sampling (4)
online learning (4)
Papers
Preference Optimization via Contrastive Divergence: Your Policy Is Secretly an NLL Estimator
AAAI 2026
Contextual Bandits with Stage-wise Constraints
JMLR 2025
Conservative Contextual Bandits: Beyond Linear Representations
ICLR 2025
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
AISTATS 2025
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
L4DC 2025
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
ICLR 2024
Bayesian Regret Minimization in Offline Bandits
ICML 2024
Maximum Entropy Model Correction in Reinforcement Learning
ICLR 2024
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
NIPS 2023
Multiple-policy High-confidence Policy Evaluation
AISTATS 2023
Multi-Task Off-Policy Learning from Bandit Feedback
ICML 2023
A Mixture-of-Expert Approach to RL-based Dialogue Management
ICLR 2023
Entropic Risk Optimization in Discounted MDPs
AISTATS 2023
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
NIPS 2023
Meta-Learning for Simple Regret Minimization
AAAI 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
NIPS 2023
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
NIPS 2023
Thompson Sampling with a Mixture Prior
AISTATS 2022
Feature and Parameter Selection in Stochastic Linear Bandits
ICML 2022
Private and Communication-Efficient Algorithms for Entropy Estimation
NIPS 2022
Robust Reinforcement Learning using Offline Data
NIPS 2022
Efficient Risk-Averse Reinforcement Learning
NIPS 2022
Operator Splitting Value Iteration
NIPS 2022
Deep Hierarchy in Bandits
ICML 2022
Hierarchical Bayesian Bandits
AISTATS 2022
Mirror Descent Policy Optimization
ICLR 2022
Fixed-Budget Best-Arm Identification in Structured Bandits
IJCAI 2022
Adaptive Sampling for Minimax Fair Classification
NIPS 2021
Control-Aware Representations for Model-based Reinforcement Learning
ICLR 2021
PID Accelerated Value Iteration Algorithm
ICML 2021
Deep Bayesian Quadrature Policy Optimization
AAAI 2021
Stochastic Bandits with Linear Constraints
AISTATS 2021
Variational Model-based Policy Optimization
IJCAI 2021
Neural Lyapunov Redesign
L4DC 2021
Safe Policy Learning for Continuous Control
CORL 2020
Active Model Estimation in Markov Decision Processes
UAI 2020
Multi-step Greedy Reinforcement Learning Algorithms
ICML 2020
Predictive Coding for Locally-Linear Control
ICML 2020
Conservative Exploration in Reinforcement Learning
AISTATS 2020
Randomized Exploration in Generalized Linear Bandits
AISTATS 2020
Adaptive Sampling for Estimating Probability Distributions
ICML 2020
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
ICLR 2020
Online Planning with Lookahead Policies
NIPS 2020
Improved Algorithms for Conservative Exploration in Bandits
AAAI 2020
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
NIPS 2019
Optimizing over a Restricted Policy Class in MDPs
AISTATS 2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
IJCAI 2019
Perturbed-History Exploration in Stochastic Linear Bandits
UAI 2019
Risk-Sensitive Generative Adversarial Imitation Learning
AISTATS 2019
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
ICML 2019
Robust Locally-Linear Controllable Embedding
AISTATS 2018
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
NIPS 2018
A Lyapunov-based Approach to Safe Reinforcement Learning
NIPS 2018
Path Consistency Learning in Tsallis Entropy Regularized MDPs
ICML 2018
More Robust Doubly Robust Off-policy Evaluation
ICML 2018
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
JMLR 2018
Model-Independent Online Learning for Influence Maximization
ICML 2017
Online Learning to Rank in Stochastic Click Models
ICML 2017
Sequential Multiple Hypothesis Testing with Type I Error Control
AISTATS 2017
Conservative Contextual Linear Bandits
NIPS 2017
Active Learning for Accurate Estimation of Linear Models
ICML 2017
Bottleneck Conditional Density Estimation
ICML 2017
Improved Learning Complexity in Combinatorial Pure Exploration Bandits
AISTATS 2016
Analysis of Classification-based Policy Iteration Algorithms
JMLR 2016
Bayesian Policy Gradient and Actor-Critic Algorithms
JMLR 2016
Proximal Gradient Temporal Difference Learning Algorithms
IJCAI 2016
Regularized Policy Iteration with Nonparametric Function Spaces
JMLR 2016
Safe Policy Improvement by Minimizing Robust Baseline Regret
NIPS 2016
Approximate Modified Policy Iteration and its Application to the Game of Tetris
JMLR 2015
High Confidence Policy Improvement
ICML 2015
Policy Gradient for Coherent Risk Measures
NIPS 2015
Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees
IJCAI 2015
Maximum Entropy Semi-Supervised Inverse Reinforcement Learning
IJCAI 2015
Algorithms for CVaR Optimization in MDPs
NIPS 2014
Actor-Critic Algorithms for Risk-Sensitive MDPs
NIPS 2013
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
NIPS 2013
A Generalized Kernel Approach to Structured Output Learning
ICML 2013
Cost-sensitive Multiclass Classification Risk Bounds
ICML 2013
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
NIPS 2012
Finite-Sample Analysis of Least-Squares Policy Iteration
JMLR 2012
Multi-Bandit Best Arm Identification
NIPS 2011
Speedy Q-Learning
NIPS 2011
LSTD with Random Projections
NIPS 2010
Finite-sample Analysis of Bellman Residual Minimization
ACML 2010
Regularized Policy Iteration
NIPS 2008
Hierarchical Average Reward Reinforcement Learning
JMLR 2007
Incremental Natural Actor-Critic Algorithms
NIPS 2007
Bayesian Policy Gradient Algorithms
NIPS 2006