Shie Mannor
143 papers · 2003–2025 · across 15 top CS/AI conferences
Achievements
Taxonomy Completionist (40)
Keyword Pioneer
Renaissance Researcher (7)
Interdisciplinary Bridge
Hot Topic Early Bird
Cross-Pollinator (13)
Conference Loyalist (41)
Keyword Trendsetter Combo (5)
Keyword Champion (3)
Triple Crown
Topic Pioneer
Deep Specialist (18)
Dynamic Duo (19)
Grand Slam
Keyword Collector (210)
The Questioner (2)
Trend Setter
Conference Pioneer
Unstoppable (18)
Prolific Year (10)
Century Club (143)
Conferences
ICML (47)
NIPS (41)
COLT (13)
AAAI (11)
JMLR (10)
ICLR (7)
UAI (4)
AISTATS (2)
CVPR (2)
ACML (1)
ALT (1)
CORL (1)
IJCAI (1)
RSS (1)
WACV (1)
Research topics
Keywords
reinforcement learning (32)
online learning (25)
regret bound (21)
multi-armed bandit (13)
markov decision process (12)
policy gradient (11)
robust optimization (9)
regret minimization (8)
stochastic optimization (8)
sample complexity (6)
contextual bandit (6)
policy optimization (6)
value function (6)
model-based reinforcement learning (6)
policy iteration (5)
game theory (5)
deep reinforcement learning (5)
temporal difference learning (5)
thompson sampling (5)
robust markov decision process (5)
Papers
On Bits and Bandits: Quantifying the Regret-Information Trade-off
ICLR 2025
Policy Gradient with Tree Expansion
ICML 2025
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
CVPR 2025
A Classification View on Meta Learning Bandits
ICML 2025
Reinforcement Learning with Segment Feedback
ICML 2025
Global Convergence of Policy Gradient in Average Reward MDPs
ICLR 2025
Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
ICML 2024
Sobolev Space Regularised Pre Density Models
ICML 2024
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization
ICML 2024
Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
ICML 2024
Improving Token-Based World Models with Parallel Observation Prediction
ICML 2024
Solving Non-rectangular Reward-Robust MDPs via Frequency Regularization
AAAI 2024
Tree Search-Based Policy Optimization under Stochastic Execution Delay
ICLR 2024
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
NIPS 2024
Prospective Side Information for Latent MDPs
ICML 2024
Train Hard, Fight Easy: Robust Meta Reinforcement Learning
NIPS 2023
Learning Hidden Markov Models When the Locations of Missing Observations are Unknown
ICML 2023
Planning and Learning with Adaptive Lookahead
AAAI 2023
PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
ICML 2023
Learning to Initiate and Reason in Event-Driven Cascading Processes
ICML 2023
Reward-Mixing MDPs with Few Latent Contexts are Learnable
ICML 2023
Representation-Driven Reinforcement Learning
ICML 2023
Optimization or Architecture: How to Hack Kalman Filtering
NIPS 2023
Individualized Dosing Dynamics via Neural Eigen Decomposition
NIPS 2023
Policy Gradient for Rectangular Robust Markov Decision Processes
NIPS 2023
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles
CORL 2022
Analysis of Stochastic Processes through Replay Buffers
ICML 2022
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
AAAI 2022
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
ICLR 2022
Online Apprenticeship Learning
AAAI 2022
Reinforcement Learning for Datacenter Congestion Control
AAAI 2022
Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning
NIPS 2022
Tractable Optimality in Episodic Latent MABs
NIPS 2022
Finite Sample Analysis Of Dynamic Regression Parameter Learning
NIPS 2022
Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
ICML 2022
Optimizing Tensor Network Contraction Using Reinforcement Learning
ICML 2022
The Geometry of Robust Value Functions
ICML 2022
Actor-Critic based Improper Reinforcement Learning
ICML 2022
Efficient Risk-Averse Reinforcement Learning
NIPS 2022
Reinforcement Learning with a Terminator
NIPS 2022
Bandits with partially observable confounded data
UAI 2021
Action redundancy in reinforcement learning
UAI 2021
Robust Value Iteration for Continuous Control Tasks
RSS 2021
Reinforcement Learning in Reward-Mixing MDPs
NIPS 2021
Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
NIPS 2021
Sim and Real: Better Together
NIPS 2021
Twice regularized MDPs and the equivalence between robustness and regularization
NIPS 2021
RL for Latent MDPs: Regret Guarantees and a Lower Bound
NIPS 2021
Reinforcement Learning with Trajectory Feedback
AAAI 2021
Lenient Regret for Multi-Armed Bandits
AAAI 2021
Online Limited Memory Neural-Linear Bandits with Likelihood Matching
ICML 2021
Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks
ICML 2021
Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks
CVPR 2021
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
ICLR 2021
Acting in Delayed Environments with Non-Stationary Markov Policies
ICLR 2021
Value Iteration in Continuous Actions, States and Time
ICML 2021
Detecting Rewards Deterioration in Episodic Reinforcement Learning
ICML 2021
Confidence-Budget Matching for Sequential Budgeted Learning
ICML 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination
UAI 2021
Tight Lower Bounds for Combinatorial Multi-Armed Bandits
COLT 2020
An adaptive stochastic optimization algorithm for resource allocation
ALT 2020
Online Planning with Lookahead Policies
NIPS 2020
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
AAAI 2020
Optimistic Policy Optimization with Bandit Feedback
ICML 2020
Topic Modeling via Full Dependence Mixtures
ICML 2020
Off-Policy Evaluation in Partially Observable Environments
AAAI 2020
Scalable Detection of Offensive and Non-compliant Content / Logo in Product Images
WACV 2020
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies
NIPS 2019
Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem
COLT 2019
Action Robust Reinforcement Learning and Applications in Continuous Control
ICML 2019
Reward Constrained Policy Optimization
ICLR 2019
The Natural Language of Actions
ICML 2019
Exploration Conscious Reinforcement Learning Revisited
ICML 2019
A Bayesian Approach to Robust Reinforcement Learning
UAI 2019
Nonlinear Distributional Gradient Temporal-Difference Learning
ICML 2019
On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters
AAAI 2019
How to Combine Tree-Search Methods in Reinforcement Learning
AAAI 2019
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
NIPS 2019
Distributional Policy Optimization: An Alternative Approach for Continuous Control
NIPS 2019
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
NIPS 2018
Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning
NIPS 2018
Beyond the One-Step Greedy Approach in Reinforcement Learning
ICML 2018
A General Approach to Multi-Armed Bandits Under Risk Criteria
COLT 2018
Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning
COLT 2018
Multi-objective Bandits: Optimizing the Generalized Gini Index
ICML 2017
Rotting Bandits
NIPS 2017
End-to-End Differentiable Adversarial Imitation Learning
ICML 2017
Approximate Value Iteration with Temporally Extended Actions (Extended Abstract)
IJCAI 2017
Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization
COLT 2017
Shallow Updates for Deep Reinforcement Learning
NIPS 2017
Consistent On-Line Off-Policy Evaluation
ICML 2017
Adaptive Skills Adaptive Partitions (ASAP)
NIPS 2016
Heteroscedastic Sequences: Beyond Gaussianity
ICML 2016
Graying the black box: Understanding DQNs
ICML 2016
Hierarchical Decision Making In Electricity Grid Management
ICML 2016
Learning the Variance of the Reward-To-Go
JMLR 2016
Regularized Policy Iteration with Nonparametric Function Spaces
JMLR 2016
Policy Gradient for Coherent Risk Measures
NIPS 2015
Community Detection via Measure Space Embedding
NIPS 2015
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
NIPS 2015
Dynamic Sensing: Better Classification under Acquisition Constraints
ICML 2015
Off-policy Model-based Learning under Unknown Factored Dynamics
ICML 2015
Thompson Sampling for Learning Parameterized Markov Decision Processes
COLT 2015
Sensor Selection for Crowdsensing Dynamical Systems
AISTATS 2015
Online Learning for Adversaries with Memory: Price of Past Mistakes
NIPS 2015
Set-Valued Approachability and Online Learning with Partial Monitoring
JMLR 2014
Robust Logistic Regression and Classification
NIPS 2014
Time-Regularized Interrupting Options (TRIO)
ICML 2014
"How hard is my MDP?" The distribution-norm to the rescue
NIPS 2014
Concept Drift Detection Through Resampling
ICML 2014
Scaling Up Robust MDPs using Function Approximation
ICML 2014
Approachability in unknown games: Online learning meets multi-objective optimization
COLT 2014
Latent Bandits
ICML 2014
Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
ICML 2014
Thompson Sampling for Complex Online Problems
ICML 2014
Reinforcement Learning in Robust Markov Decision Processes
NIPS 2013
Robust Sparse Regression under Adversarial Corruption
ICML 2013
Temporal Difference Methods for the Variance of the Reward To Go
ICML 2013
Approachability, fast and slow
COLT 2013
Online Learning for Time Series Prediction
COLT 2013
Opportunistic Strategies for Generalized No-Regret Problems
COLT 2013
Learning Multiple Models via Regularized Weighting
NIPS 2013
Online PCA for Contaminated Data
NIPS 2013
The Perturbed Variation
NIPS 2012
More Is Better: Large Scale Partially-supervised Sentiment Classification
ACML 2012
Statistical Optimization in High Dimensions
AISTATS 2012
The Sample Complexity of Dictionary Learning
JMLR 2011
Does an Efficient Calibrated Forecasting Strategy Exist?
COLT 2011
The Sample Complexity of Dictionary Learning
COLT 2011
Robust approachability and regret minimization in games with partial monitoring
COLT 2011
From Bandits to Experts: On the Value of Side-Observations
NIPS 2011
Committing Bandits
NIPS 2011
Distributionally Robust Markov Decision Processes
NIPS 2010
Online Classification with Specificity Constraints
NIPS 2010
Robustness and Regularization of Support Vector Machines
JMLR 2009
Online Learning with Sample Path Constraints
JMLR 2009
Regularized Policy Iteration
NIPS 2008
Robust Regression and Lasso
NIPS 2008
The Robustness-Performance Tradeoff in Markov Decision Processes
NIPS 2006
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
JMLR 2006
A Geometric Approach to Multi-Criterion Reinforcement Learning
JMLR 2004
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
JMLR 2004
Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity
JMLR 2003