conftrace_

Shie Mannor

143 papers · 2003–2025 · across 15 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (40) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (7) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (13) 🏠 Conference Loyalist (41) 🌟 Keyword Trendsetter Combo (5) 🏆 Keyword Champion (3) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (18) 🤝 Dynamic Duo (19) 🏆 Grand Slam 🗃️ Keyword Collector (210) ❓ The Questioner (2) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (18) ⚡ Prolific Year (10) 💎 Century Club (143)

Conferences

ICML (47) · NIPS (41) · COLT (13) · AAAI (11) · JMLR (10) · ICLR (7) · UAI (4) · AISTATS (2) · CVPR (2) · ACML (1) · ALT (1) · CORL (1) · IJCAI (1) · RSS (1) · WACV (1)

Top co-authors

Yonathan Efroni (19) · Gal Dalal (15) · Huan Xu (13) · Gal Chechik (12) · Constantine Caramanis (11) · Guy Tennenholtz (10) · Nadav Merlis (9) · Assaf Hallak (8) · Jeongyeol Kwon (8) · Aviv Tamar (7)

Research topics

Keywords

reinforcement learning (32) · online learning (25) · regret bound (21) · multi-armed bandit (13) · markov decision process (12) · policy gradient (11) · robust optimization (9) · regret minimization (8) · stochastic optimization (8) · sample complexity (6) · contextual bandit (6) · policy optimization (6) · value function (6) · model-based reinforcement learning (6) · policy iteration (5) · game theory (5) · deep reinforcement learning (5) · temporal difference learning (5) · thompson sampling (5) · robust markov decision process (5)

Papers

On Bits and Bandits: Quantifying the Regret-Information Trade-off · ICLR 2025
Policy Gradient with Tree Expansion · ICML 2025
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression · CVPR 2025
A Classification View on Meta Learning Bandits · ICML 2025
Reinforcement Learning with Segment Feedback · ICML 2025
Global Convergence of Policy Gradient in Average Reward MDPs · ICLR 2025
Efficient Value Iteration for s-rectangular Robust Markov Decision Processes · ICML 2024
Sobolev Space Regularised Pre Density Models · ICML 2024
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization · ICML 2024
Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel · ICML 2024
Improving Token-Based World Models with Parallel Observation Prediction · ICML 2024
Solving Non-rectangular Reward-Robust MDPs via Frequency Regularization · AAAI 2024
Tree Search-Based Policy Optimization under Stochastic Execution Delay · ICLR 2024
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation · NIPS 2024
Prospective Side Information for Latent MDPs · ICML 2024
Train Hard, Fight Easy: Robust Meta Reinforcement Learning · NIPS 2023
Learning Hidden Markov Models When the Locations of Missing Observations are Unknown · ICML 2023
Planning and Learning with Adaptive Lookahead · AAAI 2023
PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient · ICML 2023
Learning to Initiate and Reason in Event-Driven Cascading Processes · ICML 2023
Reward-Mixing MDPs with Few Latent Contexts are Learnable · ICML 2023
Representation-Driven Reinforcement Learning · ICML 2023
Optimization or Architecture: How to Hack Kalman Filtering · NIPS 2023
Individualized Dosing Dynamics via Neural Eigen Decomposition · NIPS 2023
Policy Gradient for Rectangular Robust Markov Decision Processes · NIPS 2023
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles · CORL 2022
Analysis of Stochastic Processes through Replay Buffers · ICML 2022
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning · AAAI 2022
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning · ICLR 2022
Online Apprenticeship Learning · AAAI 2022
Reinforcement Learning for Datacenter Congestion Control · AAAI 2022
Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning · NIPS 2022
Tractable Optimality in Episodic Latent MABs · NIPS 2022
Finite Sample Analysis Of Dynamic Regression Parameter Learning · NIPS 2022
Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms · ICML 2022
Optimizing Tensor Network Contraction Using Reinforcement Learning · ICML 2022
The Geometry of Robust Value Functions · ICML 2022
Actor-Critic based Improper Reinforcement Learning · ICML 2022
Efficient Risk-Averse Reinforcement Learning · NIPS 2022
Reinforcement Learning with a Terminator · NIPS 2022
Bandits with partially observable confounded data · UAI 2021
Action redundancy in reinforcement learning · UAI 2021
Robust Value Iteration for Continuous Control Tasks · RSS 2021
Reinforcement Learning in Reward-Mixing MDPs · NIPS 2021
Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction · NIPS 2021
Sim and Real: Better Together · NIPS 2021
Twice regularized MDPs and the equivalence between robustness and regularization · NIPS 2021
RL for Latent MDPs: Regret Guarantees and a Lower Bound · NIPS 2021
Reinforcement Learning with Trajectory Feedback · AAAI 2021
Lenient Regret for Multi-Armed Bandits · AAAI 2021
Online Limited Memory Neural-Linear Bandits with Likelihood Matching · ICML 2021
Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks · ICML 2021
Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks · CVPR 2021
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning · ICLR 2021
Acting in Delayed Environments with Non-Stationary Markov Policies · ICLR 2021
Value Iteration in Continuous Actions, States and Time · ICML 2021
Detecting Rewards Deterioration in Episodic Reinforcement Learning · ICML 2021
Confidence-Budget Matching for Sequential Budgeted Learning · ICML 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination · UAI 2021
Tight Lower Bounds for Combinatorial Multi-Armed Bandits · COLT 2020
An adaptive stochastic optimization algorithm for resource allocation · ALT 2020
Online Planning with Lookahead Policies · NIPS 2020
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs · AAAI 2020
Optimistic Policy Optimization with Bandit Feedback · ICML 2020
Topic Modeling via Full Dependence Mixtures · ICML 2020
Off-Policy Evaluation in Partially Observable Environments · AAAI 2020
Scalable Detection of Offensive and Non-compliant Content / Logo in Product Images · WACV 2020
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies · NIPS 2019
Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem · COLT 2019
Action Robust Reinforcement Learning and Applications in Continuous Control · ICML 2019
Reward Constrained Policy Optimization · ICLR 2019
The Natural Language of Actions · ICML 2019
Exploration Conscious Reinforcement Learning Revisited · ICML 2019
A Bayesian Approach to Robust Reinforcement Learning · UAI 2019
Nonlinear Distributional Gradient Temporal-Difference Learning · ICML 2019
On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters · AAAI 2019
How to Combine Tree-Search Methods in Reinforcement Learning · AAAI 2019
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning · NIPS 2019
Distributional Policy Optimization: An Alternative Approach for Continuous Control · NIPS 2019
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning · NIPS 2018
Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning · NIPS 2018
Beyond the One-Step Greedy Approach in Reinforcement Learning · ICML 2018
A General Approach to Multi-Armed Bandits Under Risk Criteria · COLT 2018
Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning · COLT 2018
Multi-objective Bandits: Optimizing the Generalized Gini Index · ICML 2017
Rotting Bandits · NIPS 2017
End-to-End Differentiable Adversarial Imitation Learning · ICML 2017
Approximate Value Iteration with Temporally Extended Actions (Extended Abstract) · IJCAI 2017
Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization · COLT 2017
Shallow Updates for Deep Reinforcement Learning · NIPS 2017
Consistent On-Line Off-Policy Evaluation · ICML 2017
Adaptive Skills Adaptive Partitions (ASAP) · NIPS 2016
Heteroscedastic Sequences: Beyond Gaussianity · ICML 2016
Graying the black box: Understanding DQNs · ICML 2016
Hierarchical Decision Making In Electricity Grid Management · ICML 2016
Learning the Variance of the Reward-To-Go · JMLR 2016
Regularized Policy Iteration with Nonparametric Function Spaces · JMLR 2016
Policy Gradient for Coherent Risk Measures · NIPS 2015
Community Detection via Measure Space Embedding · NIPS 2015
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach · NIPS 2015
Dynamic Sensing: Better Classification under Acquisition Constraints · ICML 2015
Off-policy Model-based Learning under Unknown Factored Dynamics · ICML 2015
Thompson Sampling for Learning Parameterized Markov Decision Processes · COLT 2015
Sensor Selection for Crowdsensing Dynamical Systems · AISTATS 2015
Online Learning for Adversaries with Memory: Price of Past Mistakes · NIPS 2015
Set-Valued Approachability and Online Learning with Partial Monitoring · JMLR 2014
Robust Logistic Regression and Classification · NIPS 2014
Time-Regularized Interrupting Options (TRIO) · ICML 2014
"How hard is my MDP?" The distribution-norm to the rescue · NIPS 2014
Concept Drift Detection Through Resampling · ICML 2014
Scaling Up Robust MDPs using Function Approximation · ICML 2014
Approachability in unknown games: Online learning meets multi-objective optimization · COLT 2014
Latent Bandits. · ICML 2014
Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations · ICML 2014
Thompson Sampling for Complex Online Problems · ICML 2014
Reinforcement Learning in Robust Markov Decision Processes · NIPS 2013
Robust Sparse Regression under Adversarial Corruption · ICML 2013
Temporal Difference Methods for the Variance of the Reward To Go · ICML 2013
Approachability, fast and slow · COLT 2013
Online Learning for Time Series Prediction · COLT 2013
Opportunistic Strategies for Generalized No-Regret Problems · COLT 2013
Learning Multiple Models via Regularized Weighting · NIPS 2013
Online PCA for Contaminated Data · NIPS 2013
The Perturbed Variation · NIPS 2012
More Is Better: Large Scale Partially-supervised Sentiment Classification · ACML 2012
Statistical Optimization in High Dimensions · AISTATS 2012
The Sample Complexity of Dictionary Learning · JMLR 2011
Does an Efficient Calibrated Forecasting Strategy Exist? · COLT 2011
The Sample Complexity of Dictionary Learning · COLT 2011
Robust approachability and regret minimization in games with partial monitoring · COLT 2011
From Bandits to Experts: On the Value of Side-Observations · NIPS 2011
Committing Bandits · NIPS 2011
Distributionally Robust Markov Decision Processes · NIPS 2010
Online Classification with Specificity Constraints · NIPS 2010
Robustness and Regularization of Support Vector Machines · JMLR 2009
Online Learning with Sample Path Constraints · JMLR 2009
Regularized Policy Iteration · NIPS 2008
Robust Regression and Lasso · NIPS 2008
The Robustness-Performance Tradeoff in Markov Decision Processes · NIPS 2006
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems · JMLR 2006
A Geometric Approach to Multi-Criterion Reinforcement Learning · JMLR 2004
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem · JMLR 2004
Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity · JMLR 2003