Ohad Shamir
102 papers · 2007–2025 · 9 venues across top CS/AI conferences and journals
Achievements
Keyword Pioneer
Interdisciplinary Bridge
Taxonomy Completionist (31)
Renaissance Researcher (5)
Hot Topic Early Bird
Cross-Pollinator (8)
Conference Loyalist (29)
Lone Wolf (16)
Keyword Trendsetter Combo (11)
Dynamic Duo (11)
Deep Specialist (11)
Keyword Champion (4)
Topic Pioneer
Keyword Collector (152)
Trend Setter
Unstoppable (16)
Conference Pioneer
Prolific Year (6)
The Questioner (5)
Century Club (102)
Venues
NIPS (29)
COLT (28)
ICML (19)
JMLR (12)
AISTATS (8)
ALT (3)
CVPR (1)
ICLR (1)
IJCAI (1)
Keywords
convex optimization (15)
gradient descent (12)
stochastic gradient descent (11)
ReLU network (10)
sample complexity (10)
regret bound (9)
neural network (9)
online learning (8)
neural network optimization (7)
stochastic optimization (7)
lower bound (7)
non-convex optimization (6)
learning theory (5)
distributed optimization (5)
matrix completion (5)
stochastic gradient (5)
implicit bias (5)
optimization theory (4)
nonsmooth optimization (4)
principal component analysis (4)
Papers
The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria
COLT 2025
Logarithmic Width Suffices for Robust Memorization
COLT 2025
Open Problem: Anytime Convergence Rate of Gradient Descent
COLT 2024
Depth Separation in Norm-Bounded Infinite-Width Neural Networks
COLT 2024
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization
JMLR 2024
Generalization in Kernel Regression Under Realistic Assumptions
ICML 2024
From Tempered to Benign Overfitting in ReLU Neural Networks
NIPS 2023
The Implicit Bias of Benign Overfitting
JMLR 2023
Deterministic Nonsmooth Nonconvex Optimization
COLT 2023
Implicit Regularization Towards Rank Minimization in ReLU Networks
ALT 2023
Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance
NIPS 2023
Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks
NIPS 2023
The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication (Extended Abstract)
IJCAI 2022
The Sample Complexity of One-Hidden-Layer Neural Networks
NIPS 2022
Gradient Methods Provably Converge to Non-Robust Networks
NIPS 2022
Reconstructing Training Data From Trained Neural Networks
NIPS 2022
On Margin Maximization in Linear and ReLU Networks
NIPS 2022
The Implicit Bias of Benign Overfitting
COLT 2022
Width is Less Important than Depth in ReLU Neural Networks
COLT 2022
On the Optimal Memorization Power of ReLU Neural Networks
ICLR 2022
Oracle Complexity in Nonsmooth Nonconvex Optimization
JMLR 2022
Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
NIPS 2021
A Stochastic Newton Algorithm for Distributed Convex Optimization
NIPS 2021
Oracle Complexity in Nonsmooth Nonconvex Optimization
NIPS 2021
Learning a Single Neuron with Bias Using Gradient Descent
NIPS 2021
Gradient Methods Never Overfit On Separable Data
JMLR 2021
The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication
COLT 2021
Implicit Regularization in ReLU Networks with the Square Loss
COLT 2021
Size and Depth Separation in Approximating Benign Functions with Neural Networks
COLT 2021
The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks
COLT 2021
The Connection Between Approximation, Depth Separation and Learnability in Neural Networks
COLT 2021
Is Local SGD Better than Minibatch SGD?
ICML 2020
A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
ALT 2020
How Good is SGD with Random Shuffling?
COLT 2020
Neural Networks with Small Weights and Depth-Separation Barriers
NIPS 2020
The Complexity of Finding Stationary Points with Stochastic Gradient Descent
ICML 2020
Proving the Lottery Ticket Hypothesis: Pruning is All You Need
ICML 2020
Space lower bounds for linear prediction in the streaming model
COLT 2019
Depth Separations in Neural Networks: What is Actually Being Separated?
COLT 2019
On the Power and Limitations of Random Features for Understanding Neural Networks
NIPS 2019
The Complexity of Making the Gradient Small in Stochastic Convex Optimization
COLT 2019
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
COLT 2019
Are ResNets Provably Better than Linear Predictors?
NIPS 2018
Spurious Local Minima are Common in Two-Layer ReLU Neural Networks
ICML 2018
Size-Independent Sample Complexity of Neural Networks
COLT 2018
Bandit Regret Scaling with the Effective Loss Range
ALT 2018
Detecting Correlations with Little Memory and Communication
COLT 2018
Distribution-Specific Hardness of Learning Neural Networks
JMLR 2018
Global Non-convex Optimization with Discretized Diffusions
NIPS 2018
Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks
ICML 2017
An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
JMLR 2017
Preface: Conference on Learning Theory (COLT), 2017
COLT 2017
Oracle Complexity of Second-Order Methods for Finite-Sum Problems
ICML 2017
Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis
ICML 2017
Failures of Gradient-Based Deep Learning
ICML 2017
Online Learning with Local Permutations and Delayed Feedback
ICML 2017
Convergence of Stochastic Gradient Descent for PCA
ICML 2016
On the Quality of the Initial Basin in Overspecified Neural Networks
ICML 2016
On the Iteration Complexity of Oblivious First-Order Optimization Algorithms
ICML 2016
Without-Replacement Sampling for Stochastic Gradient Methods
NIPS 2016
On Lower and Upper Bounds in Smooth and Strongly Convex Optimization
JMLR 2016
Dimension-Free Iteration Complexity of Finite Sum Optimization Problems
NIPS 2016
The Power of Depth for Feedforward Neural Networks
COLT 2016
Multi-Player Bandits – a Musical Chairs Approach
ICML 2016
Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity
ICML 2016
Attribute Efficient Linear Regression with Distribution-Dependent Sampling
ICML 2015
The Sample Complexity of Learning Linear Predictors with the Squared Loss
JMLR 2015
A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate
ICML 2015
Communication Complexity of Distributed Convex Learning and Optimization
NIPS 2015
Graph Approximation and Clustering on a Budget
AISTATS 2015
On the Complexity of Learning with Kernels
COLT 2015
On the Complexity of Bandit Linear Optimization
COLT 2015
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
ICML 2014
On the Computational Efficiency of Training Neural Networks
NIPS 2014
Matrix Completion with the Trace Norm: Learning, Bounding, and Transducing
JMLR 2014
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation
NIPS 2014
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
COLT 2013
Online Learning with Switching Costs and Other Adaptive Adversaries
NIPS 2013
Probabilistic Label Trees for Efficient Large Scale Image Classification
CVPR 2013
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
ICML 2013
Localization and Adaptation in Online Learning
AISTATS 2013
Online Learning for Time Series Prediction
COLT 2013
Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
COLT 2012
Learning from Weak Teachers
AISTATS 2012
Using More Data to Speed-up Training Time
AISTATS 2012
There's a Hole in My Data Space: Piecewise Predictors for Heterogeneous Learning Problems
AISTATS 2012
Optimal Distributed Online Prediction Using Mini-Batches
JMLR 2012
Relax and Randomize: From Value to Algorithms
NIPS 2012
Unified Algorithms for Online Learning and Competitive Analysis
COLT 2012
Efficient Learning with Partially Observed Attributes
JMLR 2011
Spectral Clustering on a Budget
AISTATS 2011
Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing
COLT 2011
Better Mini-Batch Algorithms via Accelerated Gradient Methods
NIPS 2011
Learning with the weighted trace-norm under arbitrary sampling distributions
NIPS 2011
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
NIPS 2011
Efficient Online Learning via Randomized Rounding
NIPS 2011
From Bandits to Experts: On the Value of Side-Observations
NIPS 2011
Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity
AISTATS 2010
Learnability, Stability and Uniform Convergence
JMLR 2010
Multiclass-Multilabel Classification with More Classes than Examples
AISTATS 2010
On the Reliability of Clustering Stability in the Large Sample Regime
NIPS 2008
Cluster Stability for Finite Samples
NIPS 2007