Ohad Shamir

102 papers · 2007–2025 · 9 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer · 🌉 Interdisciplinary Bridge · 🗺️ Taxonomy Completionist (31) · 🌈 Renaissance Researcher (5) · 🐣 Hot Topic Early Bird · 🐝 Cross-Pollinator (8) · 🏠 Conference Loyalist (29) · 🐺 Lone Wolf (16) · 🌟 Keyword Trendsetter Combo (11) · 🤝 Dynamic Duo (11) · 🔬 Deep Specialist (11) · 🏆 Keyword Champion (4) · 🌱 Topic Pioneer · 🗃️ Keyword Collector (152) · 📈 Trend Setter · 🔥 Unstoppable (16) · 🚀 Conference Pioneer · ⚡ Prolific Year (6) · ❓ The Questioner (5) · 💎 Century Club (102)

Conferences

NIPS (29) · COLT (28) · ICML (19) · JMLR (12) · AISTATS (8) · ALT (3) · CVPR (1) · ICLR (1) · IJCAI (1)

Top co-authors

Gilad Yehudai (11) · Gal Vardi (11) · Nathan Srebro (9) · Shai Shalev-Shwartz (8) · Guy Kornowski (7) · Itay Safran (6) · Yossi Arjevani (6) · Nicolò Cesa-Bianchi (5) · Karthik Sridharan (5) · Nati Srebro (5)

Keywords

convex optimization (15) · gradient descent (12) · stochastic gradient descent (11) · relu network (10) · sample complexity (10) · regret bound (9) · neural network (9) · online learning (8) · neural network optimization (7) · stochastic optimization (7) · lower bound (7) · non-convex optimization (6) · learning theory (5) · distributed optimization (5) · matrix completion (5) · stochastic gradient (5) · implicit bias (5) · optimization theory (4) · nonsmooth optimization (4) · principal component analysis (4)

Papers

The Oracle Complexity of Simplex-based Matrix Games: Linear Separability and Nash Equilibria · COLT 2025
Logarithmic Width Suffices for Robust Memorization · COLT 2025
Open Problem: Anytime Convergence Rate of Gradient Descent · COLT 2024
Depth Separation in Norm-Bounded Infinite-Width Neural Networks · COLT 2024
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization · JMLR 2024
Generalization in Kernel Regression Under Realistic Assumptions · ICML 2024
From Tempered to Benign Overfitting in ReLU Neural Networks · NIPS 2023
The Implicit Bias of Benign Overfitting · JMLR 2023
Deterministic Nonsmooth Nonconvex Optimization · COLT 2023
Implicit Regularization Towards Rank Minimization in ReLU Networks · ALT 2023
Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance · NIPS 2023
Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks · NIPS 2023
The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication (Extended Abstract) · IJCAI 2022
The Sample Complexity of One-Hidden-Layer Neural Networks · NIPS 2022
Gradient Methods Provably Converge to Non-Robust Networks · NIPS 2022
Reconstructing Training Data From Trained Neural Networks · NIPS 2022
On Margin Maximization in Linear and ReLU Networks · NIPS 2022
The Implicit Bias of Benign Overfitting · COLT 2022
Width is Less Important than Depth in ReLU Neural Networks · COLT 2022
On the Optimal Memorization Power of ReLU Neural Networks · ICLR 2022
Oracle Complexity in Nonsmooth Nonconvex Optimization · JMLR 2022
Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems · NIPS 2021
A Stochastic Newton Algorithm for Distributed Convex Optimization · NIPS 2021
Oracle Complexity in Nonsmooth Nonconvex Optimization · NIPS 2021
Learning a Single Neuron with Bias Using Gradient Descent · NIPS 2021
Gradient Methods Never Overfit On Separable Data · JMLR 2021
The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication · COLT 2021
Implicit Regularization in ReLU Networks with the Square Loss · COLT 2021
Size and Depth Separation in Approximating Benign Functions with Neural Networks · COLT 2021
The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks · COLT 2021
The Connection Between Approximation, Depth Separation and Learnability in Neural Networks · COLT 2021
Is Local SGD Better than Minibatch SGD? · ICML 2020
A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates · ALT 2020
How Good is SGD with Random Shuffling? · COLT 2020
Neural Networks with Small Weights and Depth-Separation Barriers · NIPS 2020
The Complexity of Finding Stationary Points with Stochastic Gradient Descent · ICML 2020
Proving the Lottery Ticket Hypothesis: Pruning is All You Need · ICML 2020
Space lower bounds for linear prediction in the streaming model · COLT 2019
Depth Separations in Neural Networks: What is Actually Being Separated? · COLT 2019
On the Power and Limitations of Random Features for Understanding Neural Networks · NIPS 2019
The Complexity of Making the Gradient Small in Stochastic Convex Optimization · COLT 2019
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks · COLT 2019
Are ResNets Provably Better than Linear Predictors? · NIPS 2018
Spurious Local Minima are Common in Two-Layer ReLU Neural Networks · ICML 2018
Size-Independent Sample Complexity of Neural Networks · COLT 2018
Bandit Regret Scaling with the Effective Loss Range · ALT 2018
Detecting Correlations with Little Memory and Communication · COLT 2018
Distribution-Specific Hardness of Learning Neural Networks · JMLR 2018
Global Non-convex Optimization with Discretized Diffusions · NIPS 2018
Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks · ICML 2017
An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback · JMLR 2017
Preface: Conference on Learning Theory (COLT), 2017 · COLT 2017
Oracle Complexity of Second-Order Methods for Finite-Sum Problems · ICML 2017
Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis · ICML 2017
Failures of Gradient-Based Deep Learning · ICML 2017
Online Learning with Local Permutations and Delayed Feedback · ICML 2017
Convergence of Stochastic Gradient Descent for PCA · ICML 2016
On the Quality of the Initial Basin in Overspecified Neural Networks · ICML 2016
On the Iteration Complexity of Oblivious First-Order Optimization Algorithms · ICML 2016
Without-Replacement Sampling for Stochastic Gradient Methods · NIPS 2016
On Lower and Upper Bounds in Smooth and Strongly Convex Optimization · JMLR 2016
Dimension-Free Iteration Complexity of Finite Sum Optimization Problems · NIPS 2016
The Power of Depth for Feedforward Neural Networks · COLT 2016
Multi-Player Bandits – a Musical Chairs Approach · ICML 2016
Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity · ICML 2016
Attribute Efficient Linear Regression with Distribution-Dependent Sampling · ICML 2015
The Sample Complexity of Learning Linear Predictors with the Squared Loss · JMLR 2015
A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate · ICML 2015
Communication Complexity of Distributed Convex Learning and Optimization · NIPS 2015
Graph Approximation and Clustering on a Budget · AISTATS 2015
On the Complexity of Learning with Kernels · COLT 2015
On the Complexity of Bandit Linear Optimization · COLT 2015
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method · ICML 2014
On the Computational Efficiency of Training Neural Networks · NIPS 2014
Matrix Completion with the Trace Norm: Learning, Bounding, and Transducing · JMLR 2014
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation · NIPS 2014
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization · COLT 2013
Online Learning with Switching Costs and Other Adaptive Adversaries · NIPS 2013
Probabilistic Label Trees for Efficient Large Scale Image Classification · CVPR 2013
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes · ICML 2013
Localization and Adaptation in Online Learning · AISTATS 2013
Online Learning for Time Series Prediction · COLT 2013
Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent? · COLT 2012
Learning from Weak Teachers · AISTATS 2012
Using More Data to Speed-up Training Time · AISTATS 2012
There’s a Hole in My Data Space: Piecewise Predictors for Heterogeneous Learning Problems · AISTATS 2012
Optimal Distributed Online Prediction Using Mini-Batches · JMLR 2012
Relax and Randomize: From Value to Algorithms · NIPS 2012
Unified Algorithms for Online Learning and Competitive Analysis · COLT 2012
Efficient Learning with Partially Observed Attributes · JMLR 2011
Spectral Clustering on a Budget · AISTATS 2011
Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing · COLT 2011
Better Mini-Batch Algorithms via Accelerated Gradient Methods · NIPS 2011
Learning with the weighted trace-norm under arbitrary sampling distributions · NIPS 2011
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression · NIPS 2011
Efficient Online Learning via Randomized Rounding · NIPS 2011
From Bandits to Experts: On the Value of Side-Observations · NIPS 2011
Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity · AISTATS 2010
Learnability, Stability and Uniform Convergence · JMLR 2010
Multiclass-Multilabel Classification with More Classes than Examples · AISTATS 2010
On the Reliability of Clustering Stability in the Large Sample Regime · NIPS 2008
Cluster Stability for Finite Samples · NIPS 2007