Sham M. Kakade

68 papers · 2007–2025 · 5 conferences · across top CS/AI conferences

Achievements

🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (24) 👑 Triple Crown 🏆 Keyword Champion (2) 🌱 Topic Pioneer 🔬 Deep Specialist (14) 🗃️ Keyword Collector (132) ⚡ Prolific Year (8) 🚀 Conference Pioneer 📈 Trend Setter ❓ The Questioner (5) 💎 Century Club (68) 🔥 Unstoppable (19)

Conferences

NIPS (24) ICLR (18) ICML (10) COLT (8) JMLR (8)

Top co-authors

Praneeth Netrapalli (9) David Brandfonbrener (7) Nikhil Vyas (7) Daniel J. Hsu (7) Rong Ge (6) Depen Morwani (6) Eran Malach (5) Samy Jelassi (5) Rahul Kidambi (5) Prateek Jain (5)

Research topics

Privacy (1)

Keywords

stochastic gradient descent (7) parameter estimation (5) spectral method (5) gradient descent (5) regret bound (5) sample complexity (3) online learning (3) neural network optimization (3) excess risk (3) computational complexity (3) online optimization (3) unsupervised learning (3) strongly convex (3) learning theory (3) linear regression (3) convex optimization (3) compressed sensing (2) non-convex optimization (2) multi-agent reinforcement learning (2) nash equilibrium (2)

Papers

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models ICLR 2025
A New Perspective on Shampoo's Preconditioner ICLR 2025
Eliminating Position Bias of Language Models: A Mechanistic Approach ICLR 2025
How Does Critical Batch Size Scale in Pre-training? ICLR 2025
Universal Length Generalization with Turing Programs ICML 2025
The Role of Sparsity for Length Generalization in LLMs ICML 2025
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems ICLR 2025
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling ICLR 2025
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions ICML 2025
Mixture of Parrots: Experts improve memorization more than reasoning ICLR 2025
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models ICLR 2025
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond ICLR 2025
Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent NIPS 2024
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning ICML 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ICML 2024
Repeat After Me: Transformers are Better than State Space Models at Copying ICML 2024
Feature emergence via margin maximization: case studies in algebraic tasks ICLR 2024
Scaling Laws in Linear Regression: Compute, Parameters, and Data NIPS 2024
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron ICML 2023
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity JMLR 2023
Benign Overfitting of Constant-Stepsize SGD for Linear Regression JMLR 2023
The Role of Coverage in Online Reinforcement Learning ICLR 2023
Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games ICML 2023
On Provable Copyright Protection for Generative Models ICML 2023
Multi-Stage Episodic Control for Strategic Exploration in Text Games ICLR 2022
Anti-Concentrated Confidence Bonuses For Scalable Exploration ICLR 2022
Optimal Regularization can Mitigate Double Descent ICLR 2021
Few-Shot Learning via Learning the Representation, Provably ICLR 2021
What are the Statistical Limits of Offline RL with Linear Function Approximation? ICLR 2021
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift JMLR 2021
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? ICLR 2020
Open Problem: Do Good Algorithms Necessarily Query Bad Points? COLT 2019
Meta-Learning with Implicit Gradients NIPS 2019
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares NIPS 2019
A Smoother Way to Train Structured Prediction Models NIPS 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization ICLR 2018
Accelerating Stochastic Gradient Descent for Least Squares Regression COLT 2018
Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification JMLR 2018
Provably Correct Automatic Sub-Differentiation for Qualified Programs NIPS 2018
How to Escape Saddle Points Efficiently ICML 2017
Learning Overcomplete HMMs NIPS 2017
Towards Generalization and Simplicity in Continuous Control NIPS 2017
Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja’s Algorithm COLT 2016
Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent NIPS 2016
Competing with the Empirical Risk Minimizer in a Single Pass COLT 2015
Convergence Rates of Active Learning for Maximum Likelihood Estimation NIPS 2015
Super-Resolution Off the Grid NIPS 2015
A Tensor Approach to Learning Mixed Membership Community Models JMLR 2014
Tensor Decompositions for Learning Latent Variable Models JMLR 2014
When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity NIPS 2013
A Risk Comparison of Ordinary Least Squares vs Ridge Regression JMLR 2013
Random Design Analysis of Ridge Regression COLT 2012
A Method of Moments for Mixture Models and Hidden Markov Models COLT 2012
Identifiability and Unmixing of Latent Parse Trees NIPS 2012
A Spectral Algorithm for Latent Dirichlet Allocation NIPS 2012
Learning Mixtures of Tree Graphical Models NIPS 2012
Regularization Techniques for Learning with Matrices JMLR 2012
Towards Minimax Policies for Online Linear Optimization with Bandit Feedback COLT 2012
(weak) Calibration is Computationally Hard COLT 2012
Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression NIPS 2011
Spectral Methods for Learning Multivariate Latent Tree Structure NIPS 2011
Stochastic convex optimization with bandit feedback NIPS 2011
Learning from Logged Implicit Exploration Data NIPS 2010
Multi-Label Prediction via Compressed Sensing NIPS 2009
Mind the Duality Gap: Logarithmic regret algorithms for online optimization NIPS 2008
On the Generalization Ability of Online Strongly Convex Programming Algorithms NIPS 2008
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization NIPS 2008
The Price of Bandit Information for Online Optimization NIPS 2007