Zeyuan Allen-Zhu

37 papers · 2016–2025 · 5 venues across top CS/AI conferences and journals

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (13) 🌍 Conference Polyglot (5) 🌈 Renaissance Researcher (6) 🐺 Lone Wolf (5) 🤝 Dynamic Duo (22) 🏆 Keyword Champion (2) 🔬 Deep Specialist (15) 💎 Century Club (37) ⚡ Prolific Year (5) ❓ The Questioner (3) 🗃️ Keyword Collector (118) 📈 Trend Setter 🔥 Unstoppable (5)

Venues

NIPS (15) ICML (12) ICLR (8) COLT (1) JMLR (1)

Top co-authors

Yuanzhi Li (22) Yang Yuan (3) Elad Hazan (3) Jerry Li (2) Tian Ye (2) Sébastien Bubeck (2) Dan Alistarh (2) Zicheng Xu (2) Zhao Song (2) Yining Wang (1)

Keywords

stochastic gradient descent (14) non-convex optimization (7) variance reduction (5) convex optimization (4) stochastic optimization (4) regret bound (3) convergence rate (3) empirical risk minimization (3) neural tangent kernel (2) learning theory (2) recurrent neural network (2) feature learning (2) regret minimization (2) online algorithm (2) gradient descent (2) hierarchical learning (2) online learning (2) singular value decomposition (2) stochastic gradient (2) deep neural network (2)

Papers

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process · ICLR 2025
Physics of Language Models: Part 3.2, Knowledge Manipulation · ICLR 2025
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws · ICLR 2025
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems · ICLR 2025
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction · ICML 2024
Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning · COLT 2023
SALSA VERDE: a machine learning attack on LWE with sparse small secrets · NIPS 2023
Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions · ICLR 2023
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning · ICLR 2023
LoRA: Low-Rank Adaptation of Large Language Models · ICLR 2022
Byzantine-Resilient Non-Convex Stochastic Gradient Descent · ICLR 2021
What Can ResNet Learn Efficiently, Going Beyond Kernels? · NIPS 2019
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers · NIPS 2019
Can SGD Learn Recurrent Neural Networks with Provable Generalization? · NIPS 2019
On the Convergence Rate of Training Recurrent Neural Networks · NIPS 2019
A Convergence Theory for Deep Learning via Over-Parameterization · ICML 2019
Katyusha: The First Direct Acceleration of Stochastic Gradient Methods · JMLR 2018
Natasha 2: Faster Non-Convex Optimization Than SGD · NIPS 2018
How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD · NIPS 2018
Byzantine Stochastic Gradient Descent · NIPS 2018
The Lingering of Gradients: How to Reuse Gradients Over Time · NIPS 2018
Is Q-Learning Provably Efficient? · NIPS 2018
NEON2: Finding Local Minima via First-Order Oracles · NIPS 2018
Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization · ICML 2018
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits · ICML 2018
Near-Optimal Design of Experiments via Regret Minimization · ICML 2017
Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter · ICML 2017
Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition · ICML 2017
Faster Principal Component Regression and Stable Matrix Chebyshev Approximation · ICML 2017
Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls · NIPS 2017
Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU · ICML 2017
Variance Reduction for Faster Non-Convex Optimization · ICML 2016
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling · ICML 2016
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters · NIPS 2016
Optimal Black-Box Reductions Between Optimization Objectives · NIPS 2016
LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain · NIPS 2016
Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives · ICML 2016