Yuanzhi Li

77 papers · 2013–2025 · 9 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (19) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (9) 🏠 Conference Loyalist (25) 🏆 Keyword Champion (2) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (21) 🤝 Dynamic Duo (22) 📈 Trend Setter ⚡ Prolific Year (5) 🔥 Unstoppable (10) 💎 Century Club (77) 🗃️ Keyword Collector (52) ❓ The Questioner (5)

Conferences

NIPS (25) ICML (18) ICLR (17) COLT (12) AAAI (1) ALT (1) COLING (1) OSDI (1) UAI (1)

Top co-authors

Zeyuan Allen-Zhu (22) Sébastien Bubeck (8) Yingyu Liang (6) Quanquan Gu (5) Aarti Singh (5) Andrej Risteski (5) Dhruv Malik (5) Tengyu Ma (4) Samy Jelassi (4) Yue Wu (4)

Keywords

stochastic gradient descent (11) neural network (10) regret bound (8) gradient descent (6) representation learning (6) feature learning (6) online learning (5) multi-armed bandit (4) relu activation (4) convolutional neural network (4) neural tangent kernel (4) convex optimization (4) learning theory (3) regret minimization (3) deep neural network (3) non-convex optimization (3) sample complexity (3) computational complexity (2) matrix decomposition (2) contrastive learning (2)

Papers

Mixture of Parrots: Experts improve memorization more than reasoning · ICLR 2025
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks · COLING 2025
On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics · ICML 2025
Physics of Language Models: Part 3.2, Knowledge Manipulation · ICLR 2025
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws · ICLR 2025
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems · ICLR 2025
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data · ICLR 2025
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process · ICLR 2025
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning · AAAI 2024
SmartPlay: A Benchmark for LLMs as Intelligent Agents · ICLR 2024
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP · ICLR 2024
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction · ICML 2024
Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs · ICLR 2024
How Does Adaptive Optimization Impact Local Neural Network Geometry? · NIPS 2023
SPRING: Studying Papers and Reasoning to play Games · NIPS 2023
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions · ICLR 2023
The probability flow ODE is provably fast · NIPS 2023
Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning · COLT 2023
Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality · ICML 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding · ICML 2023
The Benefits of Mixup for Feature Learning · ICML 2023
Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals · NIPS 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks · COLT 2023
Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions · ICLR 2023
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization · ICLR 2023
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning · ICLR 2023
Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs · ICLR 2022
Complete Policy Regret Bounds for Tallying Bandits · COLT 2022
Towards understanding how momentum improves generalization in deep learning · ICML 2022
Towards Understanding the Mixture-of-Experts Layer in Deep Learning · NIPS 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning · NIPS 2022
Learning (Very) Simple Generative Models Is Hard · NIPS 2022
Vision Transformers provably learn spatial structure · NIPS 2022
LoRA: Low-Rank Adaptation of Large Language Models · ICLR 2022
A heuristic for statistical seriation · UAI 2021
When Is Generalizable Reinforcement Learning Tractable? · NIPS 2021
Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels · NIPS 2021
A Law of Robustness for Two-Layers Neural Networks · COLT 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability · ICLR 2021
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity · ICML 2021
Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning · ICML 2021
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections · OSDI 2021
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without · COLT 2020
Learning Over-Parametrized Two-Layer Neural Networks beyond NTK · COLT 2020
Improved Path-length Regret Bounds for Bandits · COLT 2019
Near Optimal Methods for Minimizing Convex Functions with Lipschitz $p$-th Derivatives · COLT 2019
Near-optimal method for highly smooth convex optimization · COLT 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks · NIPS 2019
Can SGD Learn Recurrent Neural Networks with Provable Generalization? · NIPS 2019
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers · NIPS 2019
What Can ResNet Learn Efficiently, Going Beyond Kernels? · NIPS 2019
Complexity of Highly Parallel Non-Smooth Convex Optimization · NIPS 2019
A Convergence Theory for Deep Learning via Over-Parameterization · ICML 2019
On the Convergence Rate of Training Recurrent Neural Networks · NIPS 2019
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees · ICLR 2019
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits · ICML 2018
The Well-Tempered Lasso · ICML 2018
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data · NIPS 2018
Sparsity, variance and curvature in multi-armed bandits · ALT 2018
NEON2: Finding Local Minima via First-Order Oracles · NIPS 2018
Online Improper Learning with an Approximation Oracle · NIPS 2018
Learning Mixtures of Linear Regressions with Nearly Optimal Complexity · COLT 2018
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations · COLT 2018
An Alternative View: When Does SGD Escape Local Minima? · ICML 2018
Faster Principal Component Regression and Stable Matrix Chebyshev Approximation · ICML 2017
Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition · ICML 2017
Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU · ICML 2017
Near-Optimal Design of Experiments via Regret Minimization · ICML 2017
Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations · ICML 2017
Convergence Analysis of Two-layer Neural Networks with ReLU Activation · NIPS 2017
Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls · NIPS 2017
Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates · NIPS 2016
LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain · NIPS 2016
Algorithms and matching lower bounds for approximately-convex optimization · NIPS 2016
Recovery guarantee of weighted low-rank approximation via alternating minimization · ICML 2016
Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods · NIPS 2016
A Theoretical Analysis of NDCG Type Ranking Measures · COLT 2013