Yuanzhi Li
77 papers · 2013–2025 · 9 top CS/AI conferences
Achievements
Taxonomy Completionist (19)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (7)
Hot Topic Early Bird
Conference Polyglot (9)
Conference Loyalist (25)
Keyword Champion (2)
Triple Crown
Grand Slam
Deep Specialist (21)
Dynamic Duo (22)
Trend Setter
Prolific Year (5)
Unstoppable (10)
Century Club (77)
Keyword Collector (52)
The Questioner (5)
Conferences
NIPS (25)
ICML (18)
ICLR (17)
COLT (12)
AAAI (1)
ALT (1)
COLING (1)
OSDI (1)
UAI (1)
Keywords
stochastic gradient descent (11)
neural network (10)
regret bound (8)
gradient descent (6)
representation learning (6)
feature learning (6)
online learning (5)
multi-armed bandit (4)
relu activation (4)
convolutional neural network (4)
neural tangent kernel (4)
convex optimization (4)
learning theory (3)
regret minimization (3)
deep neural network (3)
non-convex optimization (3)
sample complexity (3)
computational complexity (2)
matrix decomposition (2)
contrastive learning (2)
Papers
Mixture of Parrots: Experts improve memorization more than reasoning
ICLR 2025
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
COLING 2025
On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
ICML 2025
Physics of Language Models: Part 3.2, Knowledge Manipulation
ICLR 2025
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
ICLR 2025
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
ICLR 2025
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
ICLR 2025
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
ICLR 2025
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
AAAI 2024
SmartPlay: A Benchmark for LLMs as Intelligent Agents
ICLR 2024
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
ICLR 2024
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
ICML 2024
Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs
ICLR 2024
How Does Adaptive Optimization Impact Local Neural Network Geometry?
NIPS 2023
SPRING: Studying Papers and Reasoning to play Games
NIPS 2023
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
ICLR 2023
The probability flow ODE is provably fast
NIPS 2023
Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning
COLT 2023
Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
ICML 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
ICML 2023
The Benefits of Mixup for Feature Learning
ICML 2023
Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
NIPS 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
COLT 2023
Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions
ICLR 2023
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
ICLR 2023
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
ICLR 2023
Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs
ICLR 2022
Complete Policy Regret Bounds for Tallying Bandits
COLT 2022
Towards understanding how momentum improves generalization in deep learning
ICML 2022
Towards Understanding the Mixture-of-Experts Layer in Deep Learning
NIPS 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
NIPS 2022
Learning (Very) Simple Generative Models Is Hard
NIPS 2022
Vision Transformers provably learn spatial structure
NIPS 2022
LoRA: Low-Rank Adaptation of Large Language Models
ICLR 2022
A heuristic for statistical seriation
UAI 2021
When Is Generalizable Reinforcement Learning Tractable?
NIPS 2021
Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels
NIPS 2021
A Law of Robustness for Two-Layers Neural Networks
COLT 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
ICLR 2021
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
ICML 2021
Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
ICML 2021
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
OSDI 2021
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
COLT 2020
Learning Over-Parametrized Two-Layer Neural Networks beyond NTK
COLT 2020
Improved Path-length Regret Bounds for Bandits
COLT 2019
Near Optimal Methods for Minimizing Convex Functions with Lipschitz $p$-th Derivatives
COLT 2019
Near-optimal method for highly smooth convex optimization
COLT 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
NIPS 2019
Can SGD Learn Recurrent Neural Networks with Provable Generalization?
NIPS 2019
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
NIPS 2019
What Can ResNet Learn Efficiently, Going Beyond Kernels?
NIPS 2019
Complexity of Highly Parallel Non-Smooth Convex Optimization
NIPS 2019
A Convergence Theory for Deep Learning via Over-Parameterization
ICML 2019
On the Convergence Rate of Training Recurrent Neural Networks
NIPS 2019
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
ICLR 2019
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
ICML 2018
The Well-Tempered Lasso
ICML 2018
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
NIPS 2018
Sparsity, variance and curvature in multi-armed bandits
ALT 2018
NEON2: Finding Local Minima via First-Order Oracles
NIPS 2018
Online Improper Learning with an Approximation Oracle
NIPS 2018
Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
COLT 2018
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
COLT 2018
An Alternative View: When Does SGD Escape Local Minima?
ICML 2018
Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
ICML 2017
Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
ICML 2017
Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
ICML 2017
Near-Optimal Design of Experiments via Regret Minimization
ICML 2017
Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
ICML 2017
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
NIPS 2017
Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
NIPS 2017
Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
NIPS 2016
LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
NIPS 2016
Algorithms and matching lower bounds for approximately-convex optimization
NIPS 2016
Recovery guarantee of weighted low-rank approximation via alternating minimization
ICML 2016
Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
NIPS 2016
A Theoretical Analysis of NDCG Type Ranking Measures
COLT 2013