Zhiyuan Li
65 papers · 2016–2026 · 12 top CS/AI conferences
Achievements
Academic Marathon (9)
Conference Polyglot (12)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (13)
Renaissance Researcher (10)
Taxonomy Completionist (69)
Conference Loyalist (22)
Topic Evolution
Dynamic Duo (15)
Keyword Champion (4)
Grand Slam
Triple Crown
Deep Specialist (10)
Century Club (63)
Prolific Year (7)
Unstoppable (8)
Trend Setter
Keyword Collector (165)
The Questioner (6)
Conferences
ICLR (22)
ICML (15)
NIPS (14)
AAAI (3)
ACL (2)
ICCV (2)
WACV (2)
COLT (1)
CVPR (1)
EMNLP (1)
NAACL (1)
UAI (1)
Keywords
gradient descent (7)
weight decay (4)
generalization bound (4)
implicit bias (3)
stochastic gradient descent (3)
neural network optimization (3)
large language model (3)
learning rate (3)
stochastic differential equation (3)
edge of stability (2)
kernel methods (2)
approximation algorithm (2)
multi-agent reinforcement learning (2)
optimization problem (2)
convolutional neural network (2)
loss landscape (2)
visual reasoning (2)
batch normalization (2)
regret bound (2)
implicit regularization (2)
Papers
VFCionX: Bridging Large and Small Models for Robust Vulnerability-Fixing Commit Identification
AAAI 2026
UERLens: Understanding Event Relations in Large Language Models
ACL 2026
Weak-to-Strong Generalization Even in Random Feature Networks, Provably
ICML 2025
Non-Asymptotic Length Generalization
ICML 2025
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
ICCV 2025
A Coefficient Makes SVRG Effective
ICLR 2025
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
ICLR 2025
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
ICLR 2025
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
ICLR 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers
ICLR 2025
Octopus: On-device language model for function calling of software APIs
NAACL 2025
Learning Progress Driven Multi-Agent Curriculum
ICML 2025
PENCIL: Long Thoughts with Short Memory
ICML 2025
AgentMixer: Multi-Agent Correlated Policy Factorization
AAAI 2025
A Theory of Learning with Autoregressive Chain of Thought
COLT 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
ACL 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
ICML 2025
The Marginal Value of Momentum for Small Learning Rate SGD
ICLR 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
ICLR 2024
Complex Organ Mask Guided Radiology Report Generation
WACV 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
ICLR 2024
Optimistic Multi-Agent Policy Gradient
ICML 2024
Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
ICML 2024
Backpropagation Through Agents
AAAI 2024
Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition
ICML 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
EMNLP 2024
Simplicity Bias via Global Convergence of Sharpness Minimization
ICML 2024
Fast Equilibrium of SGD in Generic Situations
ICLR 2024
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
ICLR 2024
Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting
ICLR 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
NIPS 2023
What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models
NIPS 2023
How Sharpness-Aware Minimization Minimizes Sharpness?
ICLR 2023
Continual Unsupervised Disentangling of Self-Organizing Representations
ICLR 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
ICML 2023
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
ICML 2023
Robust Training of Neural Networks Using Scale Invariant Architectures
ICML 2022
Understanding Gradient Descent on the Edge of Stability in Deep Learning
ICML 2022
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
ICLR 2022
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
NIPS 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
NIPS 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
NIPS 2022
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection
ICCV 2021
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
ICLR 2021
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
ICML 2021
Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
ICLR 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
NIPS 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
NIPS 2021
When is particle filtering efficient for planning in partially observed linear dynamical systems?
UAI 2021
Regional Attention Networks With Context-Aware Fusion for Group Emotion Recognition
WACV 2021
Implicit Regularization and Convergence for Weight Normalization
NIPS 2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
NIPS 2020
An Exponential Learning Rate Schedule for Deep Learning
ICLR 2020
Progressive Learning and Disentanglement of Hierarchical Representations
ICLR 2020
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
ICLR 2020
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
ICLR 2020
The role of over-parametrization in generalization of neural networks
ICLR 2019
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
NIPS 2019
On Exact Computation with an Infinitely Wide Neural Net
NIPS 2019
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
ICLR 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
ICML 2019
Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition
CVPR 2018
Online Improper Learning with an Approximation Oracle
NIPS 2018
Learning in Games: Robustness of Fast Convergence
NIPS 2016
Solving Marginal MAP Problems with NP Oracles and Parity Constraints
NIPS 2016