Zhiyuan Li

65 papers · 2016–2026 · 12 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (9) 🌍 Conference Polyglot (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (69) 🏠 Conference Loyalist (22) 🧬 Topic Evolution 🤝 Dynamic Duo (15) 🏆 Keyword Champion (4) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (10) 💎 Century Club (63) ⚡ Prolific Year (7) 🔥 Unstoppable (8) 📈 Trend Setter 🗃️ Keyword Collector (165) ❓ The Questioner (6)

Conferences

ICLR (22) ICML (15) NIPS (14) AAAI (3) ACL (2) ICCV (2) WACV (2) COLT (1) CVPR (1) EMNLP (1) NAACL (1) UAI (1)

Top co-authors

Sanjeev Arora (15) Kaifeng Lyu (9) Tengyu Ma (8) Wei Hu (7) Tianhao Wang (5) Wenshuai Zhao (4) Joni Pajarinen (4) Dingli Yu (4) Nathan Srebro (4) Dongnan Liu (3)

Research topics

Privacy (1)

Keywords

gradient descent (7) weight decay (4) generalization bound (4) implicit bias (3) stochastic gradient descent (3) neural network optimization (3) large language model (3) learning rate (3) stochastic differential equation (3) edge of stability (2) kernel methods (2) approximation algorithm (2) multi-agent reinforcement learning (2) optimization problem (2) convolutional neural network (2) loss landscape (2) visual reasoning (2) batch normalization (2) regret bound (2) implicit regularization (2)

Papers

VFCionX: Bridging Large and Small Models for Robust Vulnerability-Fixing Commit Identification AAAI 2026
UERLens: Understanding Event Relations in Large Language Models ACL 2026
Weak-to-Strong Generalization Even in Random Feature Networks, Provably ICML 2025
Non-Asymptotic Length Generalization ICML 2025
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning ICCV 2025
A Coefficient Makes SVRG Effective ICLR 2025
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable ICLR 2025
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity ICLR 2025
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View ICLR 2025
Reasoning with Latent Thoughts: On the Power of Looped Transformers ICLR 2025
Octopus: On-device language model for function calling of software APIs NAACL 2025
Learning Progress Driven Multi-Agent Curriculum ICML 2025
PENCIL: Long Thoughts with Short Memory ICML 2025
AgentMixer: Multi-Agent Correlated Policy Factorization AAAI 2025
A Theory of Learning with Autoregressive Chain of Thought COLT 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities ACL 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis ICML 2025
The Marginal Value of Momentum for Small Learning Rate SGD ICLR 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems ICLR 2024
Complex Organ Mask Guided Radiology Report Generation WACV 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking ICLR 2024
Optimistic Multi-Agent Policy Gradient ICML 2024
Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization ICML 2024
Backpropagation Through Agents AAAI 2024
Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition ICML 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models EMNLP 2024
Simplicity Bias via Global Convergence of Sharpness Minimization ICML 2024
Fast Equilibrium of SGD in Generic Situations ICLR 2024
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training ICLR 2024
Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting ICLR 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization NIPS 2023
What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models NIPS 2023
How Sharpness-Aware Minimization Minimizes Sharpness? ICLR 2023
Continual Unsupervised Disentangling of Self-Organizing Representations ICLR 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing ICML 2023
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models ICML 2023
Robust Training of Neural Networks Using Scale Invariant Architectures ICML 2022
Understanding Gradient Descent on the Edge of Stability in Deep Learning ICML 2022
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework ICLR 2022
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay NIPS 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent NIPS 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction NIPS 2022
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection ICCV 2021
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning ICLR 2021
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning ICML 2021
Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? ICLR 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias NIPS 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) NIPS 2021
When is particle filtering efficient for planning in partially observed linear dynamical systems? UAI 2021
Regional Attention Networks With Context-Aware Fusion for Group Emotion Recognition WACV 2021
Implicit Regularization and Convergence for Weight Normalization NIPS 2020
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate NIPS 2020
An Exponential Learning Rate Schedule for Deep Learning ICLR 2020
Progressive Learning and Disentanglement of Hierarchical Representations ICLR 2020
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks ICLR 2020
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee ICLR 2020
The role of over-parametrization in generalization of neural networks ICLR 2019
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets NIPS 2019
On Exact Computation with an Infinitely Wide Neural Net NIPS 2019
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization ICLR 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks ICML 2019
Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition CVPR 2018
Online Improper Learning with an Approximation Oracle NIPS 2018
Learning in Games: Robustness of Fast Convergence NIPS 2016
Solving Marginal MAP Problems with NP Oracles and Parity Constraints NIPS 2016