Kaifeng Lyu

22 papers · 2019–2025 · 3 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (12) 🏃 Academic Marathon (6) 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge

🗺️ Taxonomy Completionist (18) 🧭 Keyword Pioneer 🤝 Dynamic Duo (10) ❓ The Questioner ⚡ Prolific Year (7) 💎 Century Club (22) 🔥 Unstoppable (7)

Conferences

ICLR (14) NIPS (6) ICML (2)

Top co-authors

Sanjeev Arora (10) Zhiyuan Li (9) Runzhe Wang (3) Dingli Yu (3) Xinran Gu (3) Sadhika Malladi (2) Longbo Huang (2) Sanjiv Kumar (2) Simon Shaolei Du (2) Jian Li (2)

Keywords

gradient descent (4) batch normalization (2) neural network optimization (2) stochastic differential equation (2) weight decay (2) safety alignment (1) incremental learning (1) margin maximization (1) distribution shift (1) low-rank recovery (1) low-rank representation (1) linear classifier (1) learning rate (1) matrix sensing (1) implicit bias (1) intrinsic learning rate (1) completeness soundness (1) adaptive gradient method (1) normalization layer (1) layer normalization (1)

Papers

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval · ICLR 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules · ICLR 2025
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias · ICLR 2025
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks · ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep · ICLR 2025
Weak-to-Strong Generalization Even in Random Feature Networks, Provably · ICML 2025
Efficient stagewise pretraining via progressive subnetworks · ICLR 2025
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking · ICLR 2024
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates · NIPS 2024
A Quadratic Synchronization Rule for Distributed Deep Learning · ICLR 2024
The Marginal Value of Momentum for Small Learning Rate SGD · ICLR 2024
DistillSpec: Improving Speculative Decoding via Knowledge Distillation · ICLR 2024
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing · ICML 2023
Why (and When) does Local SGD Generalize Better than SGD? · ICLR 2023
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction · NIPS 2022
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound · NIPS 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms · NIPS 2022
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning · ICLR 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias · NIPS 2021
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate · NIPS 2020
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks · ICLR 2020
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization · ICLR 2019