Kaifeng Lyu
22 papers · 2019–2025 · 3 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (12)
Academic Marathon (6)
Keyword Pioneer
Conference Polyglot (3)
Interdisciplinary Bridge
Taxonomy Completionist (18)
Dynamic Duo (10)
The Questioner
Prolific Year (7)
Century Club (22)
Unstoppable (7)
Conferences
ICLR (14)
NeurIPS (6)
ICML (2)
Keywords
gradient descent (4)
batch normalization (2)
neural network optimization (2)
stochastic differential equation (2)
weight decay (2)
safety alignment (1)
incremental learning (1)
margin maximization (1)
distribution shift (1)
low-rank recovery (1)
low-rank representation (1)
linear classifier (1)
learning rate (1)
matrix sensing (1)
implicit bias (1)
intrinsic learning rate (1)
completeness soundness (1)
adaptive gradient method (1)
normalization layer (1)
layer normalization (1)
Papers
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
ICLR 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
ICLR 2025
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
ICLR 2025
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025
Weak-to-Strong Generalization Even in Random Feature Networks, Provably
ICML 2025
Efficient Stagewise Pretraining via Progressive Subnetworks
ICLR 2025
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
ICLR 2024
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
NeurIPS 2024
A Quadratic Synchronization Rule for Distributed Deep Learning
ICLR 2024
The Marginal Value of Momentum for Small Learning Rate SGD
ICLR 2024
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
ICLR 2024
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
ICML 2023
Why (and When) does Local SGD Generalize Better than SGD?
ICLR 2023
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
NeurIPS 2022
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
NeurIPS 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
NeurIPS 2022
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
ICLR 2021
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
NeurIPS 2021
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
NeurIPS 2020
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
ICLR 2020
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
ICLR 2019