Chulhee Yun
37 papers · 2018–2025 · 4 conferences · across top CS/AI conferences
Achievements
Academic Marathon (7)
Cross-Pollinator (6)
Conference Polyglot (4)
Keyword Pioneer
Interdisciplinary Bridge
Taxonomy Completionist (25)
Triple Crown
Dynamic Duo (10)
The Questioner (4)
Prolific Year (8)
Conference Pioneer
Unstoppable (8)
Century Club (37)
Keyword Collector (83)
Conferences
ICLR (13)
NIPS (12)
ICML (9)
COLT (3)
Top co-authors
Keywords
neural network
(5)
stochastic gradient descent
(4)
convergence rate
(3)
attention mechanism
(2)
convex optimization
(2)
transformer architecture
(2)
loss landscape
(2)
non-convex optimization
(2)
convergence analysis
(2)
gradient descent
(2)
nonconvex optimization
(1)
function approximation
(1)
natural language processing
(1)
data augmentation
(1)
neural network training
(1)
principal component analysis
(1)
sharpness-aware minimization
(1)
feature learning
(1)
universal approximation
(1)
sample complexity
(1)
Papers
Does SGD really happen in tiny subspaces?
ICLR 2025
Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
ICML 2025
Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent
ICML 2025
Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems
ICML 2025
Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
ICML 2025
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
ICLR 2025
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
ICLR 2025
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
ICLR 2025
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
NIPS 2024
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
NIPS 2024
Provable Benefit of Cutout and CutMix for Feature Learning
NIPS 2024
Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements
NIPS 2024
Linear attention is (maybe) all you need (to understand Transformer optimization)
ICLR 2024
Fundamental Benefit of Alternating Updates in Minimax Optimization
ICML 2024
Provable Benefit of Mixup for Finding Optimal Decision Boundaries
ICML 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
NIPS 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
NIPS 2023
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
NIPS 2023
On the Training Instability of Shuffling SGD with Batch Normalization
ICML 2023
SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization
ICLR 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning
NIPS 2023
Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond
ICML 2023
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
ICLR 2022
Provable Memorization via Deep Neural Networks using Sub-linear Parameters
COLT 2021
A unifying view on implicit bias in training linear neural networks
ICLR 2021
Minimum Width for Universal Approximation
ICLR 2021
Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
COLT 2021
Are Transformers universal approximators of sequence-to-sequence functions?
ICLR 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
NIPS 2020
SGD with shuffling: optimal rates without component convexity and large epoch requirements
NIPS 2020
Low-Rank Bottleneck in Multi-head Attention Models
ICML 2020
Are deep ResNets provably better than linear predictors?
NIPS 2019
Small nonlinearities in activation functions create bad local minima in neural networks
ICLR 2019
Efficiently testing local optimality and escaping saddles for ReLU networks
ICLR 2019
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
NIPS 2019
Global Optimality Conditions for Deep Neural Networks
ICLR 2018
Minimax Bounds on Stochastic Batched Convex Optimization
COLT 2018