Chulhee Yun

37 papers · 2018–2025 · 4 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🐝 Cross-Pollinator (6) 🌍 Conference Polyglot (4) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (25) 👑 Triple Crown 🤝 Dynamic Duo (10) ❓ The Questioner (4) ⚡ Prolific Year (8) 🚀 Conference Pioneer 🔥 Unstoppable (8) 💎 Century Club (37) 🗃️ Keyword Collector (83)

Conferences

ICLR (13) NIPS (12) ICML (9) COLT (3)

Top co-authors

Suvrit Sra (10) Hanseul Cho (8) Ali Jadbabaie (7) Srinadh Bhojanapalli (5) Jaeyoung Cha (4) Minhak Song (4) Jaewook Lee (3) Sanjiv Kumar (3) Kwangjun Ahn (3) Sashank Reddi (3)

Keywords

neural network (5) stochastic gradient descent (4) convergence rate (3) attention mechanism (2) convex optimization (2) transformer architecture (2) loss landscape (2) non-convex optimization (2) convergence analysis (2) gradient descent (2) nonconvex optimization (1) function approximation (1) natural language processing (1) data augmentation (1) neural network training (1) principal component analysis (1) sharpness-aware minimization (1) feature learning (1) universal approximation (1) sample complexity (1)

Papers

Does SGD really happen in tiny subspaces? · ICLR 2025
Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More · ICML 2025
Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent · ICML 2025
Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems · ICML 2025
Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty · ICML 2025
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count · ICLR 2025
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification · ICLR 2025
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo · ICLR 2025
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure · NIPS 2024
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity · NIPS 2024
Provable Benefit of Cutout and CutMix for Feature Learning · NIPS 2024
Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements · NIPS 2024
Linear attention is (maybe) all you need (to understand Transformer optimization) · ICLR 2024
Fundamental Benefit of Alternating Updates in Minimax Optimization · ICML 2024
Provable Benefit of Mixup for Finding Optimal Decision Boundaries · ICML 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima · NIPS 2023
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint · NIPS 2023
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory · NIPS 2023
On the Training Instability of Shuffling SGD with Batch Normalization · ICML 2023
SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization · ICLR 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning · NIPS 2023
Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond · ICML 2023
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond · ICLR 2022
Provable Memorization via Deep Neural Networks using Sub-linear Parameters · COLT 2021
A unifying view on implicit bias in training linear neural networks · ICLR 2021
Minimum Width for Universal Approximation · ICLR 2021
Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD? · COLT 2021
Are Transformers universal approximators of sequence-to-sequence functions? · ICLR 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers · NIPS 2020
SGD with shuffling: optimal rates without component convexity and large epoch requirements · NIPS 2020
Low-Rank Bottleneck in Multi-head Attention Models · ICML 2020
Are deep ResNets provably better than linear predictors? · NIPS 2019
Small nonlinearities in activation functions create bad local minima in neural networks · ICLR 2019
Efficiently testing local optimality and escaping saddles for ReLU networks · ICLR 2019
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity · NIPS 2019
Global Optimality Conditions for Deep Neural Networks · ICLR 2018
Minimax Bounds on Stochastic Batched Convex Optimization · COLT 2018