Jason Lee

87 papers · 2010–2025 · 13 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (24) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird

🏃 Academic Marathon (15) 🏠 Conference Loyalist (44) 🌟 Keyword Trendsetter Combo (5) 🤝 Dynamic Duo (10) 👑 Triple Crown 🏆 Keyword Champion 🏆 Grand Slam 🔬 Deep Specialist (29) ❓ The Questioner 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (14) ⚡ Prolific Year (11) 💎 Century Club (87) 🗃️ Keyword Collector (89)

Conferences

NIPS (44) ICML (13) AISTATS (9) COLT (7) EMNLP (5) ACL (2) AAAI (1) AACL (1) CORL (1) EACL (1) ICLR (1) IJCNLP (1) NAACL (1)

Top co-authors

Qi Lei (10) Kyunghyun Cho (7) Simon Du (6) Daniel Soudry (5) Alex Damian (5) Tengyu Ma (5) Bruce W. Lee (5) Suriya Gunasekar (5) Tianle Cai (4) Eshaan Nichani (4)

Research topics

Statistics (3) Applications (1) Education (1)

Keywords

gradient descent (21) sample complexity (11) neural network (10) representation learning (7) implicit bias (7) learning theory (6) neural tangent kernel (5) non-convex optimization (5) gradient flow (5) stochastic gradient descent (5) kernel methods (5) neural network optimization (4) function approximation (4) reinforcement learning (3) convergence guarantee (3) generalization bound (3) convex optimization (3) latent variable model (3) convergence rate (3) few-shot learning (3)

Papers

Anytime Acceleration of Gradient Descent COLT 2025
BranchOut: Capturing Realistic Multimodality in Autonomous Driving Decisions CORL 2025
REST: Retrieval-Based Speculative Decoding NAACL 2024
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding NIPS 2024
Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract) COLT 2024
A Side-by-side Comparison of Transformers for Implicit Discourse Relation Classification ACL 2023
LFTK: Handcrafted Features in Computational Linguistics ACL 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability NIPS 2023
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning NIPS 2023
Fine-Tuning Language Models with Just Forward Passes NIPS 2023
Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms NIPS 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage NIPS 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models NIPS 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks NIPS 2023
Prompt-based Learning for Text Readability Assessment EACL 2023
Provable Hierarchy-Based Meta-Reinforcement Learning AISTATS 2023
Reconstructing Training Data from Model Gradient, Provably AISTATS 2023
Provably Efficient Reinforcement Learning via Surprise Bound AISTATS 2023
Offline Reinforcement Learning with Realizability and Single-policy Concentrability COLT 2022
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems NIPS 2022
Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials NIPS 2022
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent NIPS 2022
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias NIPS 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent NIPS 2022
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games AISTATS 2022
Optimization-Based Separations for Neural Networks COLT 2022
Neural Networks can Learn Representations with Gradient Descent COLT 2022
How Fine-Tuning Allows for Effective Meta-Learning NIPS 2021
Going Beyond Linear RL: Sample Efficient Neural Function Approximation NIPS 2021
Label Noise SGD Provably Prefers Flat Global Minimizers NIPS 2021
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization NIPS 2021
Shape Matters: Understanding the Implicit Bias of the Noise Covariance COLT 2021
How Important is the Train-Validation Split in Meta-Learning? ICML 2021
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks COLT 2021
A Theory of Label Propagation for Subpopulation Shift ICML 2021
Bilinear Classes: A Structural Framework for Provable Generalization in RL ICML 2021
Near-Optimal Linear Regression under Distribution Shift ICML 2021
Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features EMNLP 2021
Predicting What You Already Know Helps: Provable Self-Supervised Learning NIPS 2021
Beyond Lazy Training for Over-parameterized Tensor Decomposition NIPS 2020
Generalized Leverage Score Sampling for Neural Networks NIPS 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation EMNLP 2020
Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior AAAI 2020
Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters NIPS 2020
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot NIPS 2020
How to Characterize The Landscape of Overparameterized Convolutional Neural Networks NIPS 2020
LXPER Index 2.0: Improving Text Readability Assessment Model for L2 English Students in Korea AACL 2020
SGD Learns One-Layer Networks in WGANs ICML 2020
Optimal transport mapping via input convex neural networks ICML 2020
Towards Understanding Hierarchical Learning: Benefits of Neural Representations NIPS 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy NIPS 2020
Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity NIPS 2020
On the Discrepancy between Density Estimation and Sequence Generation EMNLP 2020
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel NIPS 2019
Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods NIPS 2019
Countering Language Drift via Visual Grounding IJCNLP 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models ICML 2019
Gradient Descent Finds Global Minima of Deep Neural Networks ICML 2019
Convergence of Gradient Descent on Separable Data AISTATS 2019
Countering Language Drift via Visual Grounding EMNLP 2019
Neural Temporal-Difference Learning Converges to Global Optima NIPS 2019
Convergence of Adversarial Training in Overparametrized Neural Networks NIPS 2019
Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks ICML 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks NIPS 2018
Provably Correct Automatic Sub-Differentiation for Qualified Programs NIPS 2018
On the Convergence and Robustness of Training GANs with Regularized Optimal Transport NIPS 2018
Adding One Neuron Can Eliminate All Bad Local Minima NIPS 2018
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced NIPS 2018
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement EMNLP 2018
Emergent Translation in Multi-Agent Communication ICLR 2018
On the Power of Over-parametrization in Neural Networks with Quadratic Activation ICML 2018
Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima ICML 2018
Characterizing Implicit Bias in Terms of Optimization Geometry ICML 2018
On the Learnability of Fully-Connected Neural Networks AISTATS 2017
Black-box Importance Sampling AISTATS 2017
Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data AISTATS 2017
Gradient Descent Can Take Exponential Time to Escape Saddle Points NIPS 2017
A Kernelized Stein Discrepancy for Goodness-of-fit Tests ICML 2016
Matrix Completion has No Spurious Local Minimum NIPS 2016
Evaluating the statistical significance of biclusters NIPS 2015
Exact Post Model Selection Inference for Marginal Screening NIPS 2014
Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices NIPS 2014
Using multiple samples to learn mixture models NIPS 2013
Structure Learning of Mixed Graphical Models AISTATS 2013
On model selection consistency of penalized M-estimators: a geometric theory NIPS 2013
Proximal Newton-type methods for convex optimization NIPS 2012
Practical Large-Scale Optimization for Max-norm Regularization NIPS 2010