Mikhail Belkin

49 papers · 2006–2025 · 10 venues across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (26) 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (5) 🌱 Topic Pioneer 👑 Triple Crown 🔬 Deep Specialist (12) 🏆 Keyword Champion 🚀 Conference Pioneer ⚡ Prolific Year (6) 🔥 Unstoppable (13) ❓ The Questioner (5) 📈 Trend Setter 🗃️ Keyword Collector (73) 💎 Century Club (49)

Conferences

NIPS (16) COLT (8) ICML (8) AISTATS (4) ICLR (4) JMLR (4) UAI (2) ALT (1) INTERSPEECH (1) NAACL (1)

Top co-authors

Yusu Wang (6) Luis Rademacher (6) Amirhesam Abedsoltan (4) Chaoyue Liu (4) Libin Zhu (4) Qichao Que (4) Siyuan Ma (4) Parthe Pandit (3) Adityanarayanan Radhakrishnan (3) Justin Eldridge (3)

Research topics

Probability (1)

Keywords

kernel methods (10) kernel machine (5) independent component analysis (5) semi-supervised learning (5) spectral clustering (4) gradient descent (3) manifold learning (3) graph laplacian (3) gaussian noise (3) spectral analysis (2) reproducing kernel hilbert space (2) tensor decomposition (2) eigenvalue decomposition (2) cluster assumption (2) feature learning (2) point cloud (2) signal separation (2) blind source separation (2) stochastic gradient descent (2) dimensionality reduction (2)

Papers

Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks? · ICML 2025
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models · NAACL 2025
A Gap Between the Gaussian RKHS and Neural Networks: An Infinite-Center Asymptotic Analysis · COLT 2025
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product · ICML 2025
Uncertainty Estimation with Recursive Feature Machines · UAI 2024
More is Better: when Infinite Overparameterization is Optimal and Overfitting is Obligatory · ICLR 2024
On the Nyström Approximation for Preconditioning in Kernel Machines · AISTATS 2024
Quadratic models for understanding catapult dynamics of neural networks · ICLR 2024
Average gradient outer product as a mechanism for deep neural collapse · NIPS 2024
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning · ICML 2024
Toward Large Kernel Models · ICML 2023
Neural tangent kernel at initialization: linear width suffices · UAI 2023
Cut your Losses with Squentropy · ICML 2023
Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures · NIPS 2021
Multiple Descent: Design Your Own Generalization Curve · NIPS 2021
Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks · ICLR 2021
Classification vs regression in overparameterized regimes: Does the loss function matter? · JMLR 2021
Conference on Learning Theory 2021: Post-conference Preface · COLT 2021
Accelerating SGD with momentum for over-parameterized learning · ICLR 2020
Does data interpolation contradict statistical optimality? · AISTATS 2019
Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement · INTERSPEECH 2019
Unperturbed: spectral analysis beyond Davis-Kahan · ALT 2018
Approximation beats concentration? An approximation view on inference with smooth radial kernels · COLT 2018
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate · NIPS 2018
To Understand Deep Learning We Need to Understand Kernel Learning · ICML 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning · ICML 2018
Diving into the shallows: a computational perspective on large-scale shallow learning · NIPS 2017
Clustering with Bregman Divergences: an Asymptotic Analysis · NIPS 2016
Basis Learning as an Algorithmic Primitive · COLT 2016
Back to the Future: Radial Basis Function Networks Revisited · AISTATS 2016
Graphons, mergeons, and so on! · NIPS 2016
Learning privately from multiparty data · ICML 2016
A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA · NIPS 2015
Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering · COLT 2015
Learning with Fredholm Kernels · NIPS 2014
The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures · COLT 2014
Blind Signal Separation in the Presence of Gaussian Noise · COLT 2013
Inverse Density as an Inverse Problem: the Fredholm Equation Approach · NIPS 2013
Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis · NIPS 2013
Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries · COLT 2012
Laplacian Support Vector Machines Trained in the Primal · JMLR 2011
Data Skeletonization via Reeb Graphs · NIPS 2011
Semi-supervised Learning by Higher Order Regularization · AISTATS 2011
On Learning with Integral Operators · JMLR 2010
Semi-supervised Learning using Sparse Eigenfunction Bases · NIPS 2009
The Value of Labeled and Unlabeled Examples when the Model is Imperfect · NIPS 2007
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples · JMLR 2006
Convergence of Laplacian Eigenmaps · NIPS 2006
On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts · NIPS 2006