Mikhail Belkin
49 papers · 2006–2025 · 10 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Taxonomy Completionist (26)
Hot Topic Early Bird
Keyword Trendsetter Combo (5)
Topic Pioneer
Triple Crown
Deep Specialist (12)
Keyword Champion
Conference Pioneer
Prolific Year (6)
Unstoppable (13)
The Questioner (5)
Trend Setter
Keyword Collector (73)
Century Club (49)
Conferences
NIPS (16)
COLT (8)
ICML (8)
AISTATS (4)
ICLR (4)
JMLR (4)
UAI (2)
ALT (1)
INTERSPEECH (1)
NAACL (1)
Keywords
kernel methods (10)
kernel machine (5)
independent component analysis (5)
semi-supervised learning (5)
spectral clustering (4)
gradient descent (3)
manifold learning (3)
graph laplacian (3)
gaussian noise (3)
spectral analysis (2)
reproducing kernel hilbert space (2)
tensor decomposition (2)
eigenvalue decomposition (2)
cluster assumption (2)
feature learning (2)
point cloud (2)
signal separation (2)
blind source separation (2)
stochastic gradient descent (2)
dimensionality reduction (2)
Papers
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
ICML 2025
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models
NAACL 2025
A Gap Between the Gaussian RKHS and Neural Networks: An Infinite-Center Asymptotic Analysis
COLT 2025
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
ICML 2025
Uncertainty Estimation with Recursive Feature Machines
UAI 2024
More is Better: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
ICLR 2024
On the Nyström Approximation for Preconditioning in Kernel Machines
AISTATS 2024
Quadratic models for understanding catapult dynamics of neural networks
ICLR 2024
Average gradient outer product as a mechanism for deep neural collapse
NIPS 2024
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
ICML 2024
Toward Large Kernel Models
ICML 2023
Neural tangent kernel at initialization: linear width suffices
UAI 2023
Cut your Losses with Squentropy
ICML 2023
Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures
NIPS 2021
Multiple Descent: Design Your Own Generalization Curve
NIPS 2021
Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks
ICLR 2021
Classification vs regression in overparameterized regimes: Does the loss function matter?
JMLR 2021
Conference on Learning Theory 2021: Post-conference Preface
COLT 2021
Accelerating SGD with momentum for over-parameterized learning
ICLR 2020
Does data interpolation contradict statistical optimality?
AISTATS 2019
Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement
INTERSPEECH 2019
Unperturbed: spectral analysis beyond Davis-Kahan
ALT 2018
Approximation beats concentration? An approximation view on inference with smooth radial kernels
COLT 2018
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
NIPS 2018
To Understand Deep Learning We Need to Understand Kernel Learning
ICML 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
ICML 2018
Diving into the shallows: a computational perspective on large-scale shallow learning
NIPS 2017
Clustering with Bregman Divergences: an Asymptotic Analysis
NIPS 2016
Basis Learning as an Algorithmic Primitive
COLT 2016
Back to the Future: Radial Basis Function Networks Revisited
AISTATS 2016
Graphons, mergeons, and so on!
NIPS 2016
Learning privately from multiparty data
ICML 2016
A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA
NIPS 2015
Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering
COLT 2015
Learning with Fredholm Kernels
NIPS 2014
The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures
COLT 2014
Blind Signal Separation in the Presence of Gaussian Noise
COLT 2013
Inverse Density as an Inverse Problem: the Fredholm Equation Approach
NIPS 2013
Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
NIPS 2013
Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries
COLT 2012
Laplacian Support Vector Machines Trained in the Primal
JMLR 2011
Data Skeletonization via Reeb Graphs
NIPS 2011
Semi-supervised Learning by Higher Order Regularization
AISTATS 2011
On Learning with Integral Operators
JMLR 2010
Semi-supervised Learning using Sparse Eigenfunction Bases
NIPS 2009
The Value of Labeled and Unlabeled Examples when the Model is Imperfect
NIPS 2007
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
JMLR 2006
Convergence of Laplacian Eigenmaps
NIPS 2006
On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
NIPS 2006