Michael W. Mahoney
102 papers · 2005–2025 · 13 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (29) · Keyword Pioneer · Interdisciplinary Bridge · Renaissance Researcher (6) · Hot Topic Early Bird
Renaissance Researcher (6)
Interdisciplinary Bridge
Academic Marathon (20)
Conference Loyalist (38)
Keyword Trendsetter Combo (7)
Dynamic Duo (20)
Triple Crown
Keyword Champion (2)
Grand Slam
Deep Specialist (25)
Keyword Collector (93)
Trend Setter
Unstoppable (12)
Prolific Year (12)
Century Club (102)
The Questioner (2)
Conferences
NIPS (38)
JMLR (17)
ICML (16)
ICLR (15)
AISTATS (3)
CVPR (3)
UAI (3)
ACL (2)
AAAI (1)
COLT (1)
ICCV (1)
IJCAI (1)
WACV (1)
Keywords
model compression (10)
low-rank approximation (8)
kernel methods (7)
dimensionality reduction (6)
spectral clustering (5)
spectral analysis (5)
matrix approximation (5)
neural network (5)
distributed optimization (5)
nystrom method (5)
inference efficiency (4)
double descent (4)
graph laplacian (4)
partial differential equation (4)
uncertainty quantification (3)
importance sampling (3)
feature selection (3)
communication efficiency (3)
model quantization (3)
convex optimization (3)
Papers
Models of Heavy-Tailed Mechanistic Universality
ICML 2025
Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton
ICML 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
ICML 2025
HOPE for a Robust Parameterization of Long-memory State Space Models
ICLR 2025
Tuning Frequency Bias of State Space Models
ICLR 2025
Gradient-Free Generation for Hard-Constrained Systems
ICLR 2025
Determinant Estimation under Memory Constraints and Neural Scaling Laws
ICML 2025
Mitigating Memorization in Language Models
ICLR 2025
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
ICML 2025
Gated Recurrent Neural Networks with Weighted Time-Delay Feedback
AISTATS 2025
Squeezed Attention: Accelerating Long Context Length LLM Inference
ACL 2025
A Statistical Framework for Ranking LLM-based Chatbots
ICLR 2025
Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs
ICML 2024
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs
ICLR 2024
Towards Scalable and Versatile Weight Space Learning
ICML 2024
Robustifying State-space Models for Long Sequences via Approximate Diagonalization
ICLR 2024
An LLM Compiler for Parallel Function Calling
ICML 2024
SqueezeLLM: Dense-and-Sparse Quantization
ICML 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
NIPS 2024
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning
NIPS 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
NIPS 2024
How many classifiers do we need?
NIPS 2024
Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
NIPS 2024
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels
AISTATS 2024
A Three-regime Model of Network Pruning
ICML 2023
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
NIPS 2023
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior
NIPS 2023
Gradient Gating for Deep Multi-Rate Learning on Graphs
ICLR 2023
Fast Feature Selection with Fairness Constraints
AISTATS 2023
Learning differentiable solvers for systems with hard constraints
ICLR 2023
Learning Physical Models that Can Respect Conservation Laws
ICML 2023
Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes
ICML 2023
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching
ICML 2023
When are ensembles really effective?
NIPS 2023
A Heavy-Tailed Algebra for Probabilistic Programming
NIPS 2023
Speculative Decoding with Big Little Decoder
NIPS 2023
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
ICLR 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
NIPS 2022
A Fast Post-Training Pruning Framework for Transformers
NIPS 2022
Noisy Feature Mixup
ICLR 2022
Long Expressive Memory for Sequence Modeling
ICLR 2022
Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms
JMLR 2022
LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data
JMLR 2022
Hessian-Aware Pruning and Optimal Neural Implant
WACV 2022
Sparse Quantized Spectral Clustering
ICLR 2021
Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
NIPS 2021
Noisy Recurrent Neural Networks
NIPS 2021
Taxonomizing local versus global structure in neural network loss landscapes
NIPS 2021
Hessian Eigenspectra of More Realistic Nonlinear Models
NIPS 2021
Stateful ODE-Nets using Basis Function Expansions
NIPS 2021
Characterizing possible failure modes in physics-informed neural networks
NIPS 2021
Lipschitz Recurrent Neural Networks
ICLR 2021
Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification
ICLR 2021
I-BERT: Integer-only BERT Quantization
ICML 2021
Improved Guarantees and a Multiple-descent Curve for Column Subset Selection and the Nystrom Method (Extended Abstract)
IJCAI 2021
Statistical guarantees for local graph clustering
JMLR 2021
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
JMLR 2021
Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
JMLR 2021
LocalNewton: Reducing communication rounds for distributed learning
UAI 2021
Stochastic continuous normalizing flows: training SDEs as ODEs
UAI 2021
Geometric rates of convergence for kernel-based sampling algorithms
UAI 2021
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
NIPS 2020
Precise expressions for random projections: Low-rank approximation and randomized Newton
NIPS 2020
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent
NIPS 2020
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
NIPS 2020
Boundary thickness and robustness in learning models
NIPS 2020
Exact expressions for double descent and implicit regularization via surrogate random design
NIPS 2020
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method
NIPS 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
NIPS 2020
ZeroQ: A Novel Zero Shot Quantization Framework
CVPR 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
AAAI 2020
Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds
JMLR 2019
Distributed estimation of the inverse Hessian by determinantal averaging
NIPS 2019
ANODEV2: A Coupled Neural ODE Framework
NIPS 2019
Trust Region Based Adversarial Attack on Neural Networks
CVPR 2019
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
ICCV 2019
A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication
JMLR 2019
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression
COLT 2019
Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning
JMLR 2018
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
JMLR 2018
GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
NIPS 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
NIPS 2018
Skip-Gram − Zipf + Uniform = Vector Additivity
ACL 2017
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging
ICML 2017
Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction
NIPS 2017
Capacity Releasing Diffusion for Speed and Locality
ICML 2017
Revisiting the Nyström Method for Improved Large-scale Machine Learning
JMLR 2016
Sub-sampled Newton Methods with Non-uniform Sampling
NIPS 2016
Feature-distributed sparse regression: a screen-and-clean approach
NIPS 2016
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels
JMLR 2016
A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares
JMLR 2016
A Statistical Perspective on Algorithmic Leveraging
JMLR 2015
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
NIPS 2015
Random Laplace Feature Maps for Semigroup Kernels on Histograms
CVPR 2014
Semi-Supervised Eigenvectors for Large-Scale Locally-Biased Learning
JMLR 2014
Semi-supervised Eigenvectors for Locally-biased Learning
NIPS 2012
A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally
JMLR 2012
Fast Approximation of Matrix Coherence and Statistical Leverage
JMLR 2012
Regularized Laplacian Estimation and Fast Eigenvector Approximation
NIPS 2011
CUR from a Sparse Optimization Viewpoint
NIPS 2010
Unsupervised Feature Selection for the $k$-means Clustering Problem
NIPS 2009
On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
JMLR 2005