Michael W. Mahoney

102 papers · 2005–2025 · 13 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (29) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🏃 Academic Marathon (20) 🏠 Conference Loyalist (38) 🌟 Keyword Trendsetter Combo (7) 🤝 Dynamic Duo (20) 👑 Triple Crown 🏆 Keyword Champion (2) 🏆 Grand Slam 🔬 Deep Specialist (25) 🗃️ Keyword Collector (93) 📈 Trend Setter 🔥 Unstoppable (12) ⚡ Prolific Year (12) 💎 Century Club (102) ❓ The Questioner (2)

Conferences

NIPS (38) JMLR (17) ICML (16) ICLR (15) AISTATS (3) CVPR (3) UAI (3) ACL (2) AAAI (1) COLT (1) ICCV (1) IJCAI (1) WACV (1)

Top co-authors

Amir Gholami (20) Kurt Keutzer (20) Liam Hodgkinson (12) N. Benjamin Erichson (11) Zhewei Yao (10) Sehoon Kim (10) Yaoqing Yang (8) Fred Roosta (8) Rajiv Khanna (7) Michal Derezinski (6)

Keywords

model compression (10) low-rank approximation (8) kernel methods (7) dimensionality reduction (6) spectral clustering (5) spectral analysis (5) matrix approximation (5) neural network (5) distributed optimization (5) nystrom method (5) inference efficiency (4) double descent (4) graph laplacian (4) partial differential equation (4) uncertainty quantification (3) importance sampling (3) feature selection (3) communication efficiency (3) model quantization (3) convex optimization (3)

Papers

Models of Heavy-Tailed Mechanistic Universality ICML 2025
Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton ICML 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache ICML 2025
HOPE for a Robust Parameterization of Long-memory State Space Models ICLR 2025
Tuning Frequency Bias of State Space Models ICLR 2025
Gradient-Free Generation for Hard-Constrained Systems ICLR 2025
Determinant Estimation under Memory Constraints and Neural Scaling Laws ICML 2025
Mitigating Memorization in Language Models ICLR 2025
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization ICML 2025
Gated Recurrent Neural Networks with Weighted Time-Delay Feedback AISTATS 2025
Squeezed Attention: Accelerating Long Context Length LLM Inference ACL 2025
A Statistical Framework for Ranking LLM-based Chatbots ICLR 2025
Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs ICML 2024
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs ICLR 2024
Towards Scalable and Versatile Weight Space Learning ICML 2024
Robustifying State-space Models for Long Sequences via Approximate Diagonalization ICLR 2024
An LLM Compiler for Parallel Function Calling ICML 2024
SqueezeLLM: Dense-and-Sparse Quantization ICML 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization NIPS 2024
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning NIPS 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models NIPS 2024
How many classifiers do we need? NIPS 2024
Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance NIPS 2024
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels AISTATS 2024
A Three-regime Model of Network Pruning ICML 2023
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training NIPS 2023
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior NIPS 2023
Gradient Gating for Deep Multi-Rate Learning on Graphs ICLR 2023
Fast Feature Selection with Fairness Constraints AISTATS 2023
Learning differentiable solvers for systems with hard constraints ICLR 2023
Learning Physical Models that Can Respect Conservation Laws ICML 2023
Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes ICML 2023
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching ICML 2023
When are ensembles really effective? NIPS 2023
A Heavy-Tailed Algebra for Probabilistic Programming NIPS 2023
Speculative Decoding with Big Little Decoder NIPS 2023
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information ICLR 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition NIPS 2022
A Fast Post-Training Pruning Framework for Transformers NIPS 2022
Noisy Feature Mixup ICLR 2022
Long Expressive Memory for Sequence Modeling ICLR 2022
Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms JMLR 2022
LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data JMLR 2022
Hessian-Aware Pruning and Optimal Neural Implant WACV 2022
Sparse Quantized Spectral Clustering ICLR 2021
Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update NIPS 2021
Noisy Recurrent Neural Networks NIPS 2021
Taxonomizing local versus global structure in neural network loss landscapes NIPS 2021
Hessian Eigenspectra of More Realistic Nonlinear Models NIPS 2021
Stateful ODE-Nets using Basis Function Expansions NIPS 2021
Characterizing possible failure modes in physics-informed neural networks NIPS 2021
Lipschitz Recurrent Neural Networks ICLR 2021
Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification ICLR 2021
I-BERT: Integer-only BERT Quantization ICML 2021
Improved Guarantees and a Multiple-descent Curve for Column Subset Selection and the Nystrom Method (Extended Abstract) IJCAI 2021
Statistical guarantees for local graph clustering JMLR 2021
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning JMLR 2021
Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings JMLR 2021
LocalNewton: Reducing communication rounds for distributed learning UAI 2021
Stochastic continuous normalizing flows: training SDEs as ODEs UAI 2021
Geometric rates of convergence for kernel-based sampling algorithms UAI 2021
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks NIPS 2020
Precise expressions for random projections: Low-rank approximation and randomized Newton NIPS 2020
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent NIPS 2020
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization NIPS 2020
Boundary thickness and robustness in learning models NIPS 2020
Exact expressions for double descent and implicit regularization via surrogate random design NIPS 2020
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method NIPS 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks NIPS 2020
ZeroQ: A Novel Zero Shot Quantization Framework CVPR 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT AAAI 2020
Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds JMLR 2019
Distributed estimation of the inverse Hessian by determinantal averaging NIPS 2019
ANODEV2: A Coupled Neural ODE Framework NIPS 2019
Trust Region Based Adversarial Attack on Neural Networks CVPR 2019
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision ICCV 2019
A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication JMLR 2019
Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression COLT 2019
Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning JMLR 2018
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging JMLR 2018
GIANT: Globally Improved Approximate Newton Method for Distributed Optimization NIPS 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries NIPS 2018
Skip-Gram − Zipf + Uniform = Vector Additivity ACL 2017
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging ICML 2017
Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction NIPS 2017
Capacity Releasing Diffusion for Speed and Locality ICML 2017
Revisiting the Nyström Method for Improved Large-scale Machine Learning JMLR 2016
Sub-sampled Newton Methods with Non-uniform Sampling NIPS 2016
Feature-distributed sparse regression: a screen-and-clean approach NIPS 2016
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels JMLR 2016
A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares JMLR 2016
A Statistical Perspective on Algorithmic Leveraging JMLR 2015
Fast Randomized Kernel Ridge Regression with Statistical Guarantees NIPS 2015
Random Laplace Feature Maps for Semigroup Kernels on Histograms CVPR 2014
Semi-Supervised Eigenvectors for Large-Scale Locally-Biased Learning JMLR 2014
Semi-supervised Eigenvectors for Locally-biased Learning NIPS 2012
A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally JMLR 2012
Fast Approximation of Matrix Coherence and Statistical Leverage JMLR 2012
Regularized Laplacian Estimation and Fast Eigenvector Approximation NIPS 2011
CUR from a Sparse Optimization Viewpoint NIPS 2010
Unsupervised Feature Selection for the $k$-means Clustering Problem NIPS 2009
On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning JMLR 2005