Peter Richtarik

120 papers · 2010–2025 · 10 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (18) 🌍 Conference Polyglot (10)

🌈 Renaissance Researcher (6) 🏃 Academic Marathon (15) 🏠 Conference Loyalist (39) 🏆 Keyword Champion (6) 🤝 Dynamic Duo (18) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (22) 📈 Trend Setter ❓ The Questioner (2) ⚡ Prolific Year (16) 🚀 Conference Pioneer 🗃️ Keyword Collector (102) 💎 Century Club (120) 🔥 Unstoppable (8)

Conferences

NIPS (39) ICML (31) AISTATS (15) ICLR (15) JMLR (7) UAI (6) AAAI (3) ALT (2) NAACL (1) NSDI (1)

Top co-authors

Dmitry Kovalev (18) Eduard Gorbunov (16) Samuel Horváth (13) Alexander Tyurin (12) Filip Hanzely (11) Grigory Malinovsky (11) Martin Takac (10) Xun Qian (9) Konstantin Mishchenko (8) Alexander Gasnikov (8)

Keywords

stochastic optimization (25) distributed optimization (23) federated learning (21) stochastic gradient descent (21) variance reduction (19) communication compression (18) distributed learning (14) convex optimization (13) nonconvex optimization (11) communication efficiency (11) communication complexity (9) gradient compression (9) coordinate descent (9) stochastic gradient (8) error feedback (8) decentralized optimization (7) importance sampling (7) gradient descent (7) convergence analysis (7) empirical risk minimization (7)

Papers

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity · ICLR 2025
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization · ICLR 2025
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity · ICML 2025
ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning · ICML 2025
Correlated Quantization for Faster Nonconvex Distributed Optimization · UAI 2025
MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times · UAI 2025
ELF: Federated Langevin Algorithms with Primal, Dual and Bidirectional Compression · UAI 2025
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression · ICLR 2025
EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback · JMLR 2025
HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem · NAACL 2025
MAST: model-agnostic sparsified training · ICLR 2025
Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization · AAAI 2024
High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise · ICML 2024
The Power of Extrapolation in Federated Learning · NIPS 2024
Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants · ICLR 2024
Towards a Better Theoretical Understanding of Independent Subnetwork Training · ICML 2024
On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization · NIPS 2024
Understanding Progressive Training Through the Framework of Randomized Coordinate Descent · AISTATS 2024
Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates · AISTATS 2024
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence · NIPS 2024
Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity · NIPS 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression · NIPS 2024
Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences · NIPS 2024
Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations · NIPS 2024
Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences · NIPS 2024
Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity · NIPS 2024
Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization · ICLR 2024
FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity · ICLR 2024
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance · ICML 2023
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top · ICLR 2023
DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity · ICLR 2023
Random Reshuffling with Variance Reduction: New Analysis and Better Rates · UAI 2023
2Direction: Theoretically Faster Distributed Training with Bidirectional Communication Compression · NIPS 2023
Optimal Time Complexities of Parallel Stochastic Optimization Methods Under a Fixed Computation Model · NIPS 2023
A Guide Through the Zoo of Biased SGD · NIPS 2023
A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting · NIPS 2023
Momentum Provably Improves Error Feedback! · NIPS 2023
RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates · ICLR 2023
EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression · ICML 2023
Catalyst Acceleration of Error Compensated Methods Leads to Better Communication Complexity · AISTATS 2023
Can 5th Generation Local Training Methods Support Client Sampling? Yes! · AISTATS 2023
Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition · AISTATS 2023
On Biased Compression for Distributed Learning · JMLR 2023
A Convergence Theory for SVGD in the Population Limit under Talagrand’s Inequality T1 · ICML 2022
FedNL: Making Newton-Type Methods Applicable to Federated Learning · ICML 2022
3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation · ICML 2022
Shifted compression framework: generalizations and improvements · UAI 2022
Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling · NIPS 2022
EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization · NIPS 2022
Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning · AISTATS 2022
An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints · AISTATS 2022
FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning · AISTATS 2022
IntSGD: Adaptive Floatless Compression of Stochastic Gradients · ICLR 2022
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information · ICLR 2022
Permutation Compressors for Provably Faster Distributed Nonconvex Optimization · ICLR 2022
Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning · NIPS 2022
Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees · NIPS 2022
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! · ICML 2022
BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression · NIPS 2022
Optimal Algorithms for Decentralized Stochastic Variational Inequalities · NIPS 2022
A Damped Newton Method Achieves Global $\mathcal O \left(\frac{1}{k^2}\right)$ and Local Quadratic Convergence Rate · NIPS 2022
Proximal and Federated Random Reshuffling · ICML 2022
Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with an Inexact Prox · NIPS 2022
Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques · NIPS 2022
Error Compensated Distributed SGD Can Be Accelerated · NIPS 2021
EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback · NIPS 2021
CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression · NIPS 2021
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks · NIPS 2021
Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization · NIPS 2021
Hyperparameter Transfer Learning with Adaptive Complexity · AISTATS 2021
Local SGD: Unified Theory and New Efficient Methods · AISTATS 2021
A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! · AISTATS 2021
A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning · ICLR 2021
MARINA: Faster Non-Convex Distributed Learning with Compression · ICML 2021
Distributed Second Order Methods with Fast Rates and Compressed Communication · ICML 2021
ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks · ICML 2021
PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization · ICML 2021
Stochastic Sign Descent Methods: New Algorithms and Better Theory · ICML 2021
L-SVRG and L-Katyusha with Arbitrary Sampling · JMLR 2021
Scaling Distributed Machine Learning with In-Network Aggregation · NSDI 2021
Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization · NIPS 2020
Lower Bounds and Optimal Algorithms for Personalized Federated Learning · NIPS 2020
Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm · NIPS 2020
Stochastic Subspace Cubic Newton Method · ICML 2020
99% of Worker-Master Communication in Distributed Optimization Is Not Needed · UAI 2020
Don’t Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop · ALT 2020
Revisiting Stochastic Extragradient · AISTATS 2020
Tighter Theory for Local SGD on Identical and Heterogeneous Data · AISTATS 2020
A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent · AISTATS 2020
A Stochastic Derivative-Free Optimization Method with Importance Sampling: Theory and Learning to Control · AAAI 2020
A Stochastic Derivative Free Optimization Method with Momentum · ICLR 2020
Linearly Converging Error Compensated SGD · NIPS 2020
Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems · ICML 2020
Random Reshuffling: Simple Analysis with Vast Improvements · NIPS 2020
Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization · ICML 2020
From Local SGD to Local Fixed-Point Methods for Federated Learning · ICML 2020
New Convergence Aspects of Stochastic Gradient Algorithms · JMLR 2019
Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates · NIPS 2019
SAGA with Arbitrary Sampling · ICML 2019
Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches · AISTATS 2019
SGD: General Analysis and Improved Rates · ICML 2019
Nonconvex Variance Reduced Optimization with Arbitrary Sampling · ICML 2019
A Nonconvex Projection Method for Robust PCA · AAAI 2019
RSN: Randomized Subspace Newton · NIPS 2019
Randomized Block Cubic Newton Method · ICML 2018
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption · ICML 2018
Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization · NIPS 2018
SEGA: Variance Reduction via Gradient Sketching · NIPS 2018
Coordinate Descent Faceoff: Primal or Dual? · ALT 2018
Importance Sampling for Minibatches · JMLR 2018
Stochastic Spectral and Conjugate Descent Methods · NIPS 2018
Distributed Coordinate Descent Method for Learning with Big Data · JMLR 2016
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling · ICML 2016
SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization · ICML 2016
Stochastic Block BFGS: Squeezing More Curvature out of Data · ICML 2016
Stochastic Dual Coordinate Ascent with Adaptive Probabilities · ICML 2015
Adding vs. Averaging in Distributed Primal-Dual Optimization · ICML 2015
Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling · NIPS 2015
Mini-Batch Primal and Dual Methods for SVMs · ICML 2013
Generalized Power Method for Sparse Principal Component Analysis · JMLR 2010