Konstantin Mishchenko

19 papers · 2018–2025 · 6 top CS/AI conferences

Achievements

🌍 Conference Polyglot (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🏃 Academic Marathon (7)

🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (14) 👑 Triple Crown 🏆 Keyword Champion (2) 🔬 Deep Specialist (14) 🗃️ Keyword Collector (76) 📈 Trend Setter ⚡ Prolific Year (6) 💎 Century Club (19)

Conferences

ICML (7) NIPS (6) AISTATS (3) EMNLP (1) ICLR (1) UAI (1)

Top co-authors

Peter Richtarik (8) Ahmed Khaled (5) Yura Malitsky (3) Aaron Defazio (3) Dmitry Kovalev (2) Filip Hanzely (2) Jérôme Malick (1) Hao Mark Chen (1) Blake Woodworth (1) Chi Jin (1)

Keywords

stochastic gradient descent (6) convex optimization (5) gradient descent (4) distributed learning (3) stochastic optimization (3) non-convex optimization (3) random reshuffling (2) distributed optimization (2) nonsmooth optimization (2) communication efficiency (2) proximal gradient (2) variance reduction (2) federated learning (2) strongly convex (2) convergence rate (2) deep learning (1) nonconvex optimization (1) communication complexity (1) logistic regression (1) sample efficiency (1)

Papers

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference · EMNLP 2025
Adaptive Proximal Gradient Method for Convex Optimization · NIPS 2024
Prodigy: An Expeditiously Adaptive Parameter-Free Learner · ICML 2024
The Road Less Scheduled · NIPS 2024
Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy · ICML 2023
Learning-Rate-Free Learning by D-Adaptation · ICML 2023
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method · NIPS 2023
IntSGD: Adaptive Floatless Compression of Stochastic Gradients · ICLR 2022
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! · ICML 2022
Proximal and Federated Random Reshuffling · ICML 2022
Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays · NIPS 2022
99% of Worker-Master Communication in Distributed Optimization Is Not Needed · UAI 2020
Random Reshuffling: Simple Analysis with Vast Improvements · NIPS 2020
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate · AISTATS 2020
Tighter Theory for Local SGD on Identical and Heterogeneous Data · AISTATS 2020
Revisiting Stochastic Extragradient · AISTATS 2020
Adaptive Gradient Descent without Descent · ICML 2020
A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning · ICML 2018
SEGA: Variance Reduction via Gradient Sketching · NIPS 2018