Konstantin Mishchenko
19 papers · 2018–2025 · 6 conferences across top CS/AI venues
Achievements
Conference Polyglot (6)
Interdisciplinary Bridge
Hot Topic Early Bird
Keyword Pioneer
Academic Marathon (7)
Cross-Pollinator (14)
Triple Crown
Keyword Champion (2)
Deep Specialist (14)
Keyword Collector (76)
Trend Setter
Prolific Year (6)
Century Club (19)
Conferences
ICML (7)
NIPS (6)
AISTATS (3)
EMNLP (1)
ICLR (1)
UAI (1)
Keywords
stochastic gradient descent (6)
convex optimization (5)
gradient descent (4)
distributed learning (3)
stochastic optimization (3)
non-convex optimization (3)
random reshuffling (2)
distributed optimization (2)
nonsmooth optimization (2)
communication efficiency (2)
proximal gradient (2)
variance reduction (2)
federated learning (2)
strongly convex (2)
convergence rate (2)
deep learning (1)
nonconvex optimization (1)
communication complexity (1)
logistic regression (1)
sample efficiency (1)
Papers
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
EMNLP 2025
Adaptive Proximal Gradient Method for Convex Optimization
NIPS 2024
Prodigy: An Expeditiously Adaptive Parameter-Free Learner
ICML 2024
The Road Less Scheduled
NIPS 2024
Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy
ICML 2023
Learning-Rate-Free Learning by D-Adaptation
ICML 2023
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method
NIPS 2023
IntSGD: Adaptive Floatless Compression of Stochastic Gradients
ICLR 2022
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
ICML 2022
Proximal and Federated Random Reshuffling
ICML 2022
Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays
NIPS 2022
99% of Worker-Master Communication in Distributed Optimization Is Not Needed
UAI 2020
Random Reshuffling: Simple Analysis with Vast Improvements
NIPS 2020
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
AISTATS 2020
Tighter Theory for Local SGD on Identical and Heterogeneous Data
AISTATS 2020
Revisiting Stochastic Extragradient
AISTATS 2020
Adaptive Gradient Descent without Descent
ICML 2020
A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
ICML 2018
SEGA: Variance Reduction via Gradient Sketching
NIPS 2018