stochastic gradient descent
1088 papers
Also known as
SGD
ASGD
SAGA
SGM
SGDA
PSGD
SKGD
Papers
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
NIPS 2023
Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise
NIPS 2023
Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees
JMLR 2023
spred: Solving L1 Penalty with SGD
ICML 2023