Papers
Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training
EMNLP 2019
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
ICML 2019