conftrace
Neural Network Optimization
3,648 papers
Papers per year
2001: 1
2003: 1
2005: 2
2006: 3
2007: 6
2008: 1
2009: 7
2010: 5
2011: 7
2012: 9
2013: 17
2014: 18
2015: 40
2016: 76
2017: 113
2018: 214
2019: 324
2020: 414
2021: 489
2022: 445
2023: 524
2024: 469
2025: 386
2026: 77
Papers
How Many Layers and Why? An Analysis of the Model Depth in Transformers
IJCNLP 2021
Improved, Deterministic Smoothing for L_1 Certified Robustness
ICML 2021
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
ICML 2021
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
EMNLP 2021
Relative Flatness and Generalization
NIPS 2021
CLUZH at SIGMORPHON 2021 Shared Task on Multilingual Grapheme-to-Phoneme Conversion: Variations on a Baseline
ACL 2021
On the Adequacy of Untuned Warmup for Adaptive Optimization
AAAI 2021
On Linear Stability of SGD and Input-Smoothness of Neural Networks
NIPS 2021
Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality
NAACL 2021
Effective Sparsification of Neural Networks With Global Sparsity Constraint
CVPR 2021
Gradient Methods Never Overfit On Separable Data
JMLR 2021
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
AISTATS 2021
Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
JMLR 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
ACL 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
NIPS 2021
SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
AISTATS 2021
Prioritized Architecture Sampling With Monto-Carlo Tree Search
CVPR 2021
Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values
INTERSPEECH 2021
On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
NIPS 2021
Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory
NIPS 2021
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
NIPS 2021
Catformer: Designing Stable Transformers via Sensitivity Analysis
ICML 2021
Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons
NIPS 2021
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
AAAI 2021
Benign Overfitting of Constant-Stepsize SGD for Linear Regression
COLT 2021