conftrace
Neural Network Optimization
3,648 papers
Papers per year
2001: 1
2003: 1
2005: 2
2006: 3
2007: 6
2008: 1
2009: 7
2010: 5
2011: 7
2012: 9
2013: 17
2014: 18
2015: 40
2016: 76
2017: 113
2018: 214
2019: 324
2020: 414
2021: 489
2022: 445
2023: 524
2024: 469
2025: 386
2026: 77
Papers
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
NIPS 2023
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
NIPS 2023
Principled Weight Initialisation for Input-Convex Neural Networks
NIPS 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
NIPS 2023
Mechanic: A Learning Rate Tuner
NIPS 2023
Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck
NIPS 2023
Max-Margin Token Selection in Attention Mechanism
NIPS 2023
Generalization bounds for neural ordinary differential equations and deep residual networks
NIPS 2023
Symbolic Discovery of Optimization Algorithms
NIPS 2023
Stable Nonconvex-Nonconcave Training via Linear Interpolation
NIPS 2023
BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization
NIPS 2023
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork
NIPS 2023
Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
NIPS 2023
Convergence of Adam Under Relaxed Assumptions
NIPS 2023
On the Convergence of Encoder-only Shallow Transformers
NIPS 2023
Scattering Vision Transformer: Spectral Mixing Matters
NIPS 2023
Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time
NIPS 2023
Optimistic Meta-Gradients
NIPS 2023
Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network
NIPS 2023
Efficient Hyper-parameter Optimization with Cubic Regularization
NIPS 2023
Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
NIPS 2023
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
NIPS 2023
How a Student becomes a Teacher: learning and forgetting through Spectral methods
NIPS 2023
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
NIPS 2023
ResMem: Learn what you can and memorize the rest
NIPS 2023