Neural Network Optimization

3,648 papers

Papers

BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization NIPS 2023

Minimum norm interpolation by perceptra: Explicit regularization and implicit bias NIPS 2023

Linear Convergence of Gradient Descent For Finite Width Over-parametrized Linear Networks With General Initialization AISTATS 2023

Modeling Stroke Mask for End-to-End Text Erasing WACV 2023

A Guide Through the Zoo of Biased SGD NIPS 2023

Are Straight-Through Gradients and Soft-Thresholding All You Need for Sparse Training? WACV 2023

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima NIPS 2023

Strong Lottery Ticket Hypothesis with $\varepsilon$–perturbation AISTATS 2023

Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single ICML 2023

FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits CVPR 2023

Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures NIPS 2023

A Spectral Viewpoint on Continual Relation Extraction EMNLP 2023

SOL: Sampling-based Optimal Linear bounding of arbitrary scalar functions NIPS 2023

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing ICML 2023

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models NIPS 2023

Searching Efficient Neural Architecture With Multi-Resolution Fusion Transformer for Appearance-Based Gaze Estimation WACV 2023

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization NIPS 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond ICML 2023

Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies CVPR 2023

HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search CVPR 2023

Generalized Polyak Step Size for First Order Optimization with Momentum ICML 2023

Neural networks trained with SGD learn distributions of increasing complexity ICML 2023

Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction SEMEVAL 2023

Challenging the “One Single Vector per Token” Assumption CONLL 2023

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals NIPS 2023