Optimization

1638 directly classified papers

Papers

Global Convergence in Training Large-Scale Transformers NIPS 2024

Pre-trained Large Language Models Use Fourier Features to Compute Addition NIPS 2024

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models NIPS 2024

Robust and Faster Zeroth-Order Minimax Optimization: Complexity and Applications NIPS 2024

Globally Q-linear Gauss-Newton Method for Overparameterized Non-convex Matrix Sensing NIPS 2024

SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model NIPS 2024

Evaluating the design space of diffusion-based generative models NIPS 2024

ESPACE: Dimensionality Reduction of Activations for Model Compression NIPS 2024

STₖ: A Scalable Module for Solving Top-k Problems NIPS 2024

Understanding Progressive Training Through the Framework of Randomized Coordinate Descent AISTATS 2024

The Road Less Scheduled NIPS 2024

Rethinking Fourier Transform from A Basis Functions Perspective for Long-term Time Series Forecasting NIPS 2024

In-Context Learning State Vector with Inner and Momentum Optimization NIPS 2024

Sketching for Distributed Deep Learning: A Sharper Analysis NIPS 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation OSDI 2024

GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models AISTATS 2024

Towards Scalable and Stable Parallelization of Nonlinear RNNs NIPS 2024

Hardness of Learning Neural Networks under the Manifold Hypothesis NIPS 2024

Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together EMNLP 2024

Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms NIPS 2024

Weight decay induces low-rank attention layers NIPS 2024

MOSEL: Inference Serving Using Dynamic Modality Selection EMNLP 2024

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization NIPS 2024

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization NIPS 2024

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution NIPS 2024