Papers
A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning
Nikunj Saunshi, Arushi Gupta, Wei Hu
A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance
Minhui Huang, Shiqian Ma, Lifeng Lai
ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
Aleksandar Dimitriev, Mingyuan Zhou
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon, Jeongseop Kim, Hyunseo Park et al.
A Sampling-Based Method for Tensor Ring Decomposition
Osman Asif Malik, Stephen Becker
A Scalable Deterministic Global Optimization Algorithm for Clustering Problems
Kaixun Hua, Mingfei Shi, Yankai Cao
A Scalable Second Order Method for Ill-Conditioned Matrix Completion from Few Samples
Christian Kümmerle, Claudio M. Verdun
A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance
Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Qinghua Liu, Tiancheng Yu, Yu Bai et al.
A statistical perspective on distillation
Aditya K Menon, Ankit Singh Rawat, Sashank Reddi et al.
A Structured Observation Distribution for Generative Biological Sequence Prediction and Forecasting
Eli N Weinstein, Debora Marks
Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu et al.
Asymmetric Loss Functions for Learning with Noisy Labels
Xiong Zhou, Xianming Liu, Junjun Jiang et al.
Asymptotic Normality and Confidence Intervals for Prediction Risk of the Min-Norm Least Squares Estimator
Zeng Li, Chuanlong Xie, Qinwen Wang
Asymptotics of Ridge Regression in Convolutional Models
Mojtaba Sahraee-Ardakan, Tung Mai, Anup Rao et al.
Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction
Kenta Niwa, Guoqiang Zhang, W. Bastiaan Kleijn et al.
Asynchronous Distributed Learning: Adapting to Gradient Delays without Prior Knowledge
Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster et al.
A Tale of Two Efficient and Informative Negative Sampling Distributions
Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger et al.
A Theory of Label Propagation for Subpopulation Shift
Tianle Cai, Ruiqi Gao, Jason Lee et al.
Attention is not all you need: pure attention loses rank doubly exponentially with depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
Philip J Ball, Cong Lu, Jack Parker-Holder et al.
A Unified Generative Adversarial Network Training via Self-Labeling and Self-Attention
Tomoki Watanabe, Paolo Favaro
A Unified Lottery Ticket Hypothesis for Graph Neural Networks
Tianlong Chen, Yongduo Sui, Xuxi Chen et al.
AutoAttend: Automated Attention Representation Search
Chaoyu Guan, Xin Wang, Wenwu Zhu