Daniel Soudry
56 papers · 2012–2026 · 9 venues across top CS/AI conferences and journals
Achievements
Taxonomy Completionist (15)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (7)
Conference Polyglot (8)
Hot Topic Early Bird
Conference Loyalist (20)
Keyword Trendsetter Combo (3)
Dynamic Duo (15)
Triple Crown
Keyword Champion (2)
Grand Slam
Deep Specialist (15)
Trend Setter
Unstoppable (10)
Conference Pioneer
Prolific Year (10)
The Questioner (5)
Keyword Collector (181)
Century Club (54)
Venues
NIPS (20)
ICLR (12)
ICML (10)
AISTATS (3)
COLT (3)
CVPR (3)
ALT (2)
JMLR (2)
AAAI (1)
Keywords
gradient descent (11)
implicit bias (9)
neural network (7)
model compression (6)
stochastic gradient descent (5)
separable data (5)
neural network quantization (4)
continual learning (4)
batch normalization (4)
learning theory (3)
gradient flow (3)
convolutional neural network (3)
mirror descent (2)
signal propagation (2)
linear regression (2)
logistic loss (2)
catastrophic forgetting (2)
representation learning (2)
neural network optimization (2)
post-training quantization (2)
Papers
Optimal L2 Regularization in High-dimensional Continual Linear Regression
ALT 2026
From Continual Learning to SGD and Back: Better Rates for Continual Linear Models
ALT 2026
When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
ICML 2025
Scaling FP8 training to trillion-token LLMs
ICLR 2025
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
NIPS 2024
Provable Tempered Overfitting of Minimal Nets and Typical Nets
NIPS 2024
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting – An Analytical Model
ICLR 2024
Exponential Quantum Communication Advantage in Distributed Inference and Learning
NIPS 2024
The Implicit Bias of Gradient Descent on Separable Multiclass Data
NIPS 2024
How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
ICML 2024
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
ICLR 2024
Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
ICLR 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
ICML 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
ICLR 2023
Continual Learning in Linear Classification on Separable Data
ICML 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
NIPS 2023
How do Minimum-Norm Shallow Denoisers Look in Function Space?
NIPS 2023
Explore to Generalize in Zero-Shot RL
NIPS 2023
The Role of Codeword-to-Class Assignments in Error-Correcting Codes: An Empirical Study
AISTATS 2023
Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
CVPR 2023
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
ICLR 2023
Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability
AAAI 2022
Implicit Bias of the Step Size in Linear Diagonal Neural Networks
ICML 2022
How catastrophic can catastrophic forgetting be in linear regression?
COLT 2022
A Statistical Framework for Efficient Out of Distribution Detection in Deep Neural Networks
ICLR 2022
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
NIPS 2021
The Implicit Bias of Minima Stability: A View from Function Space
NIPS 2021
On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
ICML 2021
Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
NIPS 2021
Neural gradients are near-lognormal: improved quantized and sparse training
ICLR 2021
Accurate Post Training Quantization With Small Calibration Sets
ICML 2021
The Knowledge Within: Methods for Data-Free Model Compression
CVPR 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
NIPS 2020
Kernel and Rich Regimes in Overparametrized Models
COLT 2020
Augment Your Batch: Improving Generalization Through Instance Repetition
CVPR 2020
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
ICLR 2020
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
ICLR 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?
ICML 2020
Post training 4-bit quantization of convolutional networks for rapid-deployment
NIPS 2019
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
AISTATS 2019
How do infinite width bounded norm networks look in function space?
COLT 2019
Convergence of Gradient Descent on Separable Data
AISTATS 2019
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
NIPS 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
ICML 2019
Fix your classifier: the marginal value of training the last weight layer
ICLR 2018
Characterizing Implicit Bias in Terms of Optimization Geometry
ICML 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks
NIPS 2018
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
JMLR 2018
The Implicit Bias of Gradient Descent on Separable Data
JMLR 2018
The Implicit Bias of Gradient Descent on Separable Data
ICLR 2018
Scalable methods for 8-bit training of neural networks
NIPS 2018
Norm matters: efficient and accurate normalization schemes in deep networks
NIPS 2018
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
NIPS 2017
Binarized Neural Networks
NIPS 2016
Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights
NIPS 2014
Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter
NIPS 2012