Daniel Soudry

56 papers · 2012–2026 · 9 venues across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (15) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🌍 Conference Polyglot (8)

🐣 Hot Topic Early Bird 🏠 Conference Loyalist (20) 🌟 Keyword Trendsetter Combo (3) 🤝 Dynamic Duo (15) 👑 Triple Crown 🏆 Keyword Champion (2) 🏆 Grand Slam 🔬 Deep Specialist (15) 📈 Trend Setter 🔥 Unstoppable (10) 🚀 Conference Pioneer ⚡ Prolific Year (10) ❓ The Questioner (5) 🗃️ Keyword Collector (181) 💎 Century Club (54)

Conferences

NIPS (20) ICLR (12) ICML (10) AISTATS (3) COLT (3) CVPR (3) ALT (2) JMLR (2) AAAI (1)

Top co-authors

Nathan Srebro (16) Itay Hubara (12) Elad Hoffer (12) Mor Shpigel Nacson (11) Ron Banner (10) Itay Evron (8) Suriya Gunasekar (7) Edward Moroshko (6) Brian Chmiel (5) Jason Lee (5)

Research topics

Science (1)

Keywords

gradient descent (11) implicit bias (9) neural network (7) model compression (6) stochastic gradient descent (5) separable data (5) neural network quantization (4) continual learning (4) batch normalization (4) learning theory (3) gradient flow (3) convolutional neural network (3) mirror descent (2) signal propagation (2) linear regression (2) logistic loss (2) catastrophic forgetting (2) representation learning (2) neural network optimization (2) post-training quantization (2)

Papers

Optimal L2 Regularization in High-dimensional Continual Linear Regression ALT 2026
From Continual Learning to SGD and Back: Better Rates for Continual Linear Models ALT 2026
When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets ICML 2025
Scaling FP8 training to trillion-token LLMs ICLR 2025
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes NIPS 2024
Provable Tempered Overfitting of Minimal Nets and Typical Nets NIPS 2024
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting — An Analytical Model ICLR 2024
Exponential Quantum Communication Advantage in Distributed Inference and Learning NIPS 2024
The Implicit Bias of Gradient Descent on Separable Multiclass Data NIPS 2024
How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers ICML 2024
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators ICLR 2024
Minimum Variance Unbiased N:M Sparsity for the Neural Gradients ICLR 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond ICML 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks ICLR 2023
Continual Learning in Linear Classification on Separable Data ICML 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction NIPS 2023
How do Minimum-Norm Shallow Denoisers Look in Function Space? NIPS 2023
Explore to Generalize in Zero-Shot RL NIPS 2023
The Role of Codeword-to-Class Assignments in Error-Correcting Codes: An Empirical Study AISTATS 2023
Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations CVPR 2023
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats ICLR 2023
Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability AAAI 2022
Implicit Bias of the Step Size in Linear Diagonal Neural Networks ICML 2022
How catastrophic can catastrophic forgetting be in linear regression? COLT 2022
A Statistical Framework for Efficient Out of Distribution Detection in Deep Neural Networks ICLR 2022
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks NIPS 2021
The Implicit Bias of Minima Stability: A View from Function Space NIPS 2021
On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent ICML 2021
Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling NIPS 2021
Neural gradients are near-lognormal: improved quantized and sparse training ICLR 2021
Accurate Post Training Quantization With Small Calibration Sets ICML 2021
The Knowledge Within: Methods for Data-Free Model Compression CVPR 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy NIPS 2020
Kernel and Rich Regimes in Overparametrized Models COLT 2020
Augment Your Batch: Improving Generalization Through Instance Repetition CVPR 2020
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case ICLR 2020
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? ICLR 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization? ICML 2020
Post training 4-bit quantization of convolutional networks for rapid-deployment NIPS 2019
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate AISTATS 2019
How do infinite width bounded norm networks look in function space? COLT 2019
Convergence of Gradient Descent on Separable Data AISTATS 2019
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off NIPS 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models ICML 2019
Fix your classifier: the marginal value of training the last weight layer ICLR 2018
Characterizing Implicit Bias in Terms of Optimization Geometry ICML 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks NIPS 2018
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations JMLR 2018
The Implicit Bias of Gradient Descent on Separable Data JMLR 2018
The Implicit Bias of Gradient Descent on Separable Data ICLR 2018
Scalable methods for 8-bit training of neural networks NIPS 2018
Norm matters: efficient and accurate normalization schemes in deep networks NIPS 2018
Train longer, generalize better: closing the generalization gap in large batch training of neural networks NIPS 2017
Binarized Neural Networks NIPS 2016
Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights NIPS 2014
Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter NIPS 2012