Taiji Suzuki
112 papers · 2010–2026 · across 9 top CS/AI conferences
Achievements
Taxonomy Completionist (32)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Hot Topic Early Bird
Keyword Trendsetter Combo (5)
Conference Loyalist (28)
Lone Wolf (8)
Dynamic Duo (25)
Triple Crown
Grand Slam
Deep Specialist (22)
Keyword Champion (2)
Topic Pioneer
The Questioner (2)
Unstoppable (16)
Conference Pioneer
Century Club (111)
Trend Setter
Prolific Year (12)
Keyword Collector (110)
Conferences
ICML (32)
NIPS (28)
ICLR (25)
AISTATS (16)
COLT (4)
ACML (2)
IJCAI (2)
JMLR (2)
AAAI (1)
Keywords
neural network (10)
gradient descent (9)
reproducing kernel hilbert space (6)
stochastic gradient descent (6)
generalization bound (6)
kernel methods (6)
convex optimization (5)
neural network optimization (5)
representation learning (5)
density ratio estimation (4)
sample complexity (4)
tensor decomposition (4)
gradient boosting (4)
binary classification (4)
variance reduction (4)
global convergence (3)
convergence analysis (3)
stochastic gradient (3)
in-context learning (3)
feature learning (3)
Papers
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
AAAI 2026
Nonlinear transformers can perform inference-time feature learning
ICML 2025
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
ICML 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
ICLR 2025
Clustered Invariant Risk Minimization
AISTATS 2025
Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons
AISTATS 2025
Provable In-Context Vector Arithmetic via Retrieving Task Concepts
ICML 2025
Direct Distributional Optimization for Provable Alignment of Diffusion Models
ICLR 2025
Flow matching achieves almost minimax optimal convergence
ICLR 2025
On the Role of Label Noise in the Feature Learning Process
ICML 2025
Quantifying Memory Utilization with Effective State-Size
ICML 2025
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
ICML 2025
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
ICLR 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
ICML 2025
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
ICML 2025
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
ICLR 2025
Transformers Provably Solve Parity Efficiently with Chain of Thought
ICLR 2025
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
ICLR 2025
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
ICLR 2024
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data
ICLR 2024
Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods
ICLR 2024
Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime
ICLR 2024
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
NIPS 2024
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
NIPS 2024
Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context
NIPS 2024
On the Comparison between Multi-modal and Single-modal Contrastive Learning
NIPS 2024
Transformers are Minimax Optimal Nonparametric In-Context Learners
NIPS 2024
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
NIPS 2024
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective
ICML 2024
Koopman-based generalization bound: New aspect for full-rank weights
ICLR 2024
How do Transformers Perform In-Context Autoregressive Learning?
ICML 2024
Mechanistic Design and Scaling of Hybrid Architectures
ICML 2024
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning
ICML 2024
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
COLT 2024
State-Free Inference of State-Space Models: The *Transfer Function* Approach
ICML 2024
SILVER: Single-loop variance reduction and application to federated learning
ICML 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
ICML 2024
High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
ICML 2024
Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples
ICML 2024
Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory
ICLR 2024
Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics
ICLR 2023
Convergence of mean-field Langevin dynamics: time-space discretization, stochastic gradient, and variance reduction
NIPS 2023
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
NIPS 2023
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
NIPS 2023
Gradient-Based Feature Learning under Structured Data
NIPS 2023
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
ICML 2023
Tight and fast generalization error bound of graph embedding in metric space
ICML 2023
Diffusion Models are Minimax Optimal Distribution Estimators
ICML 2023
Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems
ICML 2023
DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
ICML 2023
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
ICLR 2023
Convex Analysis of the Mean Field Langevin Dynamics
AISTATS 2022
Layer-wise Adaptive Graph Convolution Networks Using Generalized Pagerank
ACML 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
NIPS 2022
Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
NIPS 2022
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization
NIPS 2022
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
NIPS 2022
Understanding the Variance Collapse of SVGD in High Dimensions
ICLR 2022
Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization
ICLR 2022
Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness
ICLR 2022
Dimension-free convergence rates for gradient Langevin dynamics in RKHS
COLT 2022
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
ICLR 2021
Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
NIPS 2021
Differentiable Multiple Shooting Layers
NIPS 2021
Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis
NIPS 2021
Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
AISTATS 2021
Gradient Descent in RKHS with Importance Labeling
AISTATS 2021
When does preconditioning help or hurt generalization?
ICLR 2021
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime
ICLR 2021
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting
ICML 2021
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning
ICML 2021
Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding
ICML 2021
Decomposable-Net: Scalable Low-Rank Compression for Neural Networks
IJCAI 2021
Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint
ICLR 2020
Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees
AISTATS 2020
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
ICLR 2020
Graph Neural Networks Exponentially Lose Expressive Power for Node Classification
ICLR 2020
Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error
IJCAI 2020
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
NIPS 2020
Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics
NIPS 2020
Understanding Generalization in Deep Learning via Tensor Methods
AISTATS 2020
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
ICLR 2019
Approximation and non-parametric estimation of ResNet-type convolutional neural networks
ICML 2019
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
AISTATS 2019
Asian Conference on Machine Learning: Preface
ACML 2019
Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models
AISTATS 2018
Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables
AISTATS 2018
Functional Gradient Boosting based on Residual Network Perception
ICML 2018
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation
NIPS 2018
Fast generalization error bound of deep learning from a kernel perspective
AISTATS 2018
Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines
AISTATS 2017
Trimmed Density Ratio Estimation
NIPS 2017
Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization
NIPS 2017
Gaussian process nonparametric tensor estimator and its minimax optimality
ICML 2016
Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning
NIPS 2016
Structure Learning of Partitioned Markov Networks
ICML 2016
A Consistent Method for Graph Based Anomaly Localization
AISTATS 2015
Convergence rate of Bayesian tensor estimator and its minimax optimality
ICML 2015
Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers
ICML 2014
Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
JMLR 2013
Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method
ICML 2013
Convex Tensor Decomposition via Structured Schatten Norm Regularization
NIPS 2013
A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems
COLT 2012
PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model
COLT 2012
Density-Difference Estimation
NIPS 2012
Fast Learning Rate of Multiple Kernel Learning: Trade-Off between Sparsity and Smoothness
AISTATS 2012
Relative Density-Ratio Estimation for Robust Distribution Comparison
NIPS 2011
Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning
NIPS 2011
Statistical Performance of Convex Tensor Decomposition
NIPS 2011
Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation
JMLR 2011
Conditional Density Estimation via Least-Squares Density Ratio Estimation
AISTATS 2010
Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation
AISTATS 2010