Taiji Suzuki

112 papers · 2010–2026 · 9 venues across top CS/AI conferences and journals

Achievements

🗺️ Taxonomy Completionist (32) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (5) 🏠 Conference Loyalist (28) 🐺 Lone Wolf (8) 🤝 Dynamic Duo (25) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (22) 🏆 Keyword Champion (2) 🌱 Topic Pioneer ❓ The Questioner (2) 🔥 Unstoppable (16) 🚀 Conference Pioneer 💎 Century Club (111) 📈 Trend Setter ⚡ Prolific Year (12) 🗃️ Keyword Collector (110)

Conferences

ICML (32) NIPS (28) ICLR (25) AISTATS (16) COLT (4) ACML (2) IJCAI (2) JMLR (2) AAAI (1)

Top co-authors

Atsushi Nitanda (25) Denny Wu (21) Kazusato Oko (14) Wei Huang (10) Tomoya Murata (9) Masashi Sugiyama (7) Andi Han (6) Juno Kim (6) Shunta Akiyama (5) Takafumi Kanamori (5)

Research topics

Differential Privacy (1)

Keywords

neural network (10) gradient descent (9) reproducing kernel hilbert space (6) stochastic gradient descent (6) generalization bound (6) kernel methods (6) convex optimization (5) neural network optimization (5) representation learning (5) density ratio estimation (4) sample complexity (4) tensor decomposition (4) gradient boosting (4) binary classification (4) variance reduction (4) global convergence (3) convergence analysis (3) stochastic gradient (3) in-context learning (3) feature learning (3)

Papers

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD · AAAI 2026
Nonlinear transformers can perform inference-time feature learning · ICML 2025
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation · ICML 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression · ICLR 2025
Clustered Invariant Risk Minimization · AISTATS 2025
Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons · AISTATS 2025
Provable In-Context Vector Arithmetic via Retrieving Task Concepts · ICML 2025
Direct Distributional Optimization for Provable Alignment of Diffusion Models · ICLR 2025
Flow matching achieves almost minimax optimal convergence · ICLR 2025
On the Role of Label Noise in the Feature Learning Process · ICML 2025
Quantifying Memory Utilization with Effective State-Size · ICML 2025
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble · ICML 2025
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric · ICLR 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models · ICML 2025
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning · ICML 2025
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent · ICLR 2025
Transformers Provably Solve Parity Efficiently with Chain of Thought · ICLR 2025
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection · ICLR 2025
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems · ICLR 2024
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data · ICLR 2024
Minimax optimality of convolutional neural networks for infinite dimensional input-output problems and separation from kernel methods · ICLR 2024
Optimal criterion for feature learning of two-layer linear neural network in high dimensional interpolation regime · ICLR 2024
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit · NIPS 2024
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning · NIPS 2024
Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context · NIPS 2024
On the Comparison between Multi-modal and Single-modal Contrastive Learning · NIPS 2024
Transformers are Minimax Optimal Nonparametric In-Context Learners · NIPS 2024
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization · NIPS 2024
Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective · ICML 2024
Koopman-based generalization bound: New aspect for full-rank weights · ICLR 2024
How do Transformers Perform In-Context Autoregressive Learning? · ICML 2024
Mechanistic Design and Scaling of Hybrid Architectures · ICML 2024
Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning · ICML 2024
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations · COLT 2024
State-Free Inference of State-Space Models: The Transfer Function Approach · ICML 2024
SILVER: Single-loop variance reduction and application to federated learning · ICML 2024
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape · ICML 2024
High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization · ICML 2024
Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples · ICML 2024
Understanding Convergence and Generalization in Federated Learning through Feature Learning Theory · ICLR 2024
Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics · ICLR 2023
Convergence of mean-field Langevin dynamics: time-space discretization, stochastic gradient, and variance reduction · NIPS 2023
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective · NIPS 2023
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond · NIPS 2023
Gradient-Based Feature Learning under Structured Data · NIPS 2023
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input · ICML 2023
Tight and fast generalization error bound of graph embedding in metric space · ICML 2023
Diffusion Models are Minimax Optimal Distribution Estimators · ICML 2023
Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems · ICML 2023
DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning · ICML 2023
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods · ICLR 2023
Convex Analysis of the Mean Field Langevin Dynamics · AISTATS 2022
Layer-wise Adaptive Graph Convolution Networks Using Generalized Pagerank · ACML 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation · NIPS 2022
Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime · NIPS 2022
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization · NIPS 2022
Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning · NIPS 2022
Understanding the Variance Collapse of SVGD in High Dimensions · ICLR 2022
Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization · ICLR 2022
Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness · ICLR 2022
Dimension-free convergence rates for gradient Langevin dynamics in RKHS · COLT 2022
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods · ICLR 2021
Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space · NIPS 2021
Differentiable Multiple Shooting Layers · NIPS 2021
Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis · NIPS 2021
Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features · AISTATS 2021
Gradient Descent in RKHS with Importance Labeling · AISTATS 2021
When does preconditioning help or hurt generalization? · ICLR 2021
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime · ICLR 2021
On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting · ICML 2021
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning · ICML 2021
Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding · ICML 2021
Decomposable-Net: Scalable Low-Rank Compression for Neural Networks · IJCAI 2021
Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint · ICLR 2020
Functional Gradient Boosting for Learning Residual-like Networks with Statistical Guarantees · AISTATS 2020
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network · ICLR 2020
Graph Neural Networks Exponentially Lose Expressive Power for Node Classification · ICLR 2020
Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error · IJCAI 2020
Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks · NIPS 2020
Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics · NIPS 2020
Understanding Generalization in Deep Learning via Tensor Methods · AISTATS 2020
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality · ICLR 2019
Approximation and non-parametric estimation of ResNet-type convolutional neural networks · ICML 2019
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors · AISTATS 2019
Asian Conference on Machine Learning: Preface · ACML 2019
Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models · AISTATS 2018
Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables · AISTATS 2018
Functional Gradient Boosting based on Residual Network Perception · ICML 2018
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation · NIPS 2018
Fast generalization error bound of deep learning from a kernel perspective · AISTATS 2018
Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines · AISTATS 2017
Trimmed Density Ratio Estimation · NIPS 2017
Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization · NIPS 2017
Gaussian process nonparametric tensor estimator and its minimax optimality · ICML 2016
Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning · NIPS 2016
Structure Learning of Partitioned Markov Networks · ICML 2016
A Consistent Method for Graph Based Anomaly Localization · AISTATS 2015
Convergence rate of Bayesian tensor estimator and its minimax optimality · ICML 2015
Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers · ICML 2014
Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems · JMLR 2013
Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method · ICML 2013
Convex Tensor Decomposition via Structured Schatten Norm Regularization · NIPS 2013
A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems · COLT 2012
PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model · COLT 2012
Density-Difference Estimation · NIPS 2012
Fast Learning Rate of Multiple Kernel Learning: Trade-Off between Sparsity and Smoothness · AISTATS 2012
Relative Density-Ratio Estimation for Robust Distribution Comparison · NIPS 2011
Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning · NIPS 2011
Statistical Performance of Convex Tensor Decomposition · NIPS 2011
Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation · JMLR 2011
Conditional Density Estimation via Least-Squares Density Ratio Estimation · AISTATS 2010
Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation · AISTATS 2010