Tuo Zhao
109 papers · 2012–2026 · 12 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (30)
Keyword Pioneer
Hot Topic Early Bird
Academic Marathon (13)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Loyalist (28)
Keyword Trendsetter Combo (4)
Dynamic Duo (32)
Triple Crown
Deep Specialist (10)
Keyword Champion (2)
Topic Pioneer
Keyword Collector (99)
Conference Pioneer
Unstoppable (14)
Prolific Year (17)
The Questioner (3)
Trend Setter
Century Club (108)
Conferences
NIPS (28)
ICML (20)
ICLR (13)
EMNLP (11)
JMLR (10)
ACL (8)
AISTATS (8)
NAACL (4)
IJCNLP (2)
INTERSPEECH (2)
UAI (2)
L4DC (1)
Keywords
nonconvex optimization (10)
gradient descent (8)
model compression (7)
sample complexity (6)
stochastic optimization (6)
pre-trained language model (5)
representation learning (5)
neural network (5)
function approximation (5)
sparse learning (5)
convex optimization (4)
non-convex optimization (4)
adversarial regularization (4)
weak supervision (3)
neural network optimization (3)
neural machine translation (3)
named entity recognition (3)
reinforcement learning from human feedback (3)
dimensionality reduction (3)
policy gradient (3)
Papers
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
ACL 2026
DORM: Preference Data Weights Optimization for Reward Modeling in LLM Alignment
EMNLP 2025
Deep Reinforcement Learning from Hierarchical Preference Design
ICML 2025
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
ICML 2025
RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization
ACL 2025
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
JMLR 2025
Data Diversity Matters for Robust Instruction Tuning
EMNLP 2024
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering
EMNLP 2024
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
EMNLP 2024
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
JMLR 2024
Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces
JMLR 2024
Beyond Point Prediction: Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process
ICML 2024
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
ICML 2024
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
ICLR 2024
LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models
ICLR 2024
Provable Acceleration of Nesterov's Accelerated Gradient for Asymmetric Matrix Factorization and Linear Neural Networks
NIPS 2024
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
NIPS 2024
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
NIPS 2024
Robust Reinforcement Learning from Corrupted Human Feedback
NIPS 2024
Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
ICLR 2023
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
ICML 2023
Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
NIPS 2023
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
NIPS 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
NIPS 2023
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
ICML 2023
Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites
ACL 2023
HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference
EMNLP 2023
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
ICML 2023
Machine Learning Force Fields with Data Cost Aware Training
ICML 2023
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
ICLR 2023
Pivotal Estimation of Linear Discriminant Analysis in High Dimensions
JMLR 2023
Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
ICML 2023
SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
ICML 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
ICLR 2023
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
EMNLP 2023
Reinforcement Learning for Adaptive Mesh Refinement
AISTATS 2023
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
ICML 2022
Steering vector correction in MVDR beamformer for speech enhancement
INTERSPEECH 2022
Self-Training with Differentiable Teacher
NAACL 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
NAACL 2022
On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds
NIPS 2022
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
ACL 2022
CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data
NAACL 2022
Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
AISTATS 2022
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
ICLR 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
ICLR 2022
Taming Sparsely Activated Transformer with Stochastic Experts
ICLR 2022
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits
ICLR 2022
Adversarially Regularized Policy Learning Guided by Trajectory Optimization
L4DC 2022
Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
ICML 2022
Learning to Defend by Learning to Attack
AISTATS 2021
A Hypergradient Approach to Robust Regression without Correspondence
ICLR 2021
Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
ICML 2021
How Important is the Train-Validation Split in Meta-Learning?
ICML 2021
Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
NIPS 2021
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
IJCNLP 2021
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach
NAACL 2021
Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach
EMNLP 2021
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
EMNLP 2021
Token-wise Curriculum Learning for Neural Machine Translation
EMNLP 2021
ARCH: Efficient Adversarial Regularized Training with Caching
EMNLP 2021
Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
AISTATS 2021
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data
IJCNLP 2021
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
ACL 2021
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data
ACL 2021
On Generalization Bounds of a Family of Recurrent Neural Networks
AISTATS 2020
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
ACL 2020
Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing
ACL 2020
Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
EMNLP 2020
On Computation and Generalization of Generative Adversarial Imitation Learning
ICLR 2020
Implicit Bias of Gradient Descent based Adversarial Training on Separable Data
ICLR 2020
Towards Understanding Hierarchical Learning: Benefits of Neural Representations
NIPS 2020
Differentiable Top-k with Optimal Transport
NIPS 2020
Deep Reinforcement Learning with Robust and Smooth Policy
ICML 2020
Transformer Hawkes Process
ICML 2020
Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective
NIPS 2020
On Computation and Generalization of Generative Adversarial Networks under Spectrum Control
ICLR 2019
Meta Learning with Relational Information for Short Sequences
NIPS 2019
Towards Understanding the Importance of Shortcut Connections in Residual Networks
NIPS 2019
Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
NIPS 2019
On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition
AISTATS 2019
On Scalable and Efficient Computation of Large Scale Optimal Transport
ICML 2019
Toward Understanding the Importance of Noise in Training Neural Networks
ICML 2019
Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
JMLR 2019
On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About its Nonsmooth Loss Function
UAI 2019
Online Factorization and Partition of Complex Networks by Random Walk
UAI 2019
The Physical Systems Behind Optimization Algorithms
NIPS 2018
Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
NIPS 2018
Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
NIPS 2018
Provable Gaussian Embedding with One Observation
NIPS 2018
On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization
JMLR 2018
On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning
NIPS 2017
Online Partial Least Square Optimization: Dropping Convexity for Better Efficiency and Scalability
ICML 2017
Deep Hyperspherical Learning
NIPS 2017
The Opensesame NIST 2016 Speaker Recognition Evaluation System
INTERSPEECH 2017
Parametric Simplex Method for Sparse Learning
NIPS 2017
An Improved Convergence Analysis of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization
AISTATS 2016
NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
NIPS 2016
Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning
ICML 2016
The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R
JMLR 2015
A Nonconvex Optimization Framework for Low Rank Matrix Estimation
NIPS 2015
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery
JMLR 2015
Multivariate Regression with Calibration
NIPS 2014
Accelerated Mini-batch Randomized Block Coordinate Descent Method
NIPS 2014
CODA: High Dimensional Copula Discriminant Analysis
JMLR 2013
Sparse Inverse Covariance Estimation with Calibration
NIPS 2013
The huge Package for High-dimensional Undirected Graph Estimation in R
JMLR 2012
Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
NIPS 2012
Sparse Additive Machine
AISTATS 2012