Xu Tan
102 papers · 2018–2026 · 13 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer · Conference Polyglot (13) · Taxonomy Completionist (14) · Interdisciplinary Bridge · Academic Marathon (7)
Interdisciplinary Bridge
Taxonomy Completionist (14)
Keyword Pioneer
Dynamic Duo (57)
Triple Crown
Grand Slam
Mega-Team (28)
Deep Specialist (21)
Topic Evolution
Keyword Champion (4)
Prolific Year (14)
The Questioner
Keyword Collector (363)
Century Club (99)
Unstoppable (8)
Trend Setter
Conference Pioneer
Conferences
ACL (16)
NIPS (16)
AAAI (13)
ICLR (11)
IJCAI (10)
INTERSPEECH (9)
EMNLP (7)
ICML (7)
NAACL (6)
IJCNLP (3)
ICCV (2)
COLING (1)
CVPR (1)
Keywords
neural machine translation (19)
speech synthesis (8)
music generation (8)
automatic speech recognition (8)
diffusion model (7)
large language model (7)
knowledge distillation (7)
text generation (7)
language modeling (6)
transfer learning (6)
text to speech (6)
non-autoregressive translation (5)
attention mechanism (5)
machine translation (5)
non-autoregressive generation (5)
sequence generation (4)
contrastive learning (4)
word error rate (4)
autonomous agent (4)
error correction (4)
Papers
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
ACL 2026
Think Then Rewrite: Reasoning Enhanced Query Rewriting for Domain Specific Retrieval
AAAI 2026
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
NAACL 2025
MuPT: A Generative Symbolic Music Pretrained Transformer
ICLR 2025
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
NAACL 2025
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
NAACL 2025
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
CVPR 2025
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
AAAI 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AAAI 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
ICML 2025
GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework
IJCAI 2025
PromptTTS 2: Describing and Generating Voices with Text Prompt
ICLR 2024
Regeneration Learning: A Learning Paradigm for Data Generation
AAAI 2024
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
ICLR 2024
GAIA: Zero-shot Talking Avatar Generation
ICLR 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
ICLR 2024
PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain
EMNLP 2024
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
ACL 2024
Empowering Diffusion Models on the Embedding Space for Text Generation
NAACL 2024
Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation
IJCAI 2024
TaskBench: Benchmarking Large Language Models for Task Automation
NIPS 2024
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning
NIPS 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner
NIPS 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
NIPS 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
IJCAI 2024
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
IJCAI 2024
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details
ICCV 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
NIPS 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
NIPS 2023
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
AAAI 2023
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
AAAI 2023
DiffusionNER: Boundary Diffusion for Named Entity Recognition
ACL 2023
Towards Understanding Omission in Dialogue Summarization
ACL 2023
Extract and Attend: Improving Entity Translation in Neural Machine Translation
ACL 2023
TranSFormer: Slow-Fast Transformer for Machine Translation
ACL 2023
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
EMNLP 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation
IJCAI 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
INTERSPEECH 2023
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
NIPS 2022
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior
ICLR 2022
ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation
ACL 2022
A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation
NAACL 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
INTERSPEECH 2022
Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition
AAAI 2022
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
NIPS 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
INTERSPEECH 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
INTERSPEECH 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
NIPS 2022
Analyzing and Mitigating Interference in Neural Architecture Search
ICML 2022
Non-Autoregressive Sequence Generation
ACL 2022
Revisiting Over-Smoothness in Text to Speech
ACL 2022
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction
EMNLP 2022
TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method
EMNLP 2022
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
IJCNLP 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
EMNLP 2021
A Survey on Low-Resource Neural Machine Translation
IJCAI 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
NIPS 2021
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling
ACL 2021
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
ACL 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
ICLR 2021
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
ICLR 2021
BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
ICLR 2021
UWSpeech: Speech to Speech Translation for Unwritten Languages
AAAI 2021
Adaptive Text to Speech for Spontaneous Style
INTERSPEECH 2021
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
AAAI 2021
Speech-T: Transducer for Text to Speech and Beyond
NIPS 2021
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching
INTERSPEECH 2021
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling
IJCNLP 2021
Semi-Supervised Neural Architecture Search
NIPS 2020
SimulSpeech: End-to-End Simultaneous Speech to Text Translation
ACL 2020
A Study of Non-autoregressive Model for Sequence Generation
ACL 2020
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
AAAI 2020
Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation
IJCAI 2020
Neural Machine Translation with Error Correction
IJCAI 2020
XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System
INTERSPEECH 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
INTERSPEECH 2020
MPNet: Masked and Permuted Pre-training for Language Understanding
NIPS 2020
FastSpeech: Fast, Robust and Controllable Text to Speech
NIPS 2019
Deliberation Learning for Image-to-Image Translation
IJCAI 2019
Unsupervised Pivot Translation for Distant Languages
ACL 2019
Microsoft Research Asia's Systems for WMT19
ACL 2019
Multilingual Neural Machine Translation with Knowledge Distillation
ICLR 2019
Representation Degeneration Problem in Training Natural Language Generation Models
ICLR 2019
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
AAAI 2019
Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder
AAAI 2019
Multilingual Neural Machine Translation with Language Clustering
IJCNLP 2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
INTERSPEECH 2019
Multilingual Neural Machine Translation with Language Clustering
EMNLP 2019
Sentence-Wise Smooth Regularization for Sequence to Sequence Learning
AAAI 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation
ICML 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
ICML 2019
Progressive Blockwise Knowledge Distillation for Neural Network Acceleration
IJCAI 2018
Model-Level Dual Learning
ICML 2018
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
NIPS 2018
Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter
EMNLP 2018
FRAGE: Frequency-Agnostic Word Representation
NIPS 2018
Dense Information Flow for Neural Machine Translation
NAACL 2018
Double Path Networks for Sequence to Sequence Learning
COLING 2018