conftrace_

Xu Tan

102 papers · 2018–2026 · 13 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (13) 🗺️ Taxonomy Completionist (14) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🤝 Dynamic Duo (57) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (28) 🔬 Deep Specialist (21) 🧬 Topic Evolution 🏆 Keyword Champion (4) ⚡ Prolific Year (14) ❓ The Questioner 🗃️ Keyword Collector (363) 💎 Century Club (99) 🔥 Unstoppable (8) 📈 Trend Setter 🚀 Conference Pioneer

Conferences

ACL (16) NIPS (16) AAAI (13) ICLR (11) IJCAI (10) INTERSPEECH (9) EMNLP (7) ICML (7) NAACL (6) IJCNLP (3) ICCV (2) COLING (1) CVPR (1)

Top co-authors

Tao Qin (57) Tie-yan Liu (47) Kaitao Song (22) Sheng Zhao (21) Yichong Leng (16) Jiang Bian (15) Di He (14) Rui Wang (13) Yi Ren (12) Junliang Guo (12)

Keywords

neural machine translation (19) speech synthesis (8) music generation (8) automatic speech recognition (8) diffusion model (7) large language model (7) knowledge distillation (7) text generation (7) language modeling (6) transfer learning (6) text to speech (6) non-autoregressive translation (5) attention mechanism (5) machine translation (5) non-autoregressive generation (5) sequence generation (4) contrastive learning (4) word error rate (4) autonomous agent (4) error correction (4)

Papers

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization · ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training · ACL 2026
Think Then Rewrite: Reasoning Enhanced Query Rewriting for Domain Specific Retrieval · AAAI 2026
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms · NAACL 2025
MuPT: A Generative Symbolic Music Pretrained Transformer · ICLR 2025
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction · NAACL 2025
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation · ICCV 2025
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models · NAACL 2025
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling · CVPR 2025
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation · AAAI 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model · AAAI 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling · ICML 2025
GETMusic: Generating Music Tracks with a Unified Representation and Diffusion Framework · IJCAI 2025
PromptTTS 2: Describing and Generating Voices with Text Prompt · ICLR 2024
Regeneration Learning: A Learning Paradigm for Data Generation · AAAI 2024
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers · ICLR 2024
GAIA: Zero-shot Talking Avatar Generation · ICLR 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers · ICLR 2024
PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain · EMNLP 2024
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training · ACL 2024
Empowering Diffusion Models on the Embedding Space for Text Generation · NAACL 2024
Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation · IJCAI 2024
TaskBench: Benchmarking Large Language Models for Task Automation · NIPS 2024
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning · NIPS 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner · NIPS 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models · NIPS 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation · IJCAI 2024
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation · IJCAI 2024
UniAudio: Towards Universal Audio Generation with Large Language Models · ICML 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models · ICML 2024
HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details · ICCV 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face · NIPS 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models · NIPS 2023
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition · AAAI 2023
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing · AAAI 2023
DiffusionNER: Boundary Diffusion for Named Entity Recognition · ACL 2023
Towards Understanding Omission in Dialogue Summarization · ACL 2023
Extract and Attend: Improving Entity Translation in Neural Machine Translation · ACL 2023
TranSFormer: Slow-Fast Transformer for Machine Translation · ACL 2023
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models · EMNLP 2023
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation · IJCAI 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading · INTERSPEECH 2023
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling · NIPS 2022
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior · ICLR 2022
ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation · ACL 2022
A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation · NAACL 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios · INTERSPEECH 2022
Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition · AAAI 2022
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation · NIPS 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders · INTERSPEECH 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech · INTERSPEECH 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis · NIPS 2022
Analyzing and Mitigating Interference in Neural Architecture Search · ICML 2022
Non-Autoregressive Sequence Generation · ACL 2022
Revisiting Over-Smoothness in Text to Speech · ACL 2022
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction · EMNLP 2022
TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method · EMNLP 2022
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training · IJCNLP 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition · EMNLP 2021
A Survey on Low-Resource Neural Machine Translation · IJCAI 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition · NIPS 2021
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling · ACL 2021
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training · ACL 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice · ICLR 2021
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech · ICLR 2021
BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction · ICLR 2021
UWSpeech: Speech to Speech Translation for Unwritten Languages · AAAI 2021
Adaptive Text to Speech for Spontaneous Style · INTERSPEECH 2021
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint · AAAI 2021
Speech-T: Transducer for Text to Speech and Beyond · NIPS 2021
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching · INTERSPEECH 2021
DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling · IJCNLP 2021
Semi-Supervised Neural Architecture Search · NIPS 2020
SimulSpeech: End-to-End Simultaneous Speech to Text Translation · ACL 2020
A Study of Non-autoregressive Model for Sequence Generation · ACL 2020
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation · AAAI 2020
Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation · IJCAI 2020
Neural Machine Translation with Error Correction · IJCAI 2020
XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System · INTERSPEECH 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer · INTERSPEECH 2020
MPNet: Masked and Permuted Pre-training for Language Understanding · NIPS 2020
FastSpeech: Fast, Robust and Controllable Text to Speech · NIPS 2019
Deliberation Learning for Image-to-Image Translation · IJCAI 2019
Unsupervised Pivot Translation for Distant Languages · ACL 2019
Microsoft Research Asia’s Systems for WMT19 · ACL 2019
Multilingual Neural Machine Translation with Knowledge Distillation · ICLR 2019
Representation Degeneration Problem in Training Natural Language Generation Models · ICLR 2019
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input · AAAI 2019
Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder · AAAI 2019
Multilingual Neural Machine Translation with Language Clustering · IJCNLP 2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion · INTERSPEECH 2019
Multilingual Neural Machine Translation with Language Clustering · EMNLP 2019
Sentence-Wise Smooth Regularization for Sequence to Sequence Learning · AAAI 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation · ICML 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition · ICML 2019
Progressive Blockwise Knowledge Distillation for Neural Network Acceleration · IJCAI 2018
Model-Level Dual Learning · ICML 2018
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation · NIPS 2018
Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter · EMNLP 2018
FRAGE: Frequency-Agnostic Word Representation · NIPS 2018
Dense Information Flow for Neural Machine Translation · NAACL 2018
Double Path Networks for Sequence to Sequence Learning · COLING 2018