Dongchao Yang

24 papers · 2021–2026 · 7 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (5)

🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (10) 👑 Triple Crown 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (97) ⚡ Prolific Year (9) ❓ The Questioner 🔥 Unstoppable (5) 💎 Century Club (22)

Conferences

INTERSPEECH (10) ICML (5) ACL (4) AAAI (2) EMNLP (1) ICLR (1) NIPS (1)

Top co-authors

Yuexian Zou (10) Xixin Wu (7) Rongjie Huang (6) Xu Tan (5) Helin Wang (5) Zhou Zhao (5) Xueyuan Chen (4) Zhenhui Ye (4) Helen Meng (4) Haohan Guo (4)

Keywords

diffusion model (3) large language model (3) weakly supervised learning (3) multimodal learning (3) neural network (2) speech synthesis (2) unsupervised learning (2) speech tokenization (2) speaker extraction (2) speech large language model (2) audio source separation (2) sound event detection (2) speech processing (2) domain adaptation (2) knowledge distillation (1) few-shot learning (1) self-supervised learning (1) embedding learning (1) semi-supervised learning (1) contrastive learning (1)

Papers

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment · ACL 2026
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models · AAAI 2026
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training · ACL 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs · EMNLP 2025
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors · ACL 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling · ICML 2025
UniAudio: Towards Universal Audio Generation with Large Language Models · ICML 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner · NIPS 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head · AAAI 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners · ACL 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt · ICLR 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models · ICML 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models · ICML 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction · INTERSPEECH 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models · INTERSPEECH 2024
Background-aware Modeling for Weakly Supervised Sound Event Detection · INTERSPEECH 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS · INTERSPEECH 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models · ICML 2023
Improving Target Sound Extraction with Timestamp Information · INTERSPEECH 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction · INTERSPEECH 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches · INTERSPEECH 2022
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification · INTERSPEECH 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection · INTERSPEECH 2022
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification · INTERSPEECH 2021