Dongchao Yang
24 papers · 2021–2026 · 7 top CS/AI conferences
Achievements
🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (5)
🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (10) 👑 Triple Crown 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (97) ⚡ Prolific Year (9) ❓ The Questioner 🔥 Unstoppable (5) 💎 Century Club (22)
Conferences
INTERSPEECH (10)
ICML (5)
ACL (4)
AAAI (2)
EMNLP (1)
ICLR (1)
NIPS (1)
Keywords
diffusion model (3)
large language model (3)
weakly supervised learning (3)
multimodal learning (3)
neural network (2)
speech synthesis (2)
unsupervised learning (2)
speech tokenization (2)
speaker extraction (2)
speech large language model (2)
audio source separation (2)
sound event detection (2)
speech processing (2)
domain adaptation (2)
knowledge distillation (1)
few-shot learning (1)
self-supervised learning (1)
embedding learning (1)
semi-supervised learning (1)
contrastive learning (1)
Papers
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment
ACL 2026
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
AAAI 2026
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
EMNLP 2025
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
ACL 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
ICML 2025
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner
NIPS 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners
ACL 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt
ICLR 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models
ICML 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
INTERSPEECH 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
INTERSPEECH 2024
Background-aware Modeling for Weakly Supervised Sound Event Detection
INTERSPEECH 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
INTERSPEECH 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
ICML 2023
Improving Target Sound Extraction with Timestamp Information
INTERSPEECH 2022
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
INTERSPEECH 2022
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
INTERSPEECH 2022
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification
INTERSPEECH 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
INTERSPEECH 2022
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification
INTERSPEECH 2021