Sheng Zhao

37 papers · 2019–2025 · 9 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🗺️ Taxonomy Completionist (15) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6)

🐣 Hot Topic Early Bird 🏆 Grand Slam 👑 Triple Crown 🏆 Keyword Champion 🤝 Dynamic Duo (21) 🔬 Deep Specialist (13) 💎 Century Club (37) 🚀 Conference Pioneer ⚡ Prolific Year (5) 🗃️ Keyword Collector (142) 🔥 Unstoppable (7)

Conferences

INTERSPEECH (16) ICLR (5) NIPS (5) AAAI (3) ICML (3) ACL (2) ICCV (1) IJCAI (1) MICCAI (1)

Top co-authors

Xu Tan (21) Tao Qin (13) Lei He (12) Yanqing Liu (11) Tie-Yan Liu (9) Jinyu Li (9) Jiang Bian (8) Shujie Liu (7) Zhou Zhao (5) Yi Ren (5)

Keywords

speech synthesis (10) automatic speech recognition (7) text to speech (4) text-to-speech synthesis (4) knowledge distillation (3) end-to-end model (3) machine translation (2) speaker similarity (2) neural network (2) neural vocoder (2) transformer architecture (2) end-to-end speech recognition (2) flow matching (2) parallel generation (2) speech recognition (2) representation learning (2) autoregressive model (2) adversarial training (2) non-autoregressive model (2) data augmentation (1)

Papers

Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound · MICCAI 2025
Autoregressive Speech Synthesis without Vector Quantization · ACL 2025
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations · NIPS 2024
UniAudio: Towards Universal Audio Generation with Large Language Models · ICML 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation · NIPS 2024
GAIA: Zero-shot Talking Avatar Generation · ICLR 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers · ICLR 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt · ICLR 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models · ICML 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS · INTERSPEECH 2024
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems · INTERSPEECH 2024
HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details · ICCV 2023
Large-Scale Automatic Audiobook Creation · INTERSPEECH 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models · NIPS 2023
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing · AAAI 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading · INTERSPEECH 2023
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders · INTERSPEECH 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech · INTERSPEECH 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios · INTERSPEECH 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis · NIPS 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion · INTERSPEECH 2022
AdaSpeech: Adaptive Text to Speech for Custom Voice · ICLR 2021
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech · ICLR 2021
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems · INTERSPEECH 2021
Adaptive Text to Speech for Spontaneous Style · INTERSPEECH 2021
MultiSpeech: Multi-Speaker Text to Speech with Transformer · INTERSPEECH 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition · INTERSPEECH 2020
RobuTrans: A Robust Transformer-Based Text-to-Speech Model · AAAI 2020
A Study of Non-autoregressive Model for Sequence Generation · ACL 2020
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability · INTERSPEECH 2020
MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search · INTERSPEECH 2020
Enhancing Monotonicity for Robust Autoregressive Transformer TTS · INTERSPEECH 2020
FastSpeech: Fast, Robust and Controllable Text to Speech · NIPS 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition · ICML 2019
Neural Speech Synthesis with Transformer Network · AAAI 2019
Towards Discriminative Representation Learning for Speech Emotion Recognition · IJCAI 2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion · INTERSPEECH 2019