Sheng Zhao
37 papers · 2019–2025 · 9 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (9)
Taxonomy Completionist (15)
Interdisciplinary Bridge
Academic Marathon (6)
Hot Topic Early Bird
Grand Slam
Triple Crown
Keyword Champion
Dynamic Duo (21)
Deep Specialist (13)
Century Club (37)
Conference Pioneer
Prolific Year (5)
Keyword Collector (142)
Unstoppable (7)
Conferences
INTERSPEECH (16)
ICLR (5)
NIPS (5)
AAAI (3)
ICML (3)
ACL (2)
ICCV (1)
IJCAI (1)
MICCAI (1)
Keywords
speech synthesis (10)
automatic speech recognition (7)
text to speech (4)
text-to-speech synthesis (4)
knowledge distillation (3)
end-to-end model (3)
machine translation (2)
speaker similarity (2)
neural network (2)
neural vocoder (2)
transformer architecture (2)
end-to-end speech recognition (2)
flow matching (2)
parallel generation (2)
speech recognition (2)
representation learning (2)
autoregressive model (2)
adversarial training (2)
non-autoregressive model (2)
data augmentation (1)
Papers
Medical-Knowledge Driven Multiple Instance Learning for Classifying Severe Abdominal Anomalies on Prenatal Ultrasound
MICCAI 2025
Autoregressive Speech Synthesis without Vector Quantization
ACL 2025
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
NIPS 2024
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
NIPS 2024
GAIA: Zero-shot Talking Avatar Generation
ICLR 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
ICLR 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt
ICLR 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
INTERSPEECH 2024
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
INTERSPEECH 2024
HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details
ICCV 2023
Large-Scale Automatic Audiobook Creation
INTERSPEECH 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
NIPS 2023
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
AAAI 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
INTERSPEECH 2023
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
INTERSPEECH 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
INTERSPEECH 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
INTERSPEECH 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
NIPS 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
INTERSPEECH 2022
AdaSpeech: Adaptive Text to Speech for Custom Voice
ICLR 2021
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
ICLR 2021
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems
INTERSPEECH 2021
Adaptive Text to Speech for Spontaneous Style
INTERSPEECH 2021
MultiSpeech: Multi-Speaker Text to Speech with Transformer
INTERSPEECH 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition
INTERSPEECH 2020
RobuTrans: A Robust Transformer-Based Text-to-Speech Model
AAAI 2020
A Study of Non-autoregressive Model for Sequence Generation
ACL 2020
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
INTERSPEECH 2020
MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search
INTERSPEECH 2020
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
INTERSPEECH 2020
FastSpeech: Fast, Robust and Controllable Text to Speech
NIPS 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
ICML 2019
Neural Speech Synthesis with Transformer Network
AAAI 2019
Towards Discriminative Representation Learning for Speech Emotion Recognition
IJCAI 2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
INTERSPEECH 2019