Ziyue Jiang

27 papers · 2020–2025 · 9 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (11) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (9) 🐣 Hot Topic Early Bird 🏆 Grand Slam 👑 Triple Crown 🏆 Keyword Champion (2) 🤝 Dynamic Duo (24) 🔬 Deep Specialist (11) 🧬 Topic Evolution 🗃️ Keyword Collector (105) ⚡ Prolific Year (11) 🔥 Unstoppable (6) 💎 Century Club (27)

Conferences

ACL (10) ICLR (4) EMNLP (3) INTERSPEECH (3) NIPS (3) AAAI (1) COLING (1) ICML (1) IJCAI (1)

Top co-authors

Zhou Zhao (24) Rongjie Huang (10) Jialong Zuo (9) Yi Ren (9) Zhenhui Ye (8) Shengpeng Ji (8) Qian Yang (7) Jinzheng He (7) Jinglin Liu (7) Minghui Fang (6)

Keywords

speech synthesis (12) zero-shot learning (6) flow matching (3) singing voice synthesis (3) vector quantization (3) prosody modeling (3) style transfer (3) multimodal learning (2) speaker cloning (2) style control (2) speech generation (2) speech recognition (2) contrastive learning (2) diffusion model (2) voice conversion (1) benchmark evaluation (1) talking face generation (1) attention mechanism (1) neural decoding (1) self-supervised learning (1)

Papers

BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment EMNLP 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control ACL 2025
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching ACL 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis ACL 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation COLING 2025
Versatile Framework for Song Generation with Prompt-based Control EMNLP 2025
Speech Watermarking with Discrete Intermediate Representations AAAI 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling ICLR 2025
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency INTERSPEECH 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks NIPS 2024
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes NIPS 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ACL 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners ACL 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech ACL 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control EMNLP 2024
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis ICLR 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis ICLR 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models ICML 2024
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis INTERSPEECH 2024
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models ACL 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training ACL 2023
FastDiff 2: Revisiting and Incorporating GANs and Diffusion Models in High-Fidelity Speech Synthesis ACL 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis ICLR 2023
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech NIPS 2022
FedSpeech: Federated Text-to-Speech with Continual Learning IJCAI 2021
Self-Supervised Spoofing Audio Detection Scheme INTERSPEECH 2020