Zhenhui Ye

17 papers · 2022–2025 · 6 top CS/AI conferences

Achievements

🌍 Conference Polyglot (6) 🐝 Cross-Pollinator (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (8)

🗺️ Taxonomy Completionist (26) 🐣 Hot Topic Early Bird 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (17) 💎 Century Club (17) 🗃️ Keyword Collector (68) ⚡ Prolific Year (8)

Conferences

ACL (7) ICLR (3) ICML (3) NIPS (2) AAAI (1) IJCAI (1)

Top co-authors

Zhou Zhao (17) Rongjie Huang (12) Yi Ren (10) Jinglin Liu (10) Ziyue Jiang (8) Jinzheng He (7) Luping Liu (6) Zehan Wang (6) Xiang Yin (6) Xize Cheng (5)

Keywords

speech synthesis (4) diffusion model (3) prosody modeling (2) multimodal learning (2) audio generation (2) contrastive learning (2) text-to-audio generation (2) multitask learning (1) talking face generation (1) zero-shot learning (1) multi-modal learning (1) semantic alignment (1) cross-modal learning (1) speech enhancement (1) facial animation (1) preference learning (1) generative model (1) reinforcement learning from human feedback (1) cross-modal alignment (1) prosody prediction (1)

Papers

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
Extending Multi-modal Contrastive Representations · NIPS 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head · AAAI 2024
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes · NIPS 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners · ACL 2024
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis · ICLR 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis · ICLR 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models · ICML 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion · ICML 2024
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation · ACL 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis · ICLR 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models · ICML 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training · ACL 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis · ACL 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models · ACL 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect · ACL 2023
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech · IJCAI 2022