Zhenhui Ye
17 papers · 2022–2025 · 6 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (6)
Cross-Pollinator (10)
Interdisciplinary Bridge
Keyword Pioneer
Renaissance Researcher (8)
Taxonomy Completionist (26)
Hot Topic Early Bird
Grand Slam
Triple Crown
Dynamic Duo (17)
Century Club (17)
Keyword Collector (68)
Prolific Year (8)
Conferences
ACL (7)
ICLR (3)
ICML (3)
NIPS (2)
AAAI (1)
IJCAI (1)
Keywords
speech synthesis (4)
diffusion model (3)
prosody modeling (2)
multimodal learning (2)
audio generation (2)
contrastive learning (2)
text-to-audio generation (2)
multitask learning (1)
talking face generation (1)
zero-shot learning (1)
multi-modal learning (1)
semantic alignment (1)
cross-modal learning (1)
speech enhancement (1)
facial animation (1)
preference learning (1)
generative model (1)
reinforcement learning from human feedback (1)
cross-modal alignment (1)
prosody prediction (1)
Papers
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ACL 2025
Extending Multi-modal Contrastive Representations
NIPS 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI 2024
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
NIPS 2024
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners
ACL 2024
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
ICLR 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
ICLR 2024
InstructSpeech: Following Speech Editing Instructions via Large Language Models
ICML 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
ACL 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
ICLR 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
ICML 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training
ACL 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
ACL 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
ACL 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect
ACL 2023
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech
IJCAI 2022