Lei He

48 papers · 2014–2026 · 14 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (14) 🏠 Conference Loyalist (22) 🤝 Dynamic Duo (12) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (12) 🧬 Topic Evolution 🏆 Keyword Champion ⚡ Prolific Year (5) 🗃️ Keyword Collector (210) 💎 Century Club (45) 🔥 Unstoppable (8) 🚀 Conference Pioneer

Conferences

INTERSPEECH (22) AAAI (7) NIPS (5) COLING (2) EMNLP (2) ICLR (2) ACL (1) ACML (1) CVPR (1) ECCV (1) ICML (1) MICCAI (1) NAACL (1) WACV (1)

Top co-authors

Sheng Zhao (12) Xu Tan (9) Frank K. Soong (7) Xi Wang (7) Jinyu Li (5) Tao Qin (5) Jiang Bian (5) Yuan Liang (4) Yichong Leng (4) Zeqian Ju (4)

Keywords

speech synthesis (10) text-to-speech synthesis (7) neural vocoder (5) domain adaptation (3) contrastive learning (3) recurrent neural network transducer (3) automatic speech recognition (3) autoregressive model (3) end-to-end model (2) cooperative perception (2) transfer learning (2) language model (2) end-to-end learning (2) speech generation (2) attention mechanism (2) flow matching (2) representation learning (2) self-supervised learning (2) knowledge graph (2) object detection (1)

Papers

Mixture-of-Trees: Learning to Select and Weigh Reasoning Paths for Efficient LLM Inference · AAAI 2026
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark · AAAI 2026
SparseCoop: Cooperative Perception with Kinematic-Grounded Queries · AAAI 2026
USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation · AAAI 2025
SensorFlow: Sensor and Image Fused Video Stabilization · WACV 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation · AAAI 2025
PodAgent: A Comprehensive Framework for Podcast Generation · ACL 2025
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers · ICLR 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models · ICML 2024
Masked Residual Diffusion Probabilistic Model with Regional Asymmetry Prior for Generating Perfusion Maps from Multi-phase CTA · MICCAI 2024
Temporal Co-Registration of Simultaneous Electromagnetic Articulography and Electroencephalography for Precise Articulatory and Neural Data Alignment · INTERSPEECH 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations · NIPS 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt · ICLR 2024
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading · INTERSPEECH 2023
KEPL: Knowledge Enhanced Prompt Learning for Chinese Hypernym-Hyponym Extraction · EMNLP 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models · NIPS 2023
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing · AAAI 2023
Large-Scale Automatic Audiobook Creation · INTERSPEECH 2023
SoftSpeech: Unsupervised Duration Model in FastSpeech 2 · INTERSPEECH 2022
Idiosyncratic lingual articulation of American English /æ/ and /ɑ/ using network analysis · INTERSPEECH 2022
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge · INTERSPEECH 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis · NIPS 2022
ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images · ECCV 2022
TreeMoCo: Contrastive Neuron Morphology Representation Learning · NIPS 2022
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis · INTERSPEECH 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios · INTERSPEECH 2022
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders · INTERSPEECH 2022
Exploring Forensic Dental Identification with Deep Learning · NIPS 2021
Oral-3D: Reconstructing the 3D Structure of Oral Cavity from Panoramic X-ray · AAAI 2021
KLMo: Knowledge Graph Enhanced Pretrained Language Model with Fine-Grained Relationships · EMNLP 2021
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS · INTERSPEECH 2021
Cross-Speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis · INTERSPEECH 2021
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis · INTERSPEECH 2020
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability · INTERSPEECH 2020
Towards Universal Text-to-Speech · INTERSPEECH 2020
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator · INTERSPEECH 2020
Atlas-aware ConvNet for Accurate yet Robust Anatomical Segmentation · ACML 2020
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS · INTERSPEECH 2019
A New GAN-Based End-to-End TTS Training Algorithm · INTERSPEECH 2019
Forward-Backward Decoding for Regularizing End-to-End TTS · INTERSPEECH 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS · INTERSPEECH 2019
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers · INTERSPEECH 2018
A New Glottal Neural Vocoder for Speech Synthesis · INTERSPEECH 2018
Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network · NAACL 2016
Exploring Differential Topic Models for Comparative Summarization of Scientific Papers · COLING 2016
A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform · INTERSPEECH 2016
Abstractive News Summarization based on Event Semantic Link Network · COLING 2016
Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction · CVPR 2014