Lei Xie
109 papers · 2007–2026 · 11 top CS/AI conferences
Achievements
Taxonomy Completionist (31)
Keyword Pioneer
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Polyglot (11)
Hot Topic Early Bird
Conference Loyalist (83)
Topic Evolution
Keyword Champion
Grand Slam
Deep Specialist (19)
Dynamic Duo (13)
Trend Setter
Prolific Year (20)
Conference Pioneer
Century Club (105)
Keyword Collector (115)
Unstoppable (10)
Conferences
INTERSPEECH (83)
AAAI (7)
ACL (6)
NIPS (4)
CVPR (2)
IJCAI (2)
EMNLP (1)
ICLR (1)
ICML (1)
NAACL (1)
RSS (1)
Keywords
voice conversion (12)
speech recognition (11)
automatic speech recognition (11)
speech enhancement (9)
speech synthesis (9)
text-to-speech synthesis (8)
connectionist temporal classification (7)
deep neural network (6)
speaker embedding (6)
attention mechanism (6)
language model (6)
knowledge distillation (5)
neural vocoder (5)
end-to-end model (5)
style transfer (5)
zero-shot learning (5)
end-to-end speech recognition (5)
speaker verification (4)
convolutional neural network (4)
self-supervised learning (4)
Papers
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
AAAI 2026
LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech
ACL 2026
WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation
AAAI 2026
Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR
AAAI 2026
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
ACL 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
RSS 2025
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
ICML 2025
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
ICLR 2025
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
ACL 2025
PVTNL: Prompting Vision Transformers with Natural Language for Generalizable Person Re-identification
EMNLP 2025
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
CVPR 2025
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
AAAI 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
AAAI 2025
SCDNet: Self-supervised Learning Feature based Speaker Change Detection
INTERSPEECH 2024
SignGraph: A Sign Sequence is Worth Graphs of Nodes
CVPR 2024
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
INTERSPEECH 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
INTERSPEECH 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
INTERSPEECH 2024
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation
INTERSPEECH 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
INTERSPEECH 2024
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
INTERSPEECH 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
INTERSPEECH 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
INTERSPEECH 2024
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
INTERSPEECH 2024
Text-aware and Context-aware Expressive Audiobook Speech Synthesis
INTERSPEECH 2024
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
INTERSPEECH 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
INTERSPEECH 2024
SEQ-former: A context-enhanced and efficient automatic speech recognition framework
INTERSPEECH 2024
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
INTERSPEECH 2024
D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models
NIPS 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
NIPS 2024
A Diffusion-Based Framework for Multi-Class Anomaly Detection
AAAI 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
ACL 2024
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis
AAAI 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
INTERSPEECH 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
INTERSPEECH 2023
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
INTERSPEECH 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification
INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
INTERSPEECH 2023
Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer
INTERSPEECH 2023
Contrastive Learning for Sign Language Recognition and Translation
IJCAI 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
INTERSPEECH 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
INTERSPEECH 2023
DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting
INTERSPEECH 2023
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition
INTERSPEECH 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
INTERSPEECH 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
ACL 2023
Minimizing Sequential Confusion Error in Speech Command Recognition
INTERSPEECH 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
INTERSPEECH 2022
Backend Ensemble for Speaker Verification and Spoofing Countermeasure
INTERSPEECH 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
INTERSPEECH 2022
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer
INTERSPEECH 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
INTERSPEECH 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
INTERSPEECH 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
INTERSPEECH 2022
Personalized Acoustic Echo Cancellation for Full-duplex Communications
INTERSPEECH 2022
A Transformer-Based Object Detector with Coarse-Fine Crossing Representations
NIPS 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset
INTERSPEECH 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR
INTERSPEECH 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
INTERSPEECH 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
INTERSPEECH 2022
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition
INTERSPEECH 2022
DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement
INTERSPEECH 2021
Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion
INTERSPEECH 2021
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification
INTERSPEECH 2021
Improving Robustness of One-Shot Voice Conversion with Deep Discriminative Speaker Encoder
INTERSPEECH 2021
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis
INTERSPEECH 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
INTERSPEECH 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
INTERSPEECH 2021
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit
INTERSPEECH 2021
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
INTERSPEECH 2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
INTERSPEECH 2021
Controllable Context-Aware Conversational Speech Synthesis
INTERSPEECH 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS
INTERSPEECH 2021
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
INTERSPEECH 2021
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition
INTERSPEECH 2020
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge
INTERSPEECH 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
INTERSPEECH 2020
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
INTERSPEECH 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
INTERSPEECH 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
NIPS 2020
Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation
IJCAI 2020
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification
INTERSPEECH 2020
Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
INTERSPEECH 2020
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training
INTERSPEECH 2020
An End-to-End Architecture of Online Multi-Channel Speech Separation
INTERSPEECH 2020
Wake Word Detection with Alignment-Free Lattice-Free MMI
INTERSPEECH 2020
Improved Speaker-Dependent Separation for CHiME-5 Challenge
INTERSPEECH 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
INTERSPEECH 2019
Adversarial Regularization for End-to-End Robust Speaker Verification
INTERSPEECH 2019
Towards Language-Universal Mandarin-English Speech Recognition
INTERSPEECH 2019
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data
INTERSPEECH 2019
A New GAN-Based End-to-End TTS Training Algorithm
INTERSPEECH 2019
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition
INTERSPEECH 2019
Training Augmentation with Adversarial Examples for Robust Speech Recognition
INTERSPEECH 2018
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
INTERSPEECH 2018
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
INTERSPEECH 2018
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
INTERSPEECH 2018
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search
INTERSPEECH 2018
Attention-based End-to-End Models for Small-Footprint Keyword Spotting
INTERSPEECH 2018
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
INTERSPEECH 2017
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion
INTERSPEECH 2017
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis
INTERSPEECH 2016
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
INTERSPEECH 2016
A DNN-HMM Approach to Story Segmentation
INTERSPEECH 2016
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection
INTERSPEECH 2016
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information
INTERSPEECH 2016
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
ACL 2013
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
NAACL 2007