conftrace_

Lei Xie

109 papers · 2007–2026 · 11 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (31) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11)

🐣 Hot Topic Early Bird 🏠 Conference Loyalist (83) 🧬 Topic Evolution 🏆 Keyword Champion 🏆 Grand Slam 🔬 Deep Specialist (19) 🤝 Dynamic Duo (13) 📈 Trend Setter ⚡ Prolific Year (20) 🚀 Conference Pioneer 💎 Century Club (105) 🗃️ Keyword Collector (115) 🔥 Unstoppable (10)

Conferences

INTERSPEECH (83) AAAI (7) ACL (6) NIPS (4) CVPR (2) IJCAI (2) EMNLP (1) ICLR (1) ICML (1) NAACL (1) RSS (1)

Top co-authors

Pengcheng Guo (13) Sining Sun (10) Haizhou Li (9) Jixun Yao (9) Xiong Wang (9) Long Ma (7) Shan Yang (7) Binbin Zhang (6) Xinsheng Wang (6) Yongmao Zhang (6)

Research topics

Synthesis (1) Linguistics (1)

Keywords

voice conversion (12) speech recognition (11) automatic speech recognition (11) speech enhancement (9) speech synthesis (9) text-to-speech synthesis (8) connectionist temporal classification (7) deep neural network (6) speaker embedding (6) attention mechanism (6) language model (6) knowledge distillation (5) neural vocoder (5) end-to-end model (5) style transfer (5) zero-shot learning (5) end-to-end speech recognition (5) speaker verification (4) convolutional neural network (4) self-supervised learning (4)

Papers

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction · AAAI 2026
LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech · ACL 2026
WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-dimensional Annotation · AAAI 2026
Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR · AAAI 2026
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement · ACL 2025
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models · RSS 2025
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM · ICML 2025
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling · ICLR 2025
Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling · ACL 2025
PVTNL: Prompting Vision Transformers with Natural Language for Generalizable Person Re-identification · EMNLP 2025
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network · CVPR 2025
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching · AAAI 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation · AAAI 2025
SCDNet: Self-supervised Learning Feature based Speaker Change Detection · INTERSPEECH 2024
SignGraph: A Sign Sequence is Worth Graphs of Nodes · CVPR 2024
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection · INTERSPEECH 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study · INTERSPEECH 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter · INTERSPEECH 2024
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation · INTERSPEECH 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy · INTERSPEECH 2024
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper · INTERSPEECH 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling · INTERSPEECH 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition · INTERSPEECH 2024
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark · INTERSPEECH 2024
Text-aware and Context-aware Expressive Audiobook Speech Synthesis · INTERSPEECH 2024
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation · INTERSPEECH 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention · INTERSPEECH 2024
SEQ-former: A context-enhanced and efficient automatic speech recognition framework · INTERSPEECH 2024
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion · INTERSPEECH 2024
D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models · NIPS 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection · NIPS 2024
A Diffusion-Based Framework for Multi-Class Anomaly Detection · AAAI 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion · ACL 2024
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis · AAAI 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network · INTERSPEECH 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions · INTERSPEECH 2023
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer · INTERSPEECH 2023
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification · INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR · INTERSPEECH 2023
Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer · INTERSPEECH 2023
Contrastive Learning for Sign Language Recognition and Translation · IJCAI 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding · INTERSPEECH 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition · INTERSPEECH 2023
DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting · INTERSPEECH 2023
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition · INTERSPEECH 2023
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation · INTERSPEECH 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task · ACL 2023
Minimizing Sequential Confusion Error in Speech Command Recognition · INTERSPEECH 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis · INTERSPEECH 2022
Backend Ensemble for Speaker Verification and Spoofing Countermeasure · INTERSPEECH 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings · INTERSPEECH 2022
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer · INTERSPEECH 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit · INTERSPEECH 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion · INTERSPEECH 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers · INTERSPEECH 2022
Personalized Acoustic Echo Cancellation for Full-duplex Communications · INTERSPEECH 2022
A Transformer-Based Object Detector with Coarse-Fine Crossing Representations · NIPS 2022
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset · INTERSPEECH 2022
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR · INTERSPEECH 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher · INTERSPEECH 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis · INTERSPEECH 2022
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition · INTERSPEECH 2022
DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement · INTERSPEECH 2021
Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion · INTERSPEECH 2021
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification · INTERSPEECH 2021
Improving Robustness of One-Shot Voice Conversion with Deep Discriminative Speaker Encoder · INTERSPEECH 2021
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis · INTERSPEECH 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario · INTERSPEECH 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain · INTERSPEECH 2021
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit · INTERSPEECH 2021
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines · INTERSPEECH 2021
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition · INTERSPEECH 2021
Controllable Context-Aware Conversational Speech Synthesis · INTERSPEECH 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS · INTERSPEECH 2021
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement · INTERSPEECH 2021
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition · INTERSPEECH 2020
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge · INTERSPEECH 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis · INTERSPEECH 2020
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement · INTERSPEECH 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition · INTERSPEECH 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals · NIPS 2020
Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation · IJCAI 2020
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification · INTERSPEECH 2020
Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music · INTERSPEECH 2020
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training · INTERSPEECH 2020
An End-to-End Architecture of Online Multi-Channel Speech Separation · INTERSPEECH 2020
Wake Word Detection with Alignment-Free Lattice-Free MMI · INTERSPEECH 2020
Improved Speaker-Dependent Separation for CHiME-5 Challenge · INTERSPEECH 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS · INTERSPEECH 2019
Adversarial Regularization for End-to-End Robust Speaker Verification · INTERSPEECH 2019
Towards Language-Universal Mandarin-English Speech Recognition · INTERSPEECH 2019
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data · INTERSPEECH 2019
A New GAN-Based End-to-End TTS Training Algorithm · INTERSPEECH 2019
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition · INTERSPEECH 2019
Training Augmentation with Adversarial Examples for Robust Speech Recognition · INTERSPEECH 2018
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model · INTERSPEECH 2018
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition · INTERSPEECH 2018
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition · INTERSPEECH 2018
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search · INTERSPEECH 2018
Attention-based End-to-End Models for Small-Footprint Keyword Spotting · INTERSPEECH 2018
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling · INTERSPEECH 2017
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion · INTERSPEECH 2017
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis · INTERSPEECH 2016
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion · INTERSPEECH 2016
A DNN-HMM Approach to Story Segmentation · INTERSPEECH 2016
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection · INTERSPEECH 2016
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information · INTERSPEECH 2016
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions · ACL 2013
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News · NAACL 2007