Xie Chen

50 papers · 2016–2026 · 8 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (17) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8)

🏃 Academic Marathon (9) 🏠 Conference Loyalist (22) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🏆 Keyword Champion (4) 🤝 Dynamic Duo (22) 🚀 Conference Pioneer ⚡ Prolific Year (14) 💎 Century Club (42) 🗃️ Keyword Collector (54) 🔥 Unstoppable (5)

Conferences

INTERSPEECH (22) ACL (13) AAAI (7) EMNLP (3) ICML (2) ICCV (1) IJCAI (1) NAACL (1)

Top co-authors

Ziyang Ma (26) Kai Yu (13) Wenxi Chen (10) Yifan Yang (9) Xiquan Li (7) Zhikang Niu (7) Zhisheng Zheng (7) Guanrou Yang (6) Chenpeng Du (6) Yiwei Guo (5)

Keywords

automatic speech recognition (11) speech synthesis (6) self-supervised learning (6) speech recognition (5) text-to-speech synthesis (5) large language model (5) vector quantization (4) low-resource language (3) speech language model (3) end-to-end model (3) multi-task learning (3) flow matching (3) contrastive learning (3) multimodal learning (3) language model (3) neural transducer (3) representation learning (2) diffusion model (2) zero-shot learning (2) domain adaptation (2)

Papers

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows · ACL 2026
Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework · ACL 2026
FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining · ACL 2026
Evaluating the Expressive Appropriateness of Speech in Rich Contexts · ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training · ACL 2026
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization · ACL 2026
WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation · AAAI 2026
AHAMask: Reliable Task Specification for Large Audio Language Models Without Instructions · AAAI 2026
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation · ACL 2025
Towards Reliable Large Audio Language Model · ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training · ACL 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation · EMNLP 2025
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models · EMNLP 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video · ICCV 2025
MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models · EMNLP 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization · AAAI 2025
Language Model Can Listen While Speaking · AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration · AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering · AAAI 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement · ACL 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching · ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning · ACL 2025
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation · ACL 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection · INTERSPEECH 2024
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer · INTERSPEECH 2024
Improved Factorized Neural Transducer Model For Text-only Domain Adaptation · INTERSPEECH 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark · INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR · INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units · INTERSPEECH 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR · INTERSPEECH 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS · INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers · INTERSPEECH 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer · IJCAI 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models · ICML 2024
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding · AAAI 2024
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems · INTERSPEECH 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition · INTERSPEECH 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer · INTERSPEECH 2023
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation · INTERSPEECH 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation · INTERSPEECH 2023
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets · INTERSPEECH 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech · INTERSPEECH 2023
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature · INTERSPEECH 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition · INTERSPEECH 2022
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition · INTERSPEECH 2021
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS · INTERSPEECH 2021
Memory-Efficient Pipeline-Parallel DNN Training · ICML 2021
The Effect of Adding Authorship Knowledge in Automated Text Scoring · NAACL 2018
Active Memory Networks for Language Modeling · INTERSPEECH 2018
Multi-Language Neural Network Language Models · INTERSPEECH 2016