Xie Chen
50 papers · 2016–2026 · 8 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (17)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Polyglot (8)
Academic Marathon (9)
Conference Loyalist (22)
Deep Specialist (12)
Topic Evolution
Keyword Champion (4)
Dynamic Duo (22)
Conference Pioneer
Prolific Year (14)
Century Club (42)
Keyword Collector (54)
Unstoppable (5)
Conferences
INTERSPEECH (22)
ACL (13)
AAAI (7)
EMNLP (3)
ICML (2)
ICCV (1)
IJCAI (1)
NAACL (1)
Keywords
automatic speech recognition (11)
speech synthesis (6)
self-supervised learning (6)
speech recognition (5)
text-to-speech synthesis (5)
large language model (5)
vector quantization (4)
low-resource language (3)
speech language model (3)
end-to-end model (3)
multi-task learning (3)
flow matching (3)
contrastive learning (3)
multimodal learning (3)
language model (3)
neural transducer (3)
representation learning (2)
diffusion model (2)
zero-shot learning (2)
domain adaptation (2)
Papers
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
ACL 2026
Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework
ACL 2026
FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining
ACL 2026
Evaluating the Expressive Appropriateness of Speech in Rich Contexts
ACL 2026
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
ACL 2026
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
ACL 2026
WaveEx: Accelerating Flow Matching-based Speech Generation via Wavelet-guided Extrapolation
AAAI 2026
AHAMask: Reliable Task Specification for Large Audio Language Models Without Instructions
AAAI 2026
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
ACL 2025
Towards Reliable Large Audio Language Model
ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
ACL 2025
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation
EMNLP 2025
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
EMNLP 2025
Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video
ICCV 2025
MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models
EMNLP 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
Language Model Can Listen While Speaking
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
ACL 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
ACL 2025
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
ACL 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
INTERSPEECH 2024
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
INTERSPEECH 2024
Improved Factorized Neural Transducer Model For Text-only Domain Adaptation
INTERSPEECH 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
INTERSPEECH 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
INTERSPEECH 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
INTERSPEECH 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
IJCAI 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
AAAI 2024
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
INTERSPEECH 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
INTERSPEECH 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer
INTERSPEECH 2023
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation
INTERSPEECH 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
INTERSPEECH 2023
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
INTERSPEECH 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
INTERSPEECH 2023
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
INTERSPEECH 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
INTERSPEECH 2022
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
INTERSPEECH 2021
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS
INTERSPEECH 2021
Memory-Efficient Pipeline-Parallel DNN Training
ICML 2021
The Effect of Adding Authorship Knowledge in Automated Text Scoring
NAACL 2018
Active Memory Networks for Language Modeling
INTERSPEECH 2018
Multi-Language Neural Network Language Models
INTERSPEECH 2016