Xixin Wu

50 papers · 2018–2026 · 8 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (20) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8)

🐣 Hot Topic Early Bird 🏠 Conference Loyalist (32) 🤝 Dynamic Duo (36) 🧬 Topic Evolution 👥 Mega-Team (20) 🏆 Keyword Champion (2) 🔥 Unstoppable (8) ❓ The Questioner ⚡ Prolific Year (5) 💎 Century Club (48) 🗃️ Keyword Collector (52)

Conferences

INTERSPEECH (32) ACL (6) EMNLP (4) AAAI (2) ICML (2) NAACL (2) IJCNLP (1) NIPS (1)

Top co-authors

Helen Meng (37) Xunying Liu (16) Zhiyong Wu (9) Helen M. Meng (8) Tianhua Zhang (8) Dongchao Yang (7) Hongyin Luo (7) Songxiang Liu (6) Haohan Guo (6) Kun Li (6)

Keywords

language model (6) large language model (5) speech recognition (5) unsupervised learning (4) automatic speech recognition (4) text-to-speech synthesis (3) speaker verification (3) speech synthesis (3) voice conversion (3) retrieval-augmented generation (2) variational inference (2) acoustic model (2) long short-term memory (2) speaker embedding (2) language modeling (2) opinion mining (2) retrieval augmented generation (2) ensemble learning (2) question answering (2) speaker diarization (2)

Papers

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models · AAAI 2026
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment · ACL 2026
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution · ACL 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling · ICML 2025
Autoregressive Speech Synthesis without Vector Quantization · ACL 2025
Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains · ACL 2025
RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning · EMNLP 2025
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models · INTERSPEECH 2024
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes · AAAI 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers · EMNLP 2024
UniAudio: Towards Universal Audio Generation with Large Language Models · ICML 2024
Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder · INTERSPEECH 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models · INTERSPEECH 2024
Prompting Large Language Models with Mispronunciation Detection and Diagnosis Abilities · INTERSPEECH 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction · INTERSPEECH 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner · NIPS 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System · INTERSPEECH 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models · INTERSPEECH 2024
Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? · NAACL 2024
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning · NAACL 2024
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders · INTERSPEECH 2023
Search Augmented Instruction Learning · EMNLP 2023
ConvRGX: Recognition, Generation, and Extraction for Self-trained Conversational Question Answering · ACL 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator · INTERSPEECH 2023
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts · INTERSPEECH 2023
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout · ACL 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information · INTERSPEECH 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS · INTERSPEECH 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS · INTERSPEECH 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection · INTERSPEECH 2022
Spoofing-Aware Speaker Verification by Multi-Level Fusion · INTERSPEECH 2022
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion · INTERSPEECH 2021
Deliberation-Based Multi-Pass Speech Synthesis · INTERSPEECH 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis · INTERSPEECH 2021
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks · INTERSPEECH 2021
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification · INTERSPEECH 2020
Non-Native Children’s Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems · INTERSPEECH 2020
Speaker-Aware Linear Discriminant Analysis in Speaker Verification · INTERSPEECH 2020
Ensemble Approaches for Uncertainty in Spoken Language Assessment · INTERSPEECH 2020
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT · INTERSPEECH 2019
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings · INTERSPEECH 2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models · INTERSPEECH 2019
Coupling Global and Local Context for Unsupervised Aspect Extraction · EMNLP 2019
Coupling Global and Local Context for Unsupervised Aspect Extraction · IJCNLP 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams · INTERSPEECH 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition · INTERSPEECH 2019
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus · INTERSPEECH 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis · INTERSPEECH 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance · INTERSPEECH 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis · INTERSPEECH 2018