Xixin Wu
50 papers · 2018–2026 · 8 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (20)
Keyword Pioneer
Renaissance Researcher (8)
Interdisciplinary Bridge
Conference Polyglot (8)
Hot Topic Early Bird
Conference Loyalist (32)
Dynamic Duo (36)
Topic Evolution
Mega-Team (20)
Keyword Champion (2)
Unstoppable (8)
The Questioner
Prolific Year (5)
Century Club (48)
Keyword Collector (52)
Conferences
INTERSPEECH (32)
ACL (6)
EMNLP (4)
AAAI (2)
ICML (2)
NAACL (2)
IJCNLP (1)
NIPS (1)
Keywords
language model (6)
large language model (5)
speech recognition (5)
unsupervised learning (4)
automatic speech recognition (4)
text-to-speech synthesis (3)
speaker verification (3)
speech synthesis (3)
voice conversion (3)
retrieval-augmented generation (2)
variational inference (2)
acoustic model (2)
long short-term memory (2)
speaker embedding (2)
language modeling (2)
opinion mining (2)
retrieval augmented generation (2)
ensemble learning (2)
question answering (2)
speaker diarization (2)
Papers
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
AAAI 2026
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment
ACL 2026
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
ACL 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
ICML 2025
Autoregressive Speech Synthesis without Vector Quantization
ACL 2025
Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains
ACL 2025
RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning
EMNLP 2025
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
INTERSPEECH 2024
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes
AAAI 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers
EMNLP 2024
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder
INTERSPEECH 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
INTERSPEECH 2024
Prompting Large Language Models with Mispronunciation Detection and Diagnosis Abilities
INTERSPEECH 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
INTERSPEECH 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner
NIPS 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
INTERSPEECH 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
INTERSPEECH 2024
Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
NAACL 2024
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
NAACL 2024
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders
INTERSPEECH 2023
Search Augmented Instruction Learning
EMNLP 2023
ConvRGX: Recognition, Generation, and Extraction for Self-trained Conversational Question Answering
ACL 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
INTERSPEECH 2023
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts
INTERSPEECH 2023
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout
ACL 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
INTERSPEECH 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
INTERSPEECH 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
INTERSPEECH 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection
INTERSPEECH 2022
Spoofing-Aware Speaker Verification by Multi-Level Fusion
INTERSPEECH 2022
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
INTERSPEECH 2021
Deliberation-Based Multi-Pass Speech Synthesis
INTERSPEECH 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis
INTERSPEECH 2021
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks
INTERSPEECH 2021
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification
INTERSPEECH 2020
Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems
INTERSPEECH 2020
Speaker-Aware Linear Discriminant Analysis in Speaker Verification
INTERSPEECH 2020
Ensemble Approaches for Uncertainty in Spoken Language Assessment
INTERSPEECH 2020
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
INTERSPEECH 2019
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings
INTERSPEECH 2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models
INTERSPEECH 2019
Coupling Global and Local Context for Unsupervised Aspect Extraction
EMNLP 2019
Coupling Global and Local Context for Unsupervised Aspect Extraction
IJCNLP 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams
INTERSPEECH 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition
INTERSPEECH 2019
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
INTERSPEECH 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
INTERSPEECH 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
INTERSPEECH 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
INTERSPEECH 2018