Xiaofei Wang

43 papers · 2016–2026 · 13 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (17) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (12)

🏃 Academic Marathon (10) 🤝 Dynamic Duo (11) 🔬 Deep Specialist (10) 🧬 Topic Evolution 🏆 Keyword Champion (2) ⚡ Prolific Year (11) 🗃️ Keyword Collector (176) 💎 Century Club (40) 🔥 Unstoppable (9) 🚀 Conference Pioneer

Conferences

INTERSPEECH (18) AAAI (7) MICCAI (3) NIPS (3) CVPR (2) ECCV (2) ICML (2) ACL (1) CORL (1) EMNLP (1) IJCAI (1) JMLR (1) WACV (1)

Top co-authors

Naoyuki Kanda (11) Zhuo Chen (9) Takuya Yoshioka (9) Jinyu Li (6) Zhong Meng (6) Yashesh Gaur (6) Dongmei Wang (4) Sheng Zhao (4) Keke Tang (4) Min Tang (4)

Keywords

automatic speech recognition (6) speaker diarization (4) speech enhancement (4) speaker counting (3) adversarial attack (3) zero-shot learning (3) speech synthesis (3) speech recognition (3) speaker identification (3) serialized output training (2) end-to-end model (2) word error rate (2) acoustic model (2) medical imaging (2) reinforcement learning (2) flow matching (2) point cloud (2) attention mechanism (2) non-negative matrix factorization (2) attention-based encoder-decoder (2)

Papers

Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks AAAI 2026
Shanks: Simultaneous Hearing and Thinking for Spoken Language Models ACL 2026
Stratos: An End-to-End Distillation Pipeline for Customized LLMs Under Distributed Cloud Environments AAAI 2026
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising WACV 2026
Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing ICML 2025
AdvGrasp: Adversarial Attacks on Robotic Grasping from a Physical Perspective IJCAI 2025
Adaptive Spatial Transcriptomics Interpolation via Cross-modal Cross-slice Modeling MICCAI 2025
Imperceptible 3D Point Cloud Attacks on Lattice-based Barycentric Coordinates AAAI 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering AAAI 2025
Audio-Aware Large Language Models as Judges for Speaking Styles EMNLP 2025
Extracting Rare Dependence Patterns via Adaptive Sample Reweighting ICML 2025
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations NIPS 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS INTERSPEECH 2024
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems INTERSPEECH 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation NIPS 2024
FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds ECCV 2024
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection ECCV 2024
Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics MICCAI 2024
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers INTERSPEECH 2024
EasyTS: The Express Lane to Long Time Series Forecasting AAAI 2024
Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning MICCAI 2024
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach INTERSPEECH 2023
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training INTERSPEECH 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings INTERSPEECH 2022
End-to-End Speaker-Attributed ASR with Transformer INTERSPEECH 2021
Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback CORL 2021
Deep Multi-Task Learning for Diabetic Retinopathy Grading in Fundus Images AAAI 2021
Saliency-Guided Image Translation CVPR 2021
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement INTERSPEECH 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone INTERSPEECH 2021
Reinforcement Learning with Latent Flow NIPS 2021
Serialized Output Training for End-to-End Overlapped Speech Recognition INTERSPEECH 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers INTERSPEECH 2020
Learning Mixed Latent Tree Models JMLR 2020
D2D-LSTM: LSTM-Based Path Prediction of Content Diffusion Tree in Device-to-Device Social Networks AAAI 2020
Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model CVPR 2019
Exploring Methods for the Automatic Detection of Errors in Manual Transcription INTERSPEECH 2019
Stream Attention for Distributed Multi-Microphone Speech Recognition INTERSPEECH 2018
A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech Enhancement INTERSPEECH 2016
A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments INTERSPEECH 2016
Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation INTERSPEECH 2016