Yanmin Qian
63 papers · 2016–2026 · 5 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (29)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Academic Marathon (9)
Cross-Pollinator (7)
Conference Loyalist (56)
Deep Specialist (15)
Topic Evolution
Keyword Champion (2)
Topic Pioneer
Dynamic Duo (13)
Trend Setter
Conference Pioneer
Prolific Year (10)
Unstoppable (10)
The Questioner
Keyword Collector (126)
Century Club (61)
Conferences
INTERSPEECH (56)
NIPS (3)
ACL (2)
AAAI (1)
IJCAI (1)
Keywords
speaker verification (19)
automatic speech recognition (9)
speaker embedding (8)
speech recognition (8)
speaker recognition (7)
speech separation (6)
embedding learning (6)
permutation invariant training (5)
model compression (5)
convolutional neural network (4)
adversarial training (4)
domain adaptation (4)
knowledge distillation (4)
end-to-end model (4)
speech enhancement (4)
attention mechanism (4)
self-supervised learning (4)
multi-talker speech recognition (4)
connectionist temporal classification (3)
cocktail party problem (3)
Papers
A Data-Centric Approach to Generalizable Speech Deepfake Detection
ACL 2026
USE: A Unified Model for Universal Sound Separation and Extraction
AAAI 2026
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
ACL 2025
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models
IJCAI 2024
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
INTERSPEECH 2024
SparseWAV: Fast and Accurate One-Shot Unstructured Pruning for Large Speech Foundation Models
INTERSPEECH 2024
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
INTERSPEECH 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
INTERSPEECH 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
INTERSPEECH 2024
Contextual Biasing Speech Recognition in Speech-enhanced Large Language Model
INTERSPEECH 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
INTERSPEECH 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
NIPS 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
NIPS 2024
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
NIPS 2023
Adaptive Neural Network Quantization For Lightweight Speaker Verification
INTERSPEECH 2023
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor
INTERSPEECH 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
INTERSPEECH 2023
Overlap Aware Continuous Speech Separation without Permutation Invariant Training
INTERSPEECH 2023
Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition
INTERSPEECH 2023
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
INTERSPEECH 2023
ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification
INTERSPEECH 2023
Reversible Neural Networks for Memory-Efficient Speaker Verification
INTERSPEECH 2023
UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR
INTERSPEECH 2023
Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition
INTERSPEECH 2023
Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory
INTERSPEECH 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
INTERSPEECH 2023
Dual Path Embedding Learning for Speaker Verification with Triplet Attention
INTERSPEECH 2022
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification
INTERSPEECH 2022
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild
INTERSPEECH 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
INTERSPEECH 2022
Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction
INTERSPEECH 2022
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
INTERSPEECH 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training
INTERSPEECH 2022
Attentive Feature Fusion for Robust Speaker Verification
INTERSPEECH 2022
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design
INTERSPEECH 2022
The SJTU System for Short-Duration Speaker Verification Challenge 2021
INTERSPEECH 2021
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
INTERSPEECH 2021
Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification
INTERSPEECH 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
INTERSPEECH 2021
Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party
INTERSPEECH 2021
Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts
INTERSPEECH 2020
Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network
INTERSPEECH 2020
Multi-Modality Matters: A Performance Leap on VoxCeleb
INTERSPEECH 2020
Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation
INTERSPEECH 2020
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection
INTERSPEECH 2020
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
INTERSPEECH 2020
Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition
INTERSPEECH 2020
Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech
INTERSPEECH 2019
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training
INTERSPEECH 2019
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
INTERSPEECH 2019
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System
INTERSPEECH 2019
Joint Decoding of CTC Based Systems for Speech Recognition
INTERSPEECH 2019
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification
INTERSPEECH 2019
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction
INTERSPEECH 2019
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
INTERSPEECH 2019
Knowledge Distillation for Sequence Model
INTERSPEECH 2018
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
INTERSPEECH 2018
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
INTERSPEECH 2018
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
INTERSPEECH 2018
Recognizing Multi-Talker Speech with Permutation Invariant Training
INTERSPEECH 2017
What Does the Speaker Embedding Encode?
INTERSPEECH 2017
Binary Deep Neural Networks for Speech Recognition
INTERSPEECH 2017
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
INTERSPEECH 2016