INTERSPEECH 2016

Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data

Abstract

Recently, deep neural networks (DNNs) trained to predict senones have been incorporated into conventional i-vector based speaker verification systems to provide soft frame alignments, with promising results. However, data mismatch may degrade performance: the DNN must be trained on transcribed data (out-of-domain data), whereas the data sets used for i-vector training and extraction (in-domain data) are mostly untranscribed. In this paper, we address this problem by exploiting unlabeled in-domain data during DNN training, with the aim that the DNN provides a more robust basis for the in-domain data. We first explore the impact of using in-domain data in the unsupervised DNN pre-training stage. In addition, we decode the in-domain data with a hybrid DNN-HMM system to obtain transcriptions, and then retrain the DNN on the resulting "labeled" in-domain data. Experimental results on the NIST SRE 2008 and NIST SRE 2010 databases demonstrate the effectiveness of the proposed methods.
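In the commonly used DNN/i-vector recipe, "soft frame alignments" means that the per-frame senone posteriors produced by the DNN take the place of UBM component posteriors when the Baum-Welch statistics for i-vector extraction are accumulated. The sketch below illustrates only that accumulation step. It is not the authors' implementation: the function names `baum_welch_stats` and `center_stats`, the plain-list data layout, and the per-senone `means` are hypothetical, and the posteriors are assumed to come from some senone DNN.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def baum_welch_stats(posteriors: Matrix, features: Matrix) -> Tuple[List[float], Matrix]:
    """Accumulate zeroth- and first-order statistics with soft alignments.

    Hypothetical helper. posteriors[t][c] is the (assumed DNN) posterior of
    senone c at frame t; features[t][d] is acoustic feature d at frame t.
    Returns N[c] = sum_t gamma_t(c) and F[c][d] = sum_t gamma_t(c) * x_t[d].
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must have the same number of frames")
    num_classes = len(posteriors[0]) if posteriors else 0
    dim = len(features[0]) if features else 0
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for gamma, x in zip(posteriors, features):
        # Each frame contributes to every senone, weighted by its posterior.
        for c, g in enumerate(gamma):
            n[c] += g
            for d, value in enumerate(x):
                f[c][d] += g * value
    return n, f


def center_stats(n: List[float], f: Matrix, means: Matrix) -> Matrix:
    """Centered first-order statistics F[c] - N[c] * m[c] (means are assumed)."""
    return [[f[c][d] - n[c] * means[c][d] for d in range(len(f[c]))]
            for c in range(len(f))]


if __name__ == "__main__":
    # Two frames, two senones, two-dimensional features (toy values).
    n, f = baum_welch_stats([[0.75, 0.25], [0.5, 0.5]], [[1.0, 2.0], [3.0, 4.0]])
    print(n)  # [1.25, 0.75]
    print(f)  # [[2.25, 3.5], [1.75, 2.5]]
```

Because the senone set is defined by the DNN rather than by the UBM, any mismatch between the DNN's training data and the in-domain data shows up directly in these posteriors. This is the motivation the abstract gives for bringing in-domain data into DNN training.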
