conftrace_
2019 INTERSPEECH INTERSPEECH 2019

Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification

Abstract

Domain or environment mismatch between training and testing, such as various noises and channels, is a major challenge for speaker verification. In this paper, a variational autoencoder (VAE) is designed to learn the patterns of speaker embeddings extracted from noisy speech segments, including i-vector and x-vector, and generate embeddings with more diversity to improve the robustness of speaker verification systems with probabilistic linear discriminant analysis (PLDA) back-end. The approach is evaluated on the standard NIST SRE 2016 dataset. Compared to manual and generative adversarial network (GAN) based augmentation approaches, the proposed VAE based augmentation achieves a slightly better performance for i-vector on Tagalog and Cantonese with EERs of 15.54% and 7.84%, and a more significant improvement for x-vector on those two languages with EERs of 11.86% and 4.20%.

🌉 Interdisciplinary Bridge - Deep Learning and Machine Learning
🧭 Keyword Pioneer - embedding augmentation
🐝 Cross-Pollinator - Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio