INTERSPEECH 2016

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks

Abstract

Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern automatic speech recognition (ASR) systems. However, they are not an optimal choice, for several reasons: handcrafting a pronunciation dictionary requires a large amount of effort, pronunciations vary, human errors occur, and many dialects and languages are under-resourced. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription as input and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results on the TIMIT dataset show that the proposed method, which is based on semi-supervised training of a deep neural network, substantially outperforms phoneme-based continuous speech recognition.
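The abstract does not spell out how the dictionary is derived. Purely as an illustration, the sketch below shows one generic step such a pipeline could include, assuming a model has already assigned a learned sub-word unit ID to every frame of each word token. The step collapses repeated frame labels into a unit sequence and keeps each word's most frequent sequence as its dictionary entry. The function names `collapse_frames` and `build_dictionary` and the toy unit IDs are hypothetical. This is not the authors' semi-supervised DNN method.

```python
from collections import Counter, defaultdict
from itertools import groupby


def collapse_frames(frame_labels):
    """Merge runs of identical frame-level unit labels into one unit sequence."""
    return tuple(label for label, _ in groupby(frame_labels))


def build_dictionary(word_segments):
    """Hypothetical helper: pick each word's most frequent collapsed unit sequence.

    word_segments: iterable of (word, frame_labels) pairs, where frame_labels
    are (assumed) learned sub-word unit IDs for the frames of one word token.
    """
    counts = defaultdict(Counter)
    for word, frames in word_segments:
        counts[word][collapse_frames(frames)] += 1
    # most_common(1) returns [(sequence, count)]; ties keep the first-seen sequence
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


if __name__ == "__main__":
    # Toy data: three tokens of "cat" and one of "dog" with made-up unit IDs
    segments = [
        ("cat", [3, 3, 7, 7, 7, 1]),
        ("cat", [3, 7, 7, 1, 1]),
        ("cat", [3, 5, 1]),
        ("dog", [2, 2, 9, 4]),
    ]
    print(build_dictionary(segments))  # {'cat': (3, 7, 1), 'dog': (2, 9, 4)}
```

A majority vote is only one simple way to turn per-token unit sequences into a single entry per word; keeping several frequent variants per word would be a natural alternative for modeling pronunciation variation.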
