Papers
Variability in Production of Non-Sibilant Fricative [ç] in /hi/
Tsukasa Yoshinaga, Kikuo Maekawa, Akiyoshi Iida
Variations of multi-task learning for spoken language assessment
Jeremy Heng Meng Wong, Huayun Zhang, Nancy Chen
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
Junjie Li, Meng Ge, Zexu Pan et al.
Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi et al.
Vietnamese Capitalization and Punctuation Recovery Models
Hoang Thi Thu Uyen, Nguyen Anh Tu, Ta Duc Huy
View-Specific Assessment of L2 Spoken English
Stefano Bannò, Bhanu Balusu, Mark Gales et al.
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong, Minsu Kim, Daehun Yoo et al.
Visualising Model Training via Vowel Space for Text-To-Speech Systems
Binu Nisal Abeysinghe, Jesin James, Catherine Watson et al.
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
Amir Shirian, Krishna Somandepalli, Victor Sanchez et al.
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise
Tuomo Raitio, Petko Petkov, Jiangchuan Li et al.
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Venkatesh Shenoy Kadandale, Juan F. Montesinos, Gloria Haro
Vocal-Tract Area Functions with Articulatory Reality for Tract Opening
Zhao Zhang, Ju Zhang, Jianguo Wei et al.
Voice Activity Projection: Self-supervised Learning of Turn-taking Events
Erik Ekstedt, Gabriel Skantze
Voice Conversion Can Improve ASR in Very Low-Resource Settings
Matthew Baas, Herman Kamper
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
Haohe Liu, Xubo Liu, Qiuqiang Kong et al.
VoiceMe: Personalized voice generation in TTS
Pol van Rijn, Silvan Mertes, Dominik Schiller et al.
Voice Puppetry with FastPitch
Emelie Van De Vreken, Korin Richmond, Catherine Lai
Voicing decision based on phonemes classification and spectral moments for whisper-to-speech conversion
Luc Ardaillon, Nathalie Henrich, Olivier Perrotin
Voicing neutralization in Romanian fricatives across different speech styles
Laura Spinu, Ioana Vasilescu, Lori Lamel et al.
VOT and F0 perturbations for the realization of voicing contrast in Tohoku Japanese
Hiroto Noguchi, Sanae Matsui, Naoya Watabe et al.
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Jiatong Shi, George Saon, David Haws et al.
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Chenpeng Du, Yiwei Guo, Xie Chen et al.
W2V2-Light: A Lightweight Version of Wav2vec 2.0 for Automatic Speech Recognition
Dong-Hyun Kim, Jae-Hong Lee, Ji-Hwan Mo et al.