Papers
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony et al.
Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection
Chandrashekhar Lavania, Sanjiv Das, Xin Huang et al.
Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection
Olivia M. Murton, Abigail E. Haenssler, Marc F. Maffei et al.
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
Zilu Guo, Jun Du, Chin-Hui Lee et al.
Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization
Antonio Almudévar, Alfonso Ortega, Luis Vicente et al.
VC-T: Streaming Voice Conversion Based on Neural Transducer
Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima
Verbal and nonverbal feedback signals in response to increasing levels of miscommunication
Maeva Garnier, Eric Le Ferrand, Fabien Ringeval
Video Multimodal Emotion Recognition System for Real World Applications
Sun-Kyung Lee, Jong-Hwan Kim
Video Summarization Leveraging Multimodal Information for Presentations
Hanchao Liu, Dapeng Chen, Rongjun Li et al.
Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition
Viet Thanh Pham, Xuan Thai Hoa Nguyen, Vu Hoang et al.
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Yongmao Zhang, Heyang Xue, Hanzhao Li et al.
Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR
Kaushal Bhogale, Sai Sundaresan, Abhigyan Raman et al.
Visualizing Data Augmentation in Deep Speaker Recognition
Pengqi Li, Lantian Li, Askar Hamdulla et al.
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu, Qiushi Huang, Xinhao Mei et al.
Visually grounded few-shot word acquisition with fewer shots
Leanne Nortje, Benjamin van Niekerk, Herman Kamper
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong, Jihoon Park, Beomjeong Kim et al.
Vocoder drift in x-vector–based speaker anonymization
Michele Panariello, Massimiliano Todisco, Nicholas Evans
Voice Conversion With Just Nearest Neighbors
Matthew Baas, Benjamin van Niekerk, Herman Kamper
Voice Passing: a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
David Doukhan, Simon Devauchelle, Lucile Girard-Monneron et al.
Voice Twins: Discovering Extremely Similar-sounding, Unrelated Speakers
Linda Gerlach, Kirsty McDougall, Finnian Kelly et al.
Vowel Normalisation in Latent Space for Sociolinguistics
James Burridge
Vowel reduction by Greek-speaking children: The effect of stress and word length
Polychronia Christodoulidou, Katerina Nicolaidis, Dimitrios Stamovlasis
VoxTube: a multilingual speaker recognition dataset
Ivan Yakovlev, Anton Okhotnikov, Nikita Torgashov et al.
Wav2ToBI: a new approach to automatic ToBI transcription
Wanyue Zhai, Mark Hasegawa-Johnson
wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting
Ranzo Huang, Brian Mak