Papers
Multiple Instance Learning for Inference of Child Attachment From Paralinguistic Aspects of Speech
Areej Buker, Huda Alsofyani, Alessandro Vinciarelli
Multi-resolution Approach to Identification of Spoken Languages and To Improve Overall Language Diarization System Using Whisper Model
Bhavik Vachhani, Dipesh Singh, Rustom Lawyer
Multi-Scale Attention for Audio Question Answering
Guangyao Li, Yixin Xu, Di Hu
Multi-Scale Temporal Transformer For Speech Emotion Recognition
Zhipeng Li, Xiaofen Xing, Yuanbo Fang et al.
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Marc Delcroix, Naohiro Tawara, Mireia Diez et al.
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition
Belen Alastruey, Lukas Drude, Jahn Heymann et al.
Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification
Jianchen Li, Jiqing Han, Shiwen Deng et al.
MyVoice: Arabic Speech Resource Collaboration Platform
Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury et al.
My Vowels Matter: Formant Automation Tools for Diverse Child Speech
Hannah Valentine, Joel MacAuslan, Maria Grigos et al.
Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS
Tankala Pavan Kalyan, Preeti Rao, Preethi Jyothi et al.
Nasal vowel production and grammatical processing in French-speaking children with cochlear implants and normal-hearing peers
Sophie Fagniart, Véronique Delvaux, Brigitte Charlier et al.
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma, Mark J. F. Gales, Kate M. Knill et al.
NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments
Nicky Chong-White, Arun Sebastian, Jorge Mejia
NeMo Forced Aligner and its application to word alignment for subtitle generation
Elena Rastorgueva, Vitaly Lavrukhin, Boris Ginsburg
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition
Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang et al.
Neural Speech Synthesis with Enriched Phrase Boundaries
Marie Kunešová, Jindřich Matoušek
Nkululeko: Machine Learning Experiments on Speaker Characteristics Without Programming
Felix Burkhardt, Florian Eyben, Björn W. Schuller
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews
Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri et al.
Noise-Robust Bandwidth Expansion for 8K Speech Recordings
Yin-Tse Lin, Bo-Hao Su, Chi-Han Lin et al.
Nonbinary American English speakers encode gender in vowel acoustics
Maxwell Hope, Charlotte Ward, Jason Lilley
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals
Jinhan Wang, Vijay Ravi, Abeer Alwan
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
Kamer Ali Yuksel, Thiago Castro Ferreira, Golara Javadi et al.
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang, Songxiang Liu, Helin Wang et al.
North Sámi Dialect Identification with Self-supervised Speech Models
Sofoklis Kakouros, Katri Hiovain-Asikainen
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition
Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed