Papers
Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification
Spandan Dey, Premjeet Singh, Goutam Saha
Wave to Syntax: Probing spoken language models for syntax
Gaofei Shen, Afra Alishahi, Arianna Bisazza et al.
Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling
Theodoros Kouzelis, Georgios Paraskevopoulos, Athanasios Katsamanis et al.
Weakly supervised glottis segmentation in high-speed videoendoscopy using bounding box labels
Varun Belagali, Achuth Rao, Prasanta Kumar Ghosh
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang, Yanmin Qian
Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN
Nguyen Binh Thien, Yukoh Wakabayashi, Yuting Geng et al.
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation
Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa et al.
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
Mu Yang, Ram C. M. C. Shekar, Okim Kang et al.
What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers
Julian Linke, Mate Kadar, Gergely Dosinszky et al.
What influences the foreign accent strength? Phonological and grammatical errors in the perception of accentedness
Sarah Wesołek, Piotr Gulgowski, Joanna Błaszczak et al.
What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions
Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah
What questions are my customers asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls
Varun Nathan, Devashish Deshpande, Ayush Kumar et al.
When Words Speak Just as Loudly as Actions: Virtual Agent Based Remote Health Assessment Integrating What Patients Say with What They Do
Vikram Ramanarayanan, David Pautler, Lakshmi Arbatti et al.
Which aspects of motor speech disorder are captured by Mel Frequency Cepstral Coefficients? Evidence from the change in STN-DBS conditions in Parkinson’s disease
Vojtěch Illner, Petr Krýže, Jan Švihlík et al.
WhiSLU: End-to-End Spoken Language Understanding with Whisper
Minghan Wang, Yinglu Li, Jiaxin Guo et al.
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong, Sameer Khurana, Leonid Karlinsky et al.
Whisper Encoder features for Infant Cry Classification
Monil Charola, Aastha Kachhi, Hemant A. Patil
Whisper Features for Dysarthric Severity-Level Classification
Siddharth Rathod, Monil Charola, Akshat Vora et al.
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain, Jaesung Huh, Tengda Han et al.
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously
Cheng-Han Chiang, Wei-Ping Huang, Hung-yi Lee
Word-level Confidence Estimation for CTC Models
Burin Naowarat, Thananchai Kongthaworn, Ekapol Chuangsuwanich
XiaoiceSing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Wang Chunhui, Chang Zeng, Xing He
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen