Papers
Watch Me Speak: 2D Visualization of Human Mouth during Speech
C Siddarth, Sathvik Udupa, Prasanta Kumar Ghosh
WA-Transformer: Window Attention-based Transformer with Two-stage Strategy for Multi-task Audio Source Separation
Yang Wang, Chenxing Li, Feng Deng et al.
wav2vec2-based Speech Rating System for Children with Speech Sound Disorder
Yaroslav Getman, Ragheb Al-Ghezi, Katja Voskoboinik et al.
Wav2Vec-Aug: Improved self-supervised training with limited data
Anuroop Sriram, Michael Auli, Alexei Baevski
Wav2vec behind the Scenes: How end2end Models learn Phonetics
Teena tom Dieck, Paula Andrea Pérez-Toro, Tomas Arias et al.
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
Han Zhu, Li Wang, Gaofeng Cheng et al.
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Heting Gao, Junrui Ni, Kaizhi Qian et al.
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak, Piotr Dura, Pol van Rijn et al.
Weakly-Supervised Neural Full-Rank Spatial Covariance Analysis for a Front-End System of Distant Speech Recognition
Yoshiaki Bando, Takahiro Aizawa, Katsutoshi Itoyama et al.
Weak supervision for Question Type Detection with large language models
Jiří Martínek, Christophe Cerisara, Pavel Kral et al.
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang, Di Wu, Zhendong Peng et al.
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses
Zewang Zhang, Yibin Zheng, Xinhui Li et al.
What can Speech and Language Tell us About the Working Alliance in Psychotherapy
Sebastian Peter Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury et al.
When Is TTS Augmentation Through a Pivot Language Useful?
Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu et al.
When Phonetics Meets Morphology: Intervocalic Voicing Within and Across Words in Romance Languages
Mathilde Hutin, Martine Adda-Decker, Lori Lamel et al.
Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence
Ambika Kirkland, Harm Lameris, Eva Szekely et al.
Which Model is Best: Comparing Methods and Metrics for Automatic Laughter Detection in a Naturalistic Conversational Dataset
Gordon Rennie, Olga Perepelkina, Alessandro Vinciarelli
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen, Yu Wu, Chengyi Wang et al.
Why is Korean lenis stop difficult to perceive for L2 Korean learners?
Boram Lee, Naomi Yamaguchi, Cécile Fougeron
WideResNet with Joint Representation Learning and Data Augmentation for Cover Song Identification
Shichao Hu, Bin Zhang, Jinhong Lu et al.
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Puyuan Peng, David Harwath
Word-wise Sparse Attention for Multimodal Sentiment Analysis
Fan Qian, Hongwei Song, Jiqing Han
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu, Changhan Wang, Andros Tjandra et al.
XTREME-S: Evaluating Cross-lingual Speech Representations
Alexis Conneau, Ankur Bapna, Yu Zhang et al.
Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
Gerasimos Chatzoudis, Manos Plitsis, Spyridoula Stamouli et al.