Papers
Voice quality in telephone speech: Comparing acoustic measures between VoIP telephone and high-quality recordings
Chenzi Xu, Jessica Wormald, Paul Foulkes et al.
Voice Quality Variation in AAE: An Additional Challenge for Addressing Bias in ASR Models?
Li-Fang Lai, Nicole Holliday
VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
Heeseung Kim, Sang-gil Lee, Jiheum Yeom et al.
VoiCor: A Residual Iterative Voice Correction Framework for Monaural Speech Enhancement
Rui Cao, Tianrui Wang, Meng Ge et al.
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
Yuke Lin, Ming Cheng, Fulin Zhang et al.
VoxFlow AI: wearable voice converter for atypical speech
Grzegorz P. Mika, Konrad Zieliński, Paweł Cyrta et al.
VoxMed: one-step respiratory disease classifier using digital stethoscope sounds
Paridhi Mundra, Manik Sharma, Yashwardhan Chaudhuri et al.
VoxSim: A perceptual voice similarity dataset
Junseok Ahn, Youkyum Kim, Yeunju Choi et al.
VSASV: a Vietnamese Dataset for Spoofing-Aware Speaker Verification
Vu Hoang, Viet Thanh Pham, Hoa Nguyen Xuan et al.
Wav2vec 2.0 Embeddings Are No Swiss Army Knife -- A Case Study for Multiple Sclerosis
Gábor Gosztolya, Mercedes Vetráb, Veronika Svindt et al.
Wave to Interlingua: Analyzing Representations of Multilingual Speech Transformers for Spoken Language Translation
Badr M. Abdullah, Mohammed Maqsood Shaik, Dietrich Klakow
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
Andrés Piñeiro-Martín, Carmen García-Mateo, Laura Docio-Fernandez et al.
Well, what can you do with messy data? Exploring the prosody and pragmatic function of the discourse marker "well" with found data and speech synthesis
Johannah O'Mahony, Catherine Lai, Éva Székely
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
Linhan Ma, Dake Guo, Kun Song et al.
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Shuai Wang, Ke Zhang, Shaoxiong Lin et al.
W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification
Zezhong Jin, Youzhi Tu, Man-Wai Mak
What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark
Adham Ibrahim, Shady Shehata, Ajinkya Kulkarni et al.
What do people hear? Listeners’ Perception of Conversational Speech
Adaeze Adigwe, Sarenne Wallbridge, Simon King
What happens in continued pre-training? Analysis of self-supervised speech models with continued pre-training for colloquial Finnish ASR
Yaroslav Getman, Tamas Grosz, Mikko Kurimo
What if HAL breathed? Enhancing Empathy in Human-AI Interactions with Breathing Speech Synthesis
Nicolò Loddo, Francisca Pessanha, Almila Akdag
When Whisper Listens to Aphasia: Advancing Robust Post-Stroke Speech Recognition
Giulia Sanguedolce, Sophie Brook, Dragos C. Gruia et al.
WHiSER: White House Tapes Speech Emotion Recognition Corpus
Abinay Reddy Naini, Lucas Goncalves, Mary A. Kohler et al.
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Andrew Rouditchenko, Yuan Gong, Samuel Thomas et al.
Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges
Per E Kummervold, Javier de la Rosa, Freddy Wetjen et al.
Whisper Multilingual Downstream Task Tuning Using Task Vectors
Ji-Hun Kang, Jae-Hong Lee, Mun-Hak Lee et al.