Papers
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara, Theo Lepage, Reda Dehak
To what extent can ASV systems naturally defend against spoofing attacks?
Jee-weon Jung, Xin Wang, Nicholas Evans et al.
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
Junzuo Zhou, Jiangyan Yi, Tao Wang et al.
Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment
Rotem Rousso, Eyal Cohen, Joseph Keshet et al.
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
Wing-Zin Leung, Mattias Cross, Anton Ragni et al.
Training speech-breathing coordination in computer-assisted reading
Delphine Charuau, Andrea Briglia, Erika Godde et al.
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
William Ravenscroft, George Close, Stefan Goetze et al.
Transfer Learning from Whisper for Microscopic Intelligibility Prediction
Paul Best, Santiago Cuervo, Ricard Marxer
Transformer-based Model for ASR N-Best Rescoring and Rewriting
Iwen E Kang, Christophe Van Gysel, Man-Hung Siu
Translating speech with just images
Dan Oneata, Herman Kamper
Translingual Language Markers for Cognitive Assessment from Spontaneous Speech
Bao Hoang, Yijiang Pang, Hiroko Dodge et al.
Transmitted and Aggregated Self-Attention for Automatic Speech Recognition
Tian-Hao Zhang, Xinyuan Qian, Feng Chen et al.
TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information
Yiwen Wang, Xihong Wu
TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech
Donghyun Seong, Hoyoung Lee, Joon-Hyuk Chang
Uh, um and mh: Are filled pauses prone to conversational convergence?
Mathilde Hutin, Junfei Hu, Liesbeth Degand
Uncertainty-Aware Mean Opinion Score Prediction
Hui Wang, Shiwan Zhao, Jiaming Zhou et al.
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee
Understanding “understanding”: presenting a richly annotated multimodal corpus of dyadic interaction
Leonie Schade, Nico Dallmann, Olcay Tük et al.
Unified Audio Visual Cues for Target Speaker Extraction
Tianci Wu, Shulin He, Jiahui Pan et al.
Unified Framework for Spoken Language Understanding and Summarization in Task-Based Human Dialog processing
Eunice Akani, Frederic Bechet, Benoît Favre et al.
Unified Multi-Talker ASR with and without Target-speaker Enrollment
Ryo Masumura, Naoki Makishima, Tomohiro Tanaka et al.
UNIQUE: Unsupervised Network for Integrated Speech Quality Evaluation
Juhwan Yoon, WooSeok Ko, Seyun Um et al.
Universal Score-based Speech Enhancement with High Content Preservation
Robin Scheibler, Yusuke Fujita, Yuma Shirahata et al.
Unmasking Neural Codecs: Forensic Identification of AI-compressed Speech
Denise Moussa, Sandra Bergmann, Christian Riess
Unsupervised Domain Adaptation for Speech Emotion Recognition using K-Nearest Neighbors Voice Conversion
Pravin Mote, Berrak Sisman, Carlos Busso