Papers
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue, Yayue Deng, Yingming Gao et al.
Revealing Confounding Biases: A Novel Benchmarking Approach for Aggregate-Level Performance Metrics in Health Assessments
Stefano Goria, Roseline Polle, Salvatore Fara et al.
Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
Xin Wang, Tomi Kinnunen, Kong Aik Lee et al.
Revisiting Convolution-free Transformer for Speech Recognition
Zejiang Hou, Goeric Huybrechts, Anshu Bhatia et al.
Revisiting Pitch Jumps: F0 Ratio in Seoul Korean
Michaela Watkins, Paul Boersma, Silke Hamann
RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification
Jacob Bitterman, Daniel Levi, Hilel Hagai Diamandi et al.
Rich speech signal: exploring and exploiting end-to-end automatic speech recognizers’ ability to model hesitation phenomena
Vincenzo Norman Vitale, Loredana Schettino, Francesco Cutugno
RIR-in-a-Box: Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation
Liam Kelley, Diego Di Carlo, Aditya Arie Nugraha et al.
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
Yiwen Shao, Shi-Xiong Zhang, Dong Yu
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2vec2.0 Based ASR
Vishwanath Pratap Singh, Federico Malato, Ville Hautamäki et al.
Robust Laughter Segmentation with Automatic Diverse Data Synthesis
Taisei Omine, Kenta Akita, Reiji Tsuruno
Robust spread spectrum speech watermarking using linear prediction and deep spectral shaping
David Looney, Nikolay D. Gaubitch
RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement
Honglie Chen, Rodrigo Mira, Stavros Petridis et al.
RW-VoiceShield: Raw Waveform-based Adversarial Attack on One-shot Voice Conversion
Ching-Yu Yang, Shreya G. Upadhyay, Ya-Tse Wu et al.
SALSA: Speedy ASR-LLM Synchronous Aggregation
Ashish Mittal, Darshan Prabhu, Sunita Sarawagi et al.
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Qiuming Zhao, Guangzhi Sun, Chao Zhang et al.
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace, Soham Ray, Kwangyoun Kim et al.
SAMSEMO: New dataset for multilingual and multimodal emotion recognition
Pawel Bujnowski, Bartlomiej Kuzma, Bartlomiej Paziewski et al.
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
Osamu Take, Shinnosuke Takamichi, Kentaro Seki et al.
SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
Jingru Lin, Meng Ge, Junyi Ao et al.
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel, Zhiyong Yan, Yongqing Wang et al.
SCDNet: Self-supervised Learning Feature based Speaker Change Detection
Yue Li, Xinsheng Wang, Li Zhang et al.
Schrödinger Bridge for Generative Speech Enhancement
Ante Jukić, Roman Korostik, Jagadeesh Balam et al.
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR
Shuaishuai Ye, Shunfei Chen, Xinhui Hu et al.
SDAEC: Signal Decoupling for Advancing Acoustic Echo Cancellation
Fei Zhao, Jinjiang Liu, Xueliang Zhang