Papers
Motion Based Audio-Visual Segmentation
Jiahao Li, Miao Liu, Shu Yang et al.
MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms
Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo et al.
MSA-DPCRN: A Multi-Scale Asymmetric Dual-Path Convolution Recurrent Network with Attentional Feature Fusion for Acoustic Echo Cancellation
Ye Ni, Cong Pang, Chengwei Huang et al.
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
Qian Yang, Jialong Zuo, Zhe Su et al.
MSDET: Multitask Speaker Separation and Direction-of-Arrival Estimation Training
Roland Hartanto, Sakriani Sakti, Koichi Shinoda
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li, Yongbin You, Xuezhi Wang et al.
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma et al.
Multi-Channel Extension of Pre-trained Models for Speaker Verification
Ladislav Mošner, Romain Serizel, Lukáš Burget et al.
Multi-Channel Multi-Speaker ASR Using Target Speaker’s Solo Segment
Yiwen Shao, Shi-Xiong Zhang, Yong Xu et al.
MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels
Darshan Prabhu, Yifan Peng, Preethi Jyothi et al.
Multi-label Bird Species Classification from Field Recordings using Mel_Graph-GCN Framework
Noumida A, Rajeev Rajan
Multi-latency look-ahead for streaming speaker segmentation
Bilal Rahou, Hervé Bredin
Multilingual Speech and Language Analysis for the Assessment of Mild Cognitive Impairment: Outcomes from the Taukadial Challenge
Paula Andrea Pérez-Toro, Tomas Arias-Vergara, Philipp Klumpp et al.
Multi-mic Echo Cancellation Coalesced with Beamforming for Real World Adverse Acoustic Conditions
Premanand Nayak, Kamini Sabu, M. Ali Basha Shaik
Multi-modal Adversarial Training for Zero-Shot Voice Cloning
John Janiczek, Dading Chong, Dongyang Dai et al.
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of Speech-Silence and Word-Punctuation
Jinzuomu Zhong, Yang Li, Hui Huang et al.
Multimodal Belief Prediction
John Murzaku, Adil Soubki, Owen Rambow
Multimodal Continuous Fingerspelling Recognition via Visual Alignment Learning
Katerina Papadimitriou, Gerasimos Potamianos
Multimodal Digital Biomarkers for Longitudinal Tracking of Speech Impairment Severity in ALS: An Investigation of Clinically Important Differences
Michael Neumann, Hardik Kothare, Jackson Liscombe et al.
Multimodal Fusion for Vocal Biomarkers Using Vector Cross-Attention
Vladimir Despotovic, Abir Elbéji, Petr V. Nazarov et al.
Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition
Xiaohan Shi, Xingfeng Li, Tomoki Toda
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
Shruti Palaskar, Ognjen Rudovic, Sameer Dharur et al.
Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation
Tsun-An Hsieh, Heeyoul Choi, Minje Kim