Papers
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
Pengfei Cai, Yan Song, Kang Li et al.
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features
Katharina Anderer, Andreas Reich, Matthias Wölfel
Measurement and simulation of pressure losses due to airflow in vocal tract models
Peter Birkholz, Patrick Häsner
Measuring acoustic dissimilarity of hierarchical markers in task-oriented dialogue with MFCC-based dynamic time warping
Natalia Morozova, Guanghao You, Sabine Stoll et al.
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux, Sarina Meyer, Lyonel Behringer et al.
MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition
Ziping Zhao, Tian Gao, Haishuai Wang et al.
MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
Da Mu, Zhicheng Zhang, Haobo Yue
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
Haiyang Sun, Fulin Zhang, Yingying Gao et al.
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos et al.
MinSpeech: A Corpus of Southern Min Dialect for Automatic Speech Recognition
Jiayan Lin, Shenghui Lu, Hukai Huang et al.
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Hang Zhao, Yifei Xin, Zhesong Yu et al.
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Payal Mohapatra, Shamika Likhite, Subrata Biswas et al.
Mitigating Overfitting in Structured Pruning of ASR Models with Gradient-Guided Parameter Regularization
Dong-Hyun Kim, Joon-Hyuk Chang
Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch
Thomas Graave, Zhengyang Li, Timo Lohrenz et al.
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
Jiatong Shi, Shih-Heng Wang, William Chen et al.
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
Zhiqi Ai, Zhiyong Chen, Shugong Xu
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi, Xutai Ma, Hirofumi Inaguma et al.
Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation
Paige Tuttösí, H. Henny Yeung, Yue Wang et al.
MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation
Zilong Huang, Man-Wai Mak, Kong Aik Lee
MMSD-Net: Towards Multi-modal Stuttering Detection
Liangyu Nie, Sudarsana Reddy Kadiri, Ruchit Agrawal
Mobile PresenTra: NICT fast neural text-to-speech system on smartphones with incremental inference of MS-FC-HiFi-GAN for low-latency synthesis
Takuma Okamoto, Yamato Ohtani, Hisashi Kawai
Modality Translation Learning for Joint Speech-Text Model
Pin-Yen Liu, Jen-Tzung Chien
Modeling probabilistic reduction across domains with Naive Discriminative Learning
Anna Stein, Kevin Tang
Modeling Vocal Tract Like Acoustic Tubes Using the Immersed Boundary Method
Rongshuai Wu, Debasish Ray Mohapatra, Sidney Fels
Modelled Multivariate Overlap: A method for measuring vowel merger
Irene Smith, Morgan Sonderegger, The Spade Consortium