Research Explorer

MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

Pengfei Cai, Yan Song, Kang Li et al.

2024 INTERSPEECH

MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features

Katharina Anderer, Andreas Reich, Matthias Wölfel

2024 INTERSPEECH

Measurement and simulation of pressure losses due to airflow in vocal tract models

Peter Birkholz, Patrick Häsner

2024 INTERSPEECH

Measuring acoustic dissimilarity of hierarchical markers in task-oriented dialogue with MFCC-based dynamic time warping

Natalia Morozova, Guanghao You, Sabine Stoll et al.

2024 INTERSPEECH

Meta Learning Text-to-Speech Synthesis in over 7000 Languages

Florian Lux, Sarina Meyer, Lyonel Behringer et al.

2024 INTERSPEECH

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

Ziping Zhao, Tian Gao, Haishuai Wang et al.

2024 INTERSPEECH

MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection

Da Mu, Zhicheng Zhang, Haobo Yue

2024 INTERSPEECH

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition

Haiyang Sun, Fulin Zhang, Yingying Gao et al.

2024 INTERSPEECH

mHuBERT-147: A Compact Multilingual HuBERT Model

Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos et al.

2024 INTERSPEECH

MinSpeech: A Corpus of Southern Min Dialect for Automatic Speech Recognition

Jiayan Lin, Shenghui Lu, Hukai Huang et al.

2024 INTERSPEECH

MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

Hang Zhao, Yifei Xin, Zhesong Yu et al.

2024 INTERSPEECH

Missingness-resilient Video-enhanced Multimodal Disfluency Detection

Payal Mohapatra, Shamika Likhite, Subrata Biswas et al.

2024 INTERSPEECH

Mitigating Overfitting in Structured Pruning of ASR Models with Gradient-Guided Parameter Regularization

Dong-Hyun Kim, Joon-Hyuk Chang

2024 INTERSPEECH

Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch

Thomas Graave, Zhengyang Li, Timo Lohrenz et al.

2024 INTERSPEECH

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jiatong Shi, Shih-Heng Wang, William Chen et al.

2024 INTERSPEECH

MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting

Zhiqi Ai, Zhiyong Chen, Shugong Xu

2024 INTERSPEECH

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

Jiatong Shi, Xutai Ma, Hirofumi Inaguma et al.

2024 INTERSPEECH

Mmm whatcha say? Uncovering distal and proximal context effects in first and second-language word perception using psychophysical reverse correlation

Paige Tuttösí, H. Henny Yeung, Yue Wang et al.

2024 INTERSPEECH

MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation

Zilong Huang, Man-Wai Mak, Kong Aik Lee

2024 INTERSPEECH

MMSD-Net: Towards Multi-modal Stuttering Detection

Liangyu Nie, Sudarsana Reddy Kadiri, Ruchit Agrawal

2024 INTERSPEECH

Mobile PresenTra: NICT fast neural text-to-speech system on smartphones with incremental inference of MS-FC-HiFi-GAN for low-latency synthesis

Takuma Okamoto, Yamato Ohtani, Hisashi Kawai

2024 INTERSPEECH

Modality Translation Learning for Joint Speech-Text Model

Pin-Yen Liu, Jen-Tzung Chien

2024 INTERSPEECH

Modeling probabilistic reduction across domains with Naive Discriminative Learning

Anna Stein, Kevin Tang

2024 INTERSPEECH

Modeling Vocal Tract Like Acoustic Tubes Using the Immersed Boundary Method

Rongshuai Wu, Debasish Ray Mohapatra, Sidney Fels

2024 INTERSPEECH

Modelled Multivariate Overlap: A method for measuring vowel merger

Irene Smith, Morgan Sonderegger, The Spade Consortium

2024 INTERSPEECH
