Papers
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition
Matthew Perez, Mimansa Jaiswal, Minxue Niu et al.
Minimizing Sequential Confusion Error in Speech Command Recognition
Zhanheng Yang, Hang Lv, Xiong Wang et al.
Minimum latency training of sequence transducers for streaming end-to-end speech recognition
Yusuke Shinohara, Shinji Watanabe
MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka et al.
Mitigating bias against non-native accents
Yuanyuan Zhang, Yixuan Zhang, Bence Halpern et al.
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)
Ziyao Zhang, Alessio Falai, Ariadna Sanchez et al.
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang, Kaitao Song, Xu Tan et al.
Mixup regularization strategies for spoofing countermeasure system
Woohyun Kang, Md Jahangir Alam, Abderrahim Fathan
Model Compression by Iterative Pruning with Knowledge Distillation and Its Application to Speech Enhancement
Zeyuan Wei, Li Hao, Xueliang Zhang
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation
Jack Deadman, Jon Barker
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction
Hao Shi, Longbiao Wang, Sheng Li et al.
Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections
Seorim Hwang, Sung Wook Park, Youngcheol Park
MOS Prediction Network for Non-intrusive Speech Quality Assessment in Online Conferencing
Wenjing Liu, Chuan Xie
MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment
Karl El Hajal, Milos Cernak, Pablo Mainar
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild
Tao Liu, Shuai Fan, Xu Xiang et al.
MSR-NV: Neural Vocoder Using Multiple Sampling Rates
Kentaro Mitsui, Kei Sawada
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas Edo Zezario, Szu-wei Fu, Fei Chen et al.
Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
Chengdong Liang, Yijiang Chen, Jiadi Yao et al.
Multichannel Speech Separation with Narrow-band Conformer
Changsheng Quan, Xiaofei Li
Multi-class AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data
MengLong Xu, Shengqiang Li, Chengdong Liang et al.
Multi-Corpus Speech Emotion Recognition for Unseen Corpus Using Corpus-Wise Weights in Classification Loss
Youngdo Ahn, Sung Joo Lee, Jong Won Shin
Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning
Mufan Sang, John H.L. Hansen
Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition
Zihan Zhao, Yanfeng Wang, Yu Wang
Multilingual and Multimodal Abuse Detection
Rini Sharon, Heet Shah, Debdoot Mukherjee et al.