Papers
Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition
Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram
Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks
Krishna D. N., Ankita Patil
Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer’s Dementia Recognition from Spontaneous Speech
Morteza Rohanian, Julian Hough, Matthew Purver
Multimodal Inductive Transfer Learning for Detection of Alzheimer’s Dementia and its Severity
Utkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman et al.
Multi-Modality Matters: A Performance Leap on VoxCeleb
Zhengyang Chen, Shuai Wang, Yanmin Qian
Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech
Monica Sunkara, Srikanth Ronanki, Dhanush Bekal et al.
Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning
Katerina Papadimitriou, Gerasimos Potamianos
Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu, Cornelius Weber, Stefan Wermter
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation
Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix et al.
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
Matt Whitehill, Shuang Ma, Daniel McDuff et al.
Multi-Scale Convolution for Robust Keyword Spotting
Chen Yang, Xue Wen, Liming Song
Multiscale System for Alzheimer’s Dementia Recognition Through Spontaneous Speech
Erik Edwards, Charles Dognin, Bajibabu Bollepalli et al.
Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement
Lu Zhang, Mingjiang Wang
Multi-Speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network
Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon et al.
Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes
Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen, Xu Tan, Yi Ren et al.
Multi-Stream Attention-Based BLSTM with Feature Segmentation for Speech Emotion Recognition
Yuya Chiba, Takashi Nose, Akinori Ito
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR
Thilo von Neumann, Christoph Boeddeker, Lukas Drude et al.
Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension
Nana Hou, Chenglin Xu, Joey Tianyi Zhou et al.
Multi-Task Learning for Voice Related Recognition Tasks
Ana Montalvo, Jose R. Calvo, Jean-François Bonastre
Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention
Myunghun Jung, Youngmoon Jung, Jahyun Goo et al.
Multi-Task Siamese Neural Network for Improving Replay Attack Detection
Patrick von Platen, Fei Tao, Gokhan Tur
NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement
Feng Deng, Tao Jiang, Xiao-Rui Wang et al.
Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding
Alex Peiró-Lilja, Mireia Farrús