Multimodal Learning
13,057 papers
Papers per year (chart; x-axis ticks '10 to '25; yearly counts in chronological order): 1, 3, 6, 2, 5, 2, 3, 6, 24, 20, 46, 109, 205, 299, 622, 675, 987, 1084, 1697, 2500, 3654, 1107
Papers
Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
INTERSPEECH 2022
Emphasis Control for Parallel Neural TTS
INTERSPEECH 2022
Evoc-Learn — High quality simulation of early vocal learning
INTERSPEECH 2022
CT-SAT: Contextual Transformer for Sequential Audio Tagging
INTERSPEECH 2022
Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
INTERSPEECH 2022
Speaker recognition-assisted robust audio deepfake detection
INTERSPEECH 2022
Towards Error-Resilient Neural Speech Coding
INTERSPEECH 2022
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration
INTERSPEECH 2022
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers
INTERSPEECH 2022
Expressive, Variable, and Controllable Duration Modelling in TTS
INTERSPEECH 2022