Multimodal Learning
13057 directly classified papers
Papers
Evaluating Automatically Generated Phoneme Captions for Images
INTERSPEECH 2020
Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework
INTERSPEECH 2020