Research Explorer
Multi-Modal Learning
3,194 papers directly classified under this topic
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Learning Representations from Audio-Visual Spatial Alignment
NIPS 2020
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
NIPS 2020
RANet: Region Attention Network for Semantic Segmentation
NIPS 2020
3D Shape Reconstruction from Vision and Touch
NIPS 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video
NIPS 2020
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather
CVPR 2020
Learning Interactions and Relationships Between Movie Characters
CVPR 2020
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model
CVPR 2020
VQA With No Questions-Answers Training
CVPR 2020
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions
CVPR 2020
Video Object Grounding Using Semantic Roles in Language Description
CVPR 2020
Webly Supervised Knowledge Embedding Model for Visual Reasoning
CVPR 2020
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
CVPR 2020
Local-Global Video-Text Interactions for Temporal Grounding
CVPR 2020
Hierarchical Conditional Relation Networks for Video Question Answering
CVPR 2020
Visual-Semantic Matching by Exploring High-Order Attention and Distraction
CVPR 2020
Iterative Context-Aware Graph Inference for Visual Dialog
CVPR 2020
Listen to Look: Action Recognition by Previewing Audio
CVPR 2020
IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
CVPR 2020
Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning
CVPR 2020
Spatio-Temporal Graph for Video Captioning With Knowledge Distillation
CVPR 2020
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks
INTERSPEECH 2020
M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues
AAAI 2020
DeepDualMapper: A Gated Fusion Network for Automatic Map Extraction Using Aerial Images and Trajectories
AAAI 2020
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
AAAI 2020