Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Detecting Attended Visual Targets in Video
CVPR 2020
MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction
CVPR 2020
ActBERT: Learning Global-Local Video-Text Representations
CVPR 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
CVPR 2020
Fashion Outfit Complementary Item Retrieval
CVPR 2020
Multi-View Neural Human Rendering
CVPR 2020
Learning to Have an Ear for Face Super-Resolution
CVPR 2020
JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection
CVPR 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
CVPR 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes
CVPR 2020
Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention
CVPR 2020
Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!
EMNLP 2020
Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction
EMNLP 2020
Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis
EMNLP 2020
Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos
EMNLP 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
EMNLP 2020
Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading
EMNLP 2020
Quantifying Intimacy in Language
EMNLP 2020
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
EMNLP 2020
NwQM: A neural quality assessment framework for Wikipedia
EMNLP 2020
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
EMNLP 2020
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
EMNLP 2020
ConceptBert: Concept-Aware Representation for Visual Question Answering
EMNLP 2020
Robust and Interpretable Grounding of Spatial References with Relation Networks
EMNLP 2020
Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays
EMNLP 2020