Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Hierarchical Multi-Supervision Multi-Interaction Graph Attention Network for Multi-Camera Pedestrian Trajectory Prediction
AAAI 2022
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
CVPR 2022
Single-Stage Visual Relationship Learning using Conditional Queries
NeurIPS 2022
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
NeurIPS 2022
Grounded Video Situation Recognition
NeurIPS 2022
Egocentric Video-Language Pretraining
NeurIPS 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
NeurIPS 2022
Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension
ACL 2022
OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages
ACL 2022
Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
ACL 2022
Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning
WACV 2022
Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations
ACL 2022
Image Retrieval from Contextual Descriptions
ACL 2022
MMCoQA: Conversational Question Answering over Text, Tables, and Images
ACL 2022
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
ACL 2022
Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations
ACL 2022
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
ACL 2022
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding
ACL 2022
CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment
ACL 2022
Bridging between Cognitive Processing Signals and Linguistic Features via a Unified Attentional Network
AAAI 2022
Complementary Attention Gated Network for Pedestrian Trajectory Prediction
AAAI 2022
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
AAAI 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
AAAI 2022
You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation
AAAI 2022