Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution
COLING 2022
Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22)
EMNLP 2022
Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
INTERSPEECH 2022
Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment
CVPR 2022
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
ACL 2022
Learning Functional Distributional Semantics with Visual Data
ACL 2022
Detecting Euphemisms with Literal Descriptions and Visual Imagery
EMNLP 2022
Cross-modal Transfer Between Vision and Language for Protest Detection
EMNLP 2022
Lexi: Self-Supervised Learning of the UI Language
EMNLP 2022
Entity-level Interaction via Heterogeneous Graph for Multimodal Named Entity Recognition
EMNLP 2022
UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog
CVPR 2022
EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval
CVPR 2022
Maintaining Reasoning Consistency in Compositional Visual Question Answering
CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration
CVPR 2022
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection
IJCAI 2022
MMCoQA: Conversational Question Answering over Text, Tables, and Images
ACL 2022
Scene-Text Aware Image and Text Retrieval with Dual-Encoder
ACL 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
ACL 2022
Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
ACL 2022
Multimodal Sarcasm Target Identification in Tweets
ACL 2022
ViLMedic: a framework for research at the intersection of vision and language in medical AI
ACL 2022
Comprehensive Multi-Modal Interactions for Referring Image Segmentation
ACL 2022
Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task
EMNLP 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
EMNLP 2022
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval
EMNLP 2022