Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
EMNLP 2022
Open-Domain Sign Language Translation Learned from Online Video
EMNLP 2022
CapOnImage: Context-driven Dense-Captioning on Image
EMNLP 2022
Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization
AAAI 2022
Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)
AAAI 2022
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
EMNLP 2022
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition
NeurIPS 2022
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
CVPR 2022
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
CVPR 2022
Make It Move: Controllable Image-to-Video Generation With Text Descriptions
CVPR 2022
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
CVPR 2022
REX: Reasoning-Aware and Grounded Explanation
CVPR 2022
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation
CVPR 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People
CVPR 2022
Grounded Language-Image Pre-Training
CVPR 2022
Multimodal Semi-supervised Learning for Disaster Tweet Classification
COLING 2022
Visual Commonsense in Pretrained Unimodal and Multimodal Models
NAACL 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
CVPR 2022
COGMEN: COntextualized GNN based Multimodal Emotion recognitioN
NAACL 2022
A Computational Acquisition Model for Multimodal Word Categorization
NAACL 2022
Multimodal Dialogue State Tracking
NAACL 2022
VGNMN: Video-grounded Neural Module Networks for Video-Grounded Dialogue Systems
NAACL 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
NAACL 2022
Multilingual and Multimodal Topic Modelling with Pretrained Embeddings
COLING 2022
Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows
COLING 2022