Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Text-guided 3D Human Generation from 2D Collections
EMNLP 2023
Sparse Black-Box Multimodal Attack for Vision-Language Adversary Generation
EMNLP 2023
Revealing Single Frame Bias for Video-and-Language Learning
ACL 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
ACL 2023
Multi-modal Action Chain Abductive Reasoning
ACL 2023
Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text
ACL 2023
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
ACL 2023
XtremeCLIP: Extremely Parameter-efficient Tuning for Low-resource Vision Language Understanding
ACL 2023
MultiQG-TI: Towards Question Generation from Multi-modal Sources
ACL 2023
e-Health CSIRO at RadSum23: Adapting a Chest X-Ray Report Generator to Multimodal Radiology Report Summarisation
ACL 2023
Incorporating Object-Level Visual Context for Multimodal Fine-Grained Entity Typing
EMNLP 2023
Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow
NeurIPS 2023
Vocabulary-free Image Classification
NeurIPS 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
NeurIPS 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
NeurIPS 2023
MultiCMET: A Novel Chinese Benchmark for Understanding Multimodal Metaphor
EMNLP 2023
Visually Grounded Continual Language Learning with Selective Specialization
EMNLP 2023
Cross-Modal Semantic Enhanced Interaction for Image-Sentence Retrieval
WACV 2023
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data
AISTATS 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
EACL 2023
Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions
EACL 2023
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
EACL 2023
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
ICML 2023
Continual Vision-Language Representation Learning with Off-Diagonal Information
ICML 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
ICML 2023