Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Decomposed Cross-Modal Distillation for RGB-Based Temporal Action Detection
CVPR 2023
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks
CVPR 2023
Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation
CVPR 2023
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields
CVPR 2023
ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning
CVPR 2023
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
CVPR 2023
GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation
CVPR 2023
Test of Time: Instilling Video-Language Models With a Sense of Time
CVPR 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
CVPR 2023
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
CVPR 2023
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
CVPR 2023
Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
CVPR 2023
Multilateral Semantic Relations Modeling for Image Text Retrieval
CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
TubeDETR: Spatio-Temporal Video Grounding With Transformers
CVPR 2022
Exposing the Limits of Video-Text Models through Contrast Sets
NAACL 2022
VLStereoSet: A Study of Stereotypical Bias in Pre-trained Vision-Language Models
IJCNLP 2022
SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval
WACV 2022
ScanQA: 3D Question Answering for Spatial Scene Understanding
CVPR 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
CVPR 2022
Multimodal Token Fusion for Vision Transformers
CVPR 2022