Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
CVPR 2023
Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
CVPR 2023
Multimodal Prompting With Missing Modalities for Visual Recognition
CVPR 2023
Mask3D: Pre-Training 2D Vision Transformers by Learning Masked 3D Priors
CVPR 2023
GRES: Generalized Referring Expression Segmentation
CVPR 2023
SceneTrilogy: On Human Scene-Sketch and Its Complementarity With Photo and Text
CVPR 2023
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
CVPR 2023
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023
Video-Helpful Multimodal Machine Translation
EMNLP 2023
Find-2-Find: Multitask Learning for Anaphora Resolution and Object Localization
EMNLP 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
EMNLP 2023
Enhancing Textbooks with Visuals from the Web for Improved Learning
EMNLP 2023
Impressions: Visual Semiotics and Aesthetic Impact Understanding
EMNLP 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
EMNLP 2023
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
EMNLP 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
EMNLP 2023
FACTIFY3M: A benchmark for multimodal fact verification with explainability through 5W Question-Answering
EMNLP 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions
EMNLP 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
EMNLP 2023
Affection: Learning Affective Explanations for Real-World Visual Data
CVPR 2023
StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields
CVPR 2023
Learning To Dub Movies via Hierarchical Prosody Models
CVPR 2023
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
CVPR 2023
Multimodal Industrial Anomaly Detection via Hybrid Fusion
CVPR 2023