Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
VideoRAG: Retrieval-Augmented Generation over Video Corpus
ACL 2025
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
CVPR 2025
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
ACL 2025
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
CVPR 2025
Just KIDDIN’ : Knowledge Infusion and Distillation for Detection of INdecent Memes
ACL 2025
Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images
ACL 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
ACL 2025
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
CVPR 2025
FREE: Fast and Robust Vision Language Models with Early Exits
ACL 2025
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
CVPR 2025
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
ACL 2025
Compositional Caching for Training-free Open-vocabulary Attribute Detection
CVPR 2025
Can Vision Language Models Understand Mimed Actions?
ACL 2025
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
CVPR 2025
Testing Spatial Intuitions of Humans and Large Language and Multimodal Models in Analogies
ACL 2025
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
CVPR 2025
Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning
ACL 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
CVPR 2025
Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models
ACL 2025
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
CVPR 2025
Making LVLMs Look Twice: Contrastive Decoding with Contrast Images
ACL 2025
Less Attention is More: Prompt Transformer for Generalized Category Discovery
CVPR 2025
UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation
ACL 2025
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
CVPR 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
ICCV 2025