Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Bridging the Domain Gap in Small Multimodal Models: A Dual-level Alignment Perspective
WACV 2026
Referring Change Detection in Remote Sensing Imagery
WACV 2026
VLMs Guided Interpretable Decision Making in Autonomous Driving
WACV 2026
Large Sign Language Models: Toward 3D American Sign Language Translation
WACV 2026
KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding
WACV 2026
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
WACV 2026
ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval
WACV 2026
SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination
WACV 2026
MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps
WACV 2026
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
WACV 2026
Generalizing Sports Feedback Generation by Watching Competitions and Reading Books: A Rock Climbing Case Study
WACV 2026
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
WACV 2026
VRAgent: Self-Refining Agent for Zero-Shot Multimodal Video Retrieval
WACV 2026
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
WACV 2026
BanglaProtha: Evaluating Vision Language Models in Underrepresented Long-tail Cultural Contexts
WACV 2026
One-shot Portrait Stylization via Geometric Alignment
WACV 2026
AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization
WACV 2026
Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors
WACV 2026
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
WACV 2026
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
WACV 2026
Countering Multi-modal Representation Collapse through Rank-targeted Fusion
WACV 2026
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
WACV 2026
SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery
WACV 2026
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
WACV 2026
Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering
EACL 2026