Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels
WACV 2026
MemeTAG: Keyword-Driven Meme Classification through Tag Embedding Reconstruction
WACV 2026
SimForce: Force and Surface Electromyography from Full Body Video with Graph Neural Nets
WACV 2026
SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding
WACV 2026
Learnable Query-Enhanced Pose Transformation
WACV 2026
Reconstructing Realistic and Relightable Eyes
WACV 2026
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
WACV 2026
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
WACV 2026
Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
WACV 2026
PromptGAR: Flexible Promptive Group Activity Recognition
WACV 2026
brat: Aligned Multi-View Embeddings for Brain MRI Analysis
WACV 2026
Semantic Map Guided Bird's-Eye View Learning for Online HD Map Construction
WACV 2026
Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images
WACV 2026
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
WACV 2026
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
WACV 2026
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
WACV 2026
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
WACV 2026
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
WACV 2026
Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation
WACV 2026
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
WACV 2026
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding
WACV 2026
See, Think, Learn: A Self-Taught Multimodal Reasoner
WACV 2026
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
WACV 2026
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
WACV 2026
Extreme Amodal Face Detection
WACV 2026