Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images
WACV 2026
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
WACV 2026
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
WACV 2026
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
WACV 2026
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
WACV 2026
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
WACV 2026
Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation
WACV 2026
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
WACV 2026
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding
WACV 2026
See, Think, Learn: A Self-Taught Multimodal Reasoner
WACV 2026
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
WACV 2026
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
WACV 2026
SynchroRaMa : Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding
WACV 2026
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
WACV 2026
Harnessing Object Grounding for Time-Sensitive Video Understanding
WACV 2026
Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
WACV 2026
WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields
WACV 2026
PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models
WACV 2026
GHOST: Getting to the Bottom of Hallucinations with A Multi-round Consistency Benchmark
WACV 2026
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
WACV 2026
Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis
WACV 2026
From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance
WACV 2026
Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs
WACV 2026
Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking
AAAI 2026