Research Explorer
Multimodal Learning
13,057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation
WACV 2026
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
WACV 2026
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding
WACV 2026
See, Think, Learn: A Self-Taught Multimodal Reasoner
WACV 2026
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
WACV 2026
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
WACV 2026
SynchroRaMa : Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding
WACV 2026
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
WACV 2026
Harnessing Object Grounding for Time-Sensitive Video Understanding
WACV 2026
Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
WACV 2026
WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields
WACV 2026
PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models
WACV 2026
GHOST: Getting to the Bottom of Hallucinations with A Multi-round Consistency Benchmark
WACV 2026
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
WACV 2026
Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis
WACV 2026
From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance
WACV 2026
Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs
WACV 2026
EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models
WACV 2026
The Perceptual Observatory: Characterizing Robustness and Grounding in MLLMs
WACV 2026
Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models
WACV 2026
Analysis of Text Accuracy and Visual Alignment in Vision-Language Models for Artistic Text Generation
WACV 2026
ZonUI-3B: Competitive GUI Grounding with a 3B VLM Trained on a Single Consumer GPU
WACV 2026
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
WACV 2026
MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection
EACL 2026