Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination
WACV 2026
MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps
WACV 2026
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
WACV 2026
Generalizing Sports Feedback Generation by Watching Competitions and Reading Books: A Rock Climbing Case Study
WACV 2026
Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
WACV 2026
VRAgent: Self-Refining Agent for Zero-Shot Multimodal Video Retrieval
WACV 2026
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
WACV 2026
BanglaProtha: Evaluating Vision Language Models in Underrepresented Long-tail Cultural Contexts
WACV 2026
One-shot Portrait Stylization via Geometric Alignment
WACV 2026
AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization
WACV 2026
Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors
WACV 2026
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
WACV 2026
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
WACV 2026
Countering Multi-modal Representation Collapse through Rank-targeted Fusion
WACV 2026
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
WACV 2026
SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery
WACV 2026
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
WACV 2026
DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
WACV 2026
Beyond Faces: A Multimodal Person Clustering for Unconstrained Environments
WACV 2026
DreamCatcher: Efficient Multi-Concept Customization via Representation Finetuning
WACV 2026
Multimodal Graph Representation Learning over Arbitrary Sets of Modalities
WACV 2026
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
WACV 2026
SOAF: Scene Occlusion-aware Neural Acoustic Field
WACV 2026
Test-Time Consistency in Vision Language Models
WACV 2026
UniAPO: Unified Multimodal Automated Prompt Optimization
AAAI 2026