Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval
WACV 2026
LVM-Lite: Training Large Vision Models with Efficient Sequential Modeling
WACV 2026
SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination
WACV 2026
MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps
WACV 2026
Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters
WACV 2026
Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting
WACV 2026
DermEVAL: A Dermatologist-Reviewed Benchmark for Multimodal Large Language Models
WACV 2026
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
WACV 2026
Understanding Human-Like Biases in VLMs via Subjective Face Analytics
WACV 2026
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
WACV 2026
Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression
WACV 2026
ATM: Enhanced Alignment for Text-to-Motion Generation
WACV 2026
DenseBEV: Transforming BEV Grid Cells into 3D Objects
WACV 2026
CLIP-IT: CLIP-based Pairing of Histology Images with Privileged Textual Information
WACV 2026
Towards Unconstrained Cross-View Pose Estimation
WACV 2026
Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation
WACV 2026
Cross-Modal Event Encoder: Bridging Image-Text Knowledge to Event Streams
WACV 2026
Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care
WACV 2026
Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting
WACV 2026
Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning
WACV 2026
CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning
WACV 2026
Fused Similarity Measure Based Alignment with Dual-Scale Adaptive Selection for Weakly Supervised Video Anomaly Detection
WACV 2026
mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description
WACV 2026
LASER: Lip Landmark Assisted Speaker Detection for Robustness
WACV 2026
Broadcast2Pitch: Game State Reconstruction from Unconstrained Soccer Videos
WACV 2026