Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
From Prompt to Production: Automating Brand-Safe Marketing Imagery with Text-to-Image Models
WACV 2026
SOAF: Scene Occlusion-aware Neural Acoustic Field
WACV 2026
Test-Time Consistency in Vision Language Models
WACV 2026
Can We Challenge Open-Vocabulary Object Detectors with Generated Content in Street Scenes?
WACV 2026
RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels
WACV 2026
MemeTAG: Keyword-Driven Meme Classification through Tag Embedding Reconstruction
WACV 2026
SimForce: Force and Surface Electromyography from Full Body Video with Graph Neural Nets
WACV 2026
SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding
WACV 2026
Learnable Query-Enhanced Pose Transformation
WACV 2026
Reconstructing Realistic and Relightable Eyes
WACV 2026
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
WACV 2026
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
WACV 2026
Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score
WACV 2026
PromptGAR: Flexible Promptive Group Activity Recognition
WACV 2026
brat: Aligned Multi-View Embeddings for Brain MRI Analysis
WACV 2026
Semantic Map Guided Bird's-Eye View Learning for Online HD Map Construction
WACV 2026
Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images
WACV 2026
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
WACV 2026
Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention
WACV 2026
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
WACV 2026
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
WACV 2026
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
WACV 2026
Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation
WACV 2026
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
WACV 2026
A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models
EACL 2026