Multimodal Learning

13057 directly classified papers

Papers

RampWatch: An In-the-Wild Dataset and Text-Guided Detection Framework for Recreational Vessels WACV 2026

MemeTAG: Keyword-Driven Meme Classification through Tag Embedding Reconstruction WACV 2026

SimForce: Force and Surface Electromyography from Full Body Video with Graph Neural Nets WACV 2026

SceneProp: Combining Neural Network and Markov Random Field for Scene-Graph Grounding WACV 2026

Learnable Query-Enhanced Pose Transformation WACV 2026

Reconstructing Realistic and Relightable Eyes WACV 2026

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning WACV 2026

Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships WACV 2026

Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score WACV 2026

PromptGAR: Flexible Promptive Group Activity Recognition WACV 2026

brat: Aligned Multi-View Embeddings for Brain MRI Analysis WACV 2026

Semantic Map Guided Bird's-Eye View Learning for Online HD Map Construction WACV 2026

Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images WACV 2026

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation WACV 2026

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention WACV 2026

SurgXBench: Explainable Vision-Language Model Benchmark for Surgery WACV 2026

SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection WACV 2026

FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs WACV 2026

Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation WACV 2026

MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding WACV 2026

Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding WACV 2026

See, Think, Learn: A Self-Taught Multimodal Reasoner WACV 2026

Grounding Descriptions in Images informs Zero-Shot Visual Recognition WACV 2026

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering WACV 2026

Extreme Amodal Face Detection WACV 2026