Multimodal Learning

13057 directly classified papers

Papers

UniCalib: Targetless LiDAR-camera Calibration via Probabilistic Flow on Unified Depth Representations WACV 2026

RegionAligner: Bridging Ego-Exo Views for Object Correspondence via Unified Text-Visual Learning WACV 2026

PoseGaussian: Pose-Driven Novel View Synthesis for Robust 3D Human Reconstruction WACV 2026

Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery WACV 2026

ORCA: Object Recognition and Comprehension for Archiving Marine Species WACV 2026

DuPLUS: Dual-Prompt Vision-Language Model for Universal Medical Image Segmentation and Prognosis WACV 2026

Bridging the Domain Gap in Small Multimodal Models: A Dual-level Alignment Perspective WACV 2026

Referring Change Detection in Remote Sensing Imagery WACV 2026

VLMs Guided Interpretable Decision Making in Autonomous Driving WACV 2026

Large Sign Language Models: Toward 3D American Sign Language Translation WACV 2026

KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding WACV 2026

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning WACV 2026

ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval WACV 2026

SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination WACV 2026

MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps WACV 2026

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models WACV 2026

Generalizing Sports Feedback Generation by Watching Competitions and Reading Books: A Rock Climbing Case Study WACV 2026

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance WACV 2026

VRAgent: Self-Refining Agent for Zero-Shot Multimodal Video Retrieval WACV 2026

M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models WACV 2026

BanglaProtha: Evaluating Vision Language Models in Underrepresented Long-tail Cultural Contexts WACV 2026

One-shot Portrait Stylization via Geometric Alignment WACV 2026

AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization WACV 2026

Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors WACV 2026

Lost in Time? A Meta-Learning Framework for Time-Shift-Tolerant Physiological Signal Transformation AAAI 2026