Multimodal Learning

13057 directly classified papers

Papers

Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation WACV 2026

Cross-Modal Event Encoder: Bridging Image-Text Knowledge to Event Streams WACV 2026

Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care WACV 2026

Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting WACV 2026

Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning WACV 2026

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning WACV 2026

Fused Similarity Measure Based Alignment with Dual-Scale Adaptive Selection for Weakly Supervised Video Anomaly Detection WACV 2026

mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description WACV 2026

LASER: Lip Landmark Assisted Speaker Detection for Robustness WACV 2026

Sea-CLIP: Mining Semantic-Aware Representations for Few-Shot Anomaly Detection with CLIP WACV 2026

Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video? WACV 2026

Robust Multimodal Emotion Recognition from Incomplete Modalities via Query-Based Unimodal and Cross-Modal Learning WACV 2026

UniCalib: Targetless LiDAR-camera Calibration via Probabilistic Flow on Unified Depth Representations WACV 2026

RegionAligner: Bridging Ego-Exo Views for Object Correspondence via Unified Text-Visual Learning WACV 2026

PoseGaussian: Pose-Driven Novel View Synthesis for Robust 3D Human Reconstruction WACV 2026

Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery WACV 2026

ORCA: Object Recognition and Comprehension for Archiving Marine Species WACV 2026

DuPLUS: Dual-Prompt Vision-Language Model for Universal Medical Image Segmentation and Prognosis WACV 2026

Bridging the Domain Gap in Small Multimodal Models: A Dual-level Alignment Perspective WACV 2026

Referring Change Detection in Remote Sensing Imagery WACV 2026

VLMs Guided Interpretable Decision Making in Autonomous Driving WACV 2026

Large Sign Language Models: Toward 3D American Sign Language Translation WACV 2026

KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding WACV 2026

Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning WACV 2026

IPFormer: Instance Prompt-guided Transformer for Multi-modal Multi-shot Video Understanding AAAI 2026