Multimodal Learning

13057 directly classified papers

Papers

Gene-DML: Dual-Pathway Multi-Level Discrimination for Gene Expression Prediction from Histopathology Images WACV 2026

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation WACV 2026

Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention WACV 2026

SurgXBench: Explainable Vision-Language Model Benchmark for Surgery WACV 2026

SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection WACV 2026

FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs WACV 2026

Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation WACV 2026

MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding WACV 2026

Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding WACV 2026

See, Think, Learn: A Self-Taught Multimodal Reasoner WACV 2026

Grounding Descriptions in Images informs Zero-Shot Visual Recognition WACV 2026

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering WACV 2026

SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding WACV 2026

TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors WACV 2026

Harnessing Object Grounding for Time-Sensitive Video Understanding WACV 2026

Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios WACV 2026

Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction WACV 2026

WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields WACV 2026

PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models WACV 2026

GHOST: Getting to the Bottom of Hallucinations with A Multi-round Consistency Benchmark WACV 2026

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models WACV 2026

Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis WACV 2026

From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance WACV 2026

Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs WACV 2026

Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking AAAI 2026