Multimodal Learning

13057 directly classified papers

Papers

Root Completion from Intraoral Scans of Tooth Crowns using Diffusion with Patch Perturbation WACV 2026

MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding WACV 2026

Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding WACV 2026

See, Think, Learn: A Self-Taught Multimodal Reasoner WACV 2026

Grounding Descriptions in Images informs Zero-Shot Visual Recognition WACV 2026

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering WACV 2026

SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding WACV 2026

TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors WACV 2026

Harnessing Object Grounding for Time-Sensitive Video Understanding WACV 2026

Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios WACV 2026

Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction WACV 2026

WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields WACV 2026

PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models WACV 2026

GHOST: Getting to the Bottom of Hallucinations with A Multi-round Consistency Benchmark WACV 2026

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models WACV 2026

Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis WACV 2026

From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance WACV 2026

Distilling What and Why: Enhancing Driver Intention Prediction with MLLMs WACV 2026

EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models WACV 2026

The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs WACV 2026

Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models WACV 2026

Analysis of Text Accuracy and Visual Alignment in Vision-Language Models for Artistic Text Generation WACV 2026

ZonUI-3B: Competitive GUI Grounding with a 3B VLM Trained on a Single Consumer GPU WACV 2026

You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction WACV 2026

MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection EACL 2026