Multimodal Learning

13057 directly classified papers

Papers

Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations WACV 2026

SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis WACV 2026

Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space WACV 2026

Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment WACV 2026

Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs EACL 2026

BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok EACL 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection EACL 2026

How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers? EACL 2026

SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space EACL 2026

Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models EACL 2026

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA EACL 2026

A Unified View on Emotion Representation in Large Language Models EACL 2026

Is Information Density Uniform when Utterances are Grounded on Perception and Discourse? EACL 2026

Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling EACL 2026

Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models EACL 2026

CHROMIC: Chronological Reasoning Across Multi-Panel Comics EACL 2026

DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning EACL 2026

RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation EACL 2026

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images EACL 2026

KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs EACL 2026

Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models EACL 2026

Extending Audio Context for Long-Form Understanding in Large Audio-Language Models EACL 2026

Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact EACL 2026

Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs EACL 2026

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale AAAI 2026