Research Explorer
Multimodal Learning
13,057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations
WACV 2026
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
WACV 2026
Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space
WACV 2026
Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
WACV 2026
Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
EACL 2026
BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok
EACL 2026
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
EACL 2026
How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
EACL 2026
SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space
EACL 2026
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
EACL 2026
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
EACL 2026
A Unified View on Emotion Representation in Large Language Models
EACL 2026
Is Information Density Uniform when Utterances are Grounded on Perception and Discourse?
EACL 2026
Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling
EACL 2026
Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models
EACL 2026
CHROMIC: Chronological Reasoning Across Multi-Panel Comics
EACL 2026
DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
EACL 2026
RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
EACL 2026
ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images
EACL 2026
KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs
EACL 2026
Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models
EACL 2026
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
EACL 2026
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
EACL 2026
Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs
EACL 2026
3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale
AAAI 2026