Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
WACV 2026
Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models
WACV 2026
Analysis of Text Accuracy and Visual Alignment in Vision-Language Models for Artistic Text Generation
WACV 2026
ZonUI-3B: Competitive GUI Grounding with a 3B VLM Trained on a Single Consumer GPU
WACV 2026
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
WACV 2026
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
WACV 2026
Being Positive about Negative Queries: Exclusion Aware Multimodal Retrieval using Disentangled Representations
WACV 2026
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
WACV 2026
Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space
WACV 2026
Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
WACV 2026
Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
EACL 2026
BigTokDetect: A Clinically-Informed Vision–Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok
EACL 2026
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
EACL 2026
How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
EACL 2026
SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space
EACL 2026
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
EACL 2026
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
EACL 2026
A Unified View on Emotion Representation in Large Language Models
EACL 2026
Is Information Density Uniform when Utterances are Grounded on Perception and Discourse?
EACL 2026
Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling
EACL 2026
Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models
EACL 2026
CHROMIC: Chronological Reasoning Across Multi-Panel Comics
EACL 2026
DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
EACL 2026
RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
EACL 2026
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use
AAAI 2026