Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
SurgPub-Video: A Comprehensive Surgical Video Framework for Enhanced Surgical Intelligence in Vision-Language Model
AAAI 2026
Dual-Phase Visual-Language Pretraining and Adaptation for Long-Tailed Multi-Label Recognition
AAAI 2026
Multi-Agent Undercover Gaming: Hallucination Removal Through Counterfactual Test for Multimodal Reasoning
AAAI 2026
Regression over Classification: Assessing Image Aesthetics via Multimodal Large Language Models
AAAI 2026
Learning Spatial Decay for Vision Transformers
AAAI 2026
Suit the Remedy to the Retriever: Interpretable Query Optimization with Retriever Preference Alignment for Vision-Language Retrieval
AAAI 2026
Imagine with Layout and Sketch: Enhancing Vision-Language Retrieval with Dual-Stream Multi-Modal Query Refinement
AAAI 2026
Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization
AAAI 2026
Leveraging Textual Compositional Reasoning for Robust Change Captioning
AAAI 2026
FineVAU: A Novel Human-Aligned Benchmark for Fine-Grained Video Anomaly Understanding
AAAI 2026
What to Trust? A Trust-aware Knowledge-guided Method for Zero-shot Object State Understanding in Videos
AAAI 2026
ImageSet2Text: Describing Sets of Images Through Text
AAAI 2026
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
AAAI 2026
DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities
AAAI 2026
vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs
AAAI 2026
SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse
AAAI 2026
Video Spatial Reasoning with Object-Centric 3D Rollout
AAAI 2026
Mitigating Low-Quality Reasoning in MLLMs: Self-Driven Refined Multimodal CoT with Selective Thinking and Step-wise Visual Enhancement
AAAI 2026
UniMo: Unified Motion Generation and Understanding with Chain of Thought
AAAI 2026
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
AAAI 2026
TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs
AAAI 2026
DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation
AAAI 2026
PBR3DGen: A VLM-Guided Mesh Generation with High-Quality PBR Texture
AAAI 2026
Remodeling Semantic Relationships in Vision-Language Fine-Tuning
AAAI 2026
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study
AAAI 2026