Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Reality vs Counterfactual: Multi-World Contrastive Reinforcement Learning for Enhancing MLLM’s Theory of Mind in Egocentric Videos
AAAI 2026
MASP: Multi-Aspect Guided Emotion Reasoning with Soft Prompt Tuning In Vision-Language Models
AAAI 2026
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier
AAAI 2026
Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
AAAI 2026
MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance
AAAI 2026
UQ-Bench: A Benchmark for Evaluating Multimodal LLMs on Underwater Image Quality Assessment
AAAI 2026
PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching
AAAI 2026
Vision-language Incremental Learning with Dual Class-individual Memory
AAAI 2026
Rethinking Video-Language Model from the Language Input Perspective
AAAI 2026
Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies
AAAI 2026
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
AAAI 2026
PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
AAAI 2026
From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension
AAAI 2026
Transferability of Adversarial Attacks in Video-based MLLMs: A Cross-modal Image-to-Video Approach
AAAI 2026
BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering
AAAI 2026
DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning
AAAI 2026
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
AAAI 2026
GranAlign: Granularity-Aware Alignment Framework for Zero-shot Video Moment Retrieval
AAAI 2026
MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
AAAI 2026
SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension
AAAI 2026
See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection
AAAI 2026
Modality and Task Adaptation for Enhanced Zero-shot Composed Image Retrieval
AAAI 2026
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
AAAI 2026
Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension
AAAI 2026
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
AAAI 2026