Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Unleashing the Potential of Large Language Models for Text-to-Image Generation Through Autoregressive Representation Alignment
AAAI 2026
Identity-Aware Vision-Language Model for Explainable Face Forgery Detection
AAAI 2026
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
AAAI 2026
LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations
AAAI 2026
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
AAAI 2026
Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing
AAAI 2026
CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World
AAAI 2026
L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention
AAAI 2026
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
AAAI 2026
UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment
AAAI 2026
InstructDubber: Instruction-based Alignment for Zero-shot Movie Dubbing
AAAI 2026
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
AAAI 2026
OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward
AAAI 2026
Collaboratively “Copy & Paste” 2D-3D Features for Complex Video-to-Video Motion Editing
AAAI 2026
Mitigating Entity Hallucinations in 3D Radiology Report Generation via Dual-Stream Alignment
AAAI 2026
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
AAAI 2026
Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models
AAAI 2026
Zero-Shot Open-Vocabulary Human Motion Grounding with Test-Time Training
AAAI 2026
MISF: MLLM Guided Iterative Sample Filtering for Data Fault Detection
AAAI 2026
LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs
AAAI 2026
Subspace-Aware Graph Construction and Contrastive Alignment for Multimodal Recommendation with Large Language Models
AAAI 2026
Data-Centric Sequential Recommendation with Relation-Augmented Generation
AAAI 2026
Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models
AAAI 2026
TGCA-LLM: Time-Aware Graph-Text Contrastive Alignment for Enhancing LLMs in Temporal Knowledge Graph Completion
AAAI 2026
Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR
AAAI 2026