Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
ST-VLM: A Spatial-to-Image Multimodal Spatial-Temporal Prediction Framework with Vision-Language Model
AAAI 2026
TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition
AAAI 2026
SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World
AAAI 2026
STOLA: Self-Adaptive Touch-Language Framework for Tactile Commonsense Reasoning in Open-Ended Scenarios
AAAI 2026
SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning
AAAI 2026
Lightweight Adaptive Topological Layout and Semantic Mapping in Vision-and-Language Navigation on Websites
AAAI 2026
Multi-Modal Fact Knowledge Generation for Imbalanced Cross-Source Entity Alignment
AAAI 2026
MyGram: Modality-aware Graph Transformer with Global Distribution for Multi-modal Entity Alignment
AAAI 2026
Aligning Cross-View Visual Geometries in LVLMs Through Human-Like Reasoning Learning
AAAI 2026
HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection
AAAI 2026
D3ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
AAAI 2026
VCGD: Visual Clue Guided Decoding with Caption Model for Mitigating Hallucination in Multimodal Large Language Models
AAAI 2026
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
AAAI 2026
T3Time: Tri-Modal Time Series Forecasting via Adaptive Multi-Head Alignment and Residual Fusion
AAAI 2026
Efficient Multimodal Large Language Model via Dynamic KV Cache Quantization
AAAI 2026
MARE: Multimodal Analogical Reasoning for Disease Evolution-Aware Radiology Report Generation
AAAI 2026
DAVID: Dual-stage Adaptive Vision-text Integrated Decoupling for Multimodal KV Cache Eviction
AAAI 2026
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models
AAAI 2026
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models
AAAI 2026
TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing
AAAI 2026
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
AAAI 2026
Who Should I Trust? Explicit Confidence-Focused Multimodal Intent Recognition
AAAI 2026
STEP-Nav: Spatial-Temporal Efficient Visual Token Pruning for Vision-and-Language Navigation with Large Language Models
AAAI 2026
MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs
AAAI 2026
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
AAAI 2026