Research Explorer
Multimodal Learning (Artificial Intelligence › Core AI)
13,057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1,084
2023: 1,697
2024: 2,500
2025: 3,654
2026: 1,107
Papers
SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition
AAAI 2026
InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE
AAAI 2026
Stop Mixing Things Up! BISCUIT Teaches Vision-Language Models to Learn New Concepts from Images on the Spot
AAAI 2026
Knowledge-Enhanced Explainable Prompting for Vision-Language Models
AAAI 2026
Thinking Aesthetics Assessment of Image Color Temperature: Models, Datasets and Benchmarks
AAAI 2026
RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models
AAAI 2026
CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification
AAAI 2026
DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control
AAAI 2026
Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection
AAAI 2026
Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs
AAAI 2026
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
AAAI 2026
DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos
AAAI 2026
Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval
AAAI 2026
MIRAGE: Towards AI-Generated Image Detection in the Wild
AAAI 2026
Text-Guided Gradient Refinement: Resolving Multimodal Gradient Conflicts to Boost Adversarial Attacks on Vision-Language Models
AAAI 2026
Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning
AAAI 2026
CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis
AAAI 2026
Versatile Vision-Language Model for 3D Computed Tomography
AAAI 2026
PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification
AAAI 2026
Point Cloud Quantization Through Multimodal Prompting for 3D Understanding
AAAI 2026
MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology
AAAI 2026
DynamicEarth: How Far Are We from Open-Vocabulary Change Detection?
AAAI 2026
Points Meet Pixels: Bridging 2D Vision-Language Model and 3D Perception Gaps for Point Cloud Quality Assessment
AAAI 2026
VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
AAAI 2026
Instruction-Guided Cross-Modal Clustering for Training-Free Visual Token Pruning in Vision-Language Models
AAAI 2026