Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
AAAI 2026
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
AAAI 2026
Are We on the Right Way to Assess Document Retrieval-Augmented Generation?
AAAI 2026
SafetyReminder: Reviving Delayed Safety Awareness of Vision-Language Models to Defend Against Jailbreak Attacks
AAAI 2026
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
AAAI 2026
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
AAAI 2026
MultiMedBench: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA
AAAI 2026
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
AAAI 2026
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
AAAI 2026
MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection Under Cloaking Perturbations
AAAI 2026
HeartLLM: Discretized ECG Tokenization for LLM-Based Diagnostic Reasoning
AAAI 2026
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
AAAI 2026
MARS: Multimodal Adaptive Reasoning Model for Avoiding Overthinking
AAAI 2026
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
AAAI 2026
M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs
AAAI 2026
Failures to Surface Harmful Contents in Video Large Language Models
AAAI 2026
Efficient LLM-Jailbreaking via Multimodal-LLM Jailbreak
AAAI 2026
EchoBat: Echo-Vision Enhancement and Echo-Layered Sampling for Video LLMs Hallucination Mitigation
AAAI 2026
Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
AAAI 2026
When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing
AAAI 2026
The Emotional Baby Is Truly Deadly: Does Your Multimodal Large Reasoning Model Have Emotional Flattery Towards Humans?
AAAI 2026
History-Aware Reasoning for GUI Agents
AAAI 2026
CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots
AAAI 2026
Refine and Align: Confidence Calibration Through Multi-Agent Interaction in VQA
AAAI 2026
SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention
AAAI 2026