Multimodal Learning

13057 directly classified papers

Papers

Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation AAAI 2026

Bonsai: Interpretable Tree-Adaptive Grounded Reasoning AAAI 2026

Are We on the Right Way to Assess Document Retrieval-Augmented Generation? AAAI 2026

SafetyReminder: Reviving Delayed Safety Awareness of Vision-Language Models to Defend Against Jailbreak Attacks AAAI 2026

SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models AAAI 2026

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models AAAI 2026

MultiMedBench: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA AAAI 2026

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction AAAI 2026

MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models AAAI 2026

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection Under Cloaking Perturbations AAAI 2026

HeartLLM: Discretized ECG Tokenization for LLM-Based Diagnostic Reasoning AAAI 2026

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models AAAI 2026

MARS: Multimodal Adaptive Reasoning Model for Avoiding Overthinking AAAI 2026

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding AAAI 2026

M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs AAAI 2026

Failures to Surface Harmful Contents in Video Large Language Models AAAI 2026

Efficient LLM-Jailbreaking via Multimodal-LLM Jailbreak AAAI 2026

EchoBat: Echo-Vision Enhancement and Echo-Layered Sampling for Video LLMs Hallucination Mitigation AAAI 2026

Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model AAAI 2026

When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing AAAI 2026

The Emotional Baby Is Truly Deadly: Does Your Multimodal Large Reasoning Model Have Emotional Flattery Towards Humans? AAAI 2026

History-Aware Reasoning for GUI Agents AAAI 2026

CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots AAAI 2026

Refine and Align: Confidence Calibration Through Multi-Agent Interaction in VQA AAAI 2026

SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention AAAI 2026