Multimodal Learning

13057 directly classified papers

Papers

Reality vs Counterfactual: Multi-World Contrastive Reinforcement Learning for Enhancing MLLM’s Theory of Mind in Egocentric Videos AAAI 2026

MASP: Multi-Aspect Guided Emotion Reasoning with Soft Prompt Tuning In Vision-Language Models AAAI 2026

Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier AAAI 2026

Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding AAAI 2026

MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance AAAI 2026

UQ-Bench: A Benchmark for Evaluating Multimodal LLMs on Underwater Image Quality Assessment AAAI 2026

PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching AAAI 2026

Vision-language Incremental Learning with Dual Class-individual Memory AAAI 2026

Rethinking Video-Language Model from the Language Input Perspective AAAI 2026

Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies AAAI 2026

RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System AAAI 2026

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems AAAI 2026

From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension AAAI 2026

Transferability of Adversarial Attacks in Video-based MLLMs: A Cross-modal Image-to-Video Approach AAAI 2026

BayesVQA: Energy-Guided Bayesian Debiasing for Language-Bias-Robust Visual Question Answering AAAI 2026

DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning AAAI 2026

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics AAAI 2026

GranAlign: Granularity-Aware Alignment Framework for Zero-shot Video Moment Retrieval AAAI 2026

MPJudge: Towards Perceptual Assessment of Music-Induced Paintings AAAI 2026

SatireDecoder: Visual Cascaded Decoupling for Enhancing Satirical Image Comprehension AAAI 2026

See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection AAAI 2026

Modality and Task Adaptation for Enhanced Zero-shot Composed Image Retrieval AAAI 2026

CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models AAAI 2026

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension AAAI 2026

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention AAAI 2026