Multimodal Learning

13057 directly classified papers

Papers

SurgPub-Video: A Comprehensive Surgical Video Framework for Enhanced Surgical Intelligence in Vision-Language Model AAAI 2026

Dual-Phase Visual-Language Pretraining and Adaptation for Long-Tailed Multi-Label Recognition AAAI 2026

Multi-Agent Undercover Gaming: Hallucination Removal Through Counterfactual Test for Multimodal Reasoning AAAI 2026

Regression over Classification: Assessing Image Aesthetics via Multimodal Large Language Models AAAI 2026

Learning Spatial Decay for Vision Transformers AAAI 2026

Suit the Remedy to the Retriever: Interpretable Query Optimization with Retriever Preference Alignment for Vision-Language Retrieval AAAI 2026

Imagine with Layout and Sketch: Enhancing Vision-Language Retrieval with Dual-Stream Multi-Modal Query Refinement AAAI 2026

Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization AAAI 2026

Leveraging Textual Compositional Reasoning for Robust Change Captioning AAAI 2026

FineVAU: A Novel Human-Aligned Benchmark for Fine-Grained Video Anomaly Understanding AAAI 2026

What to Trust? A Trust-aware Knowledge-guided Method for Zero-shot Object State Understanding in Videos AAAI 2026

ImageSet2Text: Describing Sets of Images Through Text AAAI 2026

MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models AAAI 2026

DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities AAAI 2026

vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs AAAI 2026

SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse AAAI 2026

Video Spatial Reasoning with Object-Centric 3D Rollout AAAI 2026

Mitigating Low-Quality Reasoning in MLLMs: Self-Driven Refined Multimodal CoT with Selective Thinking and Step-wise Visual Enhancement AAAI 2026

UniMo: Unified Motion Generation and Understanding with Chain of Thought AAAI 2026

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM AAAI 2026

TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs AAAI 2026

DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation AAAI 2026

PBR3DGen: A VLM-Guided Mesh Generation with High-Quality PBR Texture AAAI 2026

Remodeling Semantic Relationships in Vision-Language Fine-Tuning AAAI 2026

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study AAAI 2026