Multimodal Learning

13,057 directly classified papers

Papers

Unleashing the Potential of Large Language Models for Text-to-Image Generation Through Autoregressive Representation Alignment AAAI 2026

Identity-Aware Vision-Language Model for Explainable Face Forgery Detection AAAI 2026

Look-Back: Implicit Visual Re-focusing in MLLM Reasoning AAAI 2026

LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations AAAI 2026

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? AAAI 2026

Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing AAAI 2026

CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World AAAI 2026

L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention AAAI 2026

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding AAAI 2026

UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment AAAI 2026

InstructDubber: Instruction-based Alignment for Zero-shot Movie Dubbing AAAI 2026

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness AAAI 2026

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward AAAI 2026

Collaboratively “Copy & Paste” 2D-3D Features for Complex Video-to-Video Motion Editing AAAI 2026

Mitigating Entity Hallucinations in 3D Radiology Report Generation via Dual-Stream Alignment AAAI 2026

OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model AAAI 2026

Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models AAAI 2026

Zero-Shot Open-Vocabulary Human Motion Grounding with Test-Time Training AAAI 2026

MISF: MLLM Guided Iterative Sample Filtering for Data Fault Detection AAAI 2026

LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs AAAI 2026

Subspace-Aware Graph Construction and Contrastive Alignment for Multimodal Recommendation with Large Language Models AAAI 2026

Data-Centric Sequential Recommendation with Relation-Augmented Generation AAAI 2026

Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models AAAI 2026

TGCA-LLM: Time-Aware Graph-Text Contrastive Alignment for Enhancing LLMs in Temporal Knowledge Graph Completion AAAI 2026

Hearing More with Less: Multi-Modal Retrieval-and-Selection Augmented Conversational LLM-Based ASR AAAI 2026