Multimodal Learning

13057 directly classified papers

Papers

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition AAAI 2026

InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE AAAI 2026

Stop Mixing Things Up! BISCUIT Teaches Vision-Language Models to Learn New Concepts from Images on the Spot AAAI 2026

Knowledge-Enhanced Explainable Prompting for Vision-Language Models AAAI 2026

Thinking Aesthetics Assessment of Image Color Temperature: Models, Datasets and Benchmarks AAAI 2026

RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models AAAI 2026

CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification AAAI 2026

DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control AAAI 2026

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection AAAI 2026

Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs AAAI 2026

OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding AAAI 2026

DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos AAAI 2026

Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval AAAI 2026

MIRAGE: Towards AI-Generated Image Detection in the Wild AAAI 2026

Text-Guided Gradient Refinement: Resolving Multimodal Gradient Conflicts to Boost Adversarial Attacks on Vision-Language Models AAAI 2026

Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning AAAI 2026

CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis AAAI 2026

Versatile Vision-Language Model for 3D Computed Tomography AAAI 2026

PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification AAAI 2026

Point Cloud Quantization Through Multimodal Prompting for 3D Understanding AAAI 2026

MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology AAAI 2026

DynamicEarth: How Far Are We from Open-Vocabulary Change Detection? AAAI 2026

Points Meet Pixels: Bridging 2D Vision-Language Model and 3D Perception Gaps for Point Cloud Quality Assessment AAAI 2026

VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction AAAI 2026

Instruction-Guided Cross-Modal Clustering for Training-Free Visual Token Pruning in Vision-Language Models AAAI 2026