Multimodal Learning

13057 directly classified papers

Papers

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale AAAI 2026

S²Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening AAAI 2026

PDE-Driven Spatiotemporal Generative Modeling for Multilead ECG Synthesis AAAI 2026

Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra AAAI 2026

SyncBrain: Exploring Brain Functional Dynamics Through Neural Oscillatory Synchronization AAAI 2026

MAUGen: A Unified Diffusion Approach for Multi-Identity Facial Expression and AU Label Generation AAAI 2026

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition AAAI 2026

InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE AAAI 2026

Stop Mixing Things Up! BISCUIT Teaches Vision-Language Models to Learn New Concepts from Images on the Spot AAAI 2026

Knowledge-Enhanced Explainable Prompting for Vision-Language Models AAAI 2026

Thinking Aesthetics Assessment of Image Color Temperature: Models, Datasets and Benchmarks AAAI 2026

RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models AAAI 2026

CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification AAAI 2026

DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control AAAI 2026

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection AAAI 2026

Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs AAAI 2026

OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding AAAI 2026

DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos AAAI 2026

Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval AAAI 2026

MIRAGE: Towards AI-Generated Image Detection in the Wild AAAI 2026

Text-Guided Gradient Refinement: Resolving Multimodal Gradient Conflicts to Boost Adversarial Attacks on Vision-Language Models AAAI 2026

Circuit-Think: A Multimodal Reasoning Framework for Automated Circuit-to-Netlist Translation with Trajectory-Guided Reinforcement Learning AAAI 2026

CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis AAAI 2026

Versatile Vision-Language Model for 3D Computed Tomography AAAI 2026

SurgPub-Video: A Comprehensive Surgical Video Framework for Enhanced Surgical Intelligence in Vision-Language Model AAAI 2026