Multimodal Learning

13,057 directly classified papers

Papers

SGP4SR: Separated-Modality Guided User Preference Learning for Multimodal Sequential Recommendation AAAI 2026

CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning AAAI 2026

Targeting Borderline Fraudsters: Multi-View Hypergraph Fraud Detection with LLM-Guided Contrastive Learning AAAI 2026

Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models AAAI 2026

DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors AAAI 2026

Structural Entropy Guided Incremental Learning for Open-World Multimodal Social Event Detection AAAI 2026

TriFusion-IDS: A Multimodal Graph-Tabular-Text Contrastive Framework for Cross-Dataset Intrusion Detection AAAI 2026

HaNa: Hardness and Noise-Aware Robust Cross-modal Retrieval AAAI 2026

Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition AAAI 2026

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment AAAI 2026

DenseBEV: Transforming BEV Grid Cells into 3D Objects WACV 2026

CLIP-IT: CLIP-based Pairing of Histology Images with Privileged Textual Information WACV 2026

Towards Unconstrained Cross-View Pose Estimation WACV 2026

Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation WACV 2026

Cross-Modal Event Encoder: Bridging Image-Text Knowledge to Event Streams WACV 2026

Dual-Domain Multimodal Hyperbolic Fusion for Cardiopulmonary Disease Diagnosis in Emergency Care WACV 2026

Training-Free Few-Shot Segmentation via Vision-Language Guided Prompting WACV 2026

Ordinal-Aware Multimodal Engagement Recognition for Collaborative Learning WACV 2026

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning WACV 2026

Fused Similarity Measure Based Alignment with Dual-Scale Adaptive Selection for Weakly Supervised Video Anomaly Detection WACV 2026

mmWEAVER: Environment-Specific mmWave Signal Synthesis from a Photo and Activity Description WACV 2026

LASER: Lip Landmark Assisted Speaker Detection for Robustness WACV 2026

Sea-CLIP: Mining Semantic-Aware Representations for Few-Shot Anomaly Detection with CLIP WACV 2026

Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video? WACV 2026

GraphCoT-VLA: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions AAAI 2026