Multi-Modal Learning

3194 directly classified papers

Papers

Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification ICCV 2025

Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement ICCV 2025

Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation ICCV 2025

Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process ICCV 2025

Steering Guidance for Personalized Text-to-Image Diffusion Models ICCV 2025

ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction ICCV 2025

Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions ICCV 2025

Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge ICCV 2025

Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection ICCV 2025

Scaling Language-Free Visual Representation Learning ICCV 2025

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation ICCV 2025

Learning Beyond Still Frames: Scaling Vision-Language Models with Video ICCV 2025

ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving ICCV 2025

UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments ICCV 2025

Harnessing Input-Adaptive Inference for Efficient VLN ICCV 2025

ProbMED: A Probabilistic Framework for Medical Multimodal Binding ICCV 2025

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization AAAI 2025

Temporally Streaming Audio-Visual Synchronization for Real-World Videos WACV 2025

CryoDomain: Sequence-free Protein Domain Identification from Low-resolution Cryo-EM Density Maps AAAI 2025

Multi-modal Deepfake Detection via Multi-task Audio-Visual Prompt Learning AAAI 2025

Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency AAAI 2025

AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning WACV 2025

Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores AAAI 2025

ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects AAAI 2025

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents CVPR 2025