Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
ICCV 2025
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
ICCV 2025
Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation
ICCV 2025
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
ICCV 2025
Steering Guidance for Personalized Text-to-Image Diffusion Models
ICCV 2025
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
ICCV 2025
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
ICCV 2025
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge
ICCV 2025
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
ICCV 2025
Scaling Language-Free Visual Representation Learning
ICCV 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
ICCV 2025
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
ICCV 2025
ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
ICCV 2025
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
ICCV 2025
Harnessing Input-Adaptive Inference for Efficient VLN
ICCV 2025
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
ICCV 2025
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization
AAAI 2025
Temporally Streaming Audio-Visual Synchronization for Real-World Videos
WACV 2025
CryoDomain: Sequence-free Protein Domain Identification from Low-resolution Cryo-EM Density Maps
AAAI 2025
Multi-modal Deepfake Detection via Multi-task Audio-Visual Prompt Learning
AAAI 2025
Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency
AAAI 2025
AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning
WACV 2025
Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores
AAAI 2025
ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects
AAAI 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
CVPR 2025