Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
UIPro: Unleashing Superior Interaction Capability For GUI Agents
ICCV 2025
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
ICCV 2025
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
ICCV 2025
Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code
ICCV 2025
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
ICCV 2025
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
ICCV 2025
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
ICCV 2025
Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
ICCV 2025
Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection
ICCV 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
ICCV 2025
Unbiased Missing-modality Multimodal Learning
ICCV 2025
HERO: Human Reaction Generation from Videos
ICCV 2025
VideoAuteur: Towards Long Narrative Video Generation
ICCV 2025
Is CLIP ideal? No. Can we fix it? Yes!
ICCV 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
Unified Open-World Segmentation with Multi-Modal Prompts
ICCV 2025
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
ICCV 2025
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
ICCV 2025
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
ICCV 2025
Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation
ICCV 2025
I2VControl: Disentangled and Unified Video Motion Synthesis Control
ICCV 2025
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
ICCV 2025
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
ICCV 2025
Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition
ICCV 2025
Active Data Curation Effectively Distills Large-Scale Multimodal Models
CVPR 2025