Multi-Modal Learning

3194 directly classified papers

Papers

UIPro: Unleashing Superior Interaction Capability For GUI Agents ICCV 2025

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness ICCV 2025

Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning ICCV 2025

Mixture-of-Scores: Robust Image-Text Data Valuation via Three Lines of Code ICCV 2025

From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection ICCV 2025

FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation ICCV 2025

DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions ICCV 2025

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning ICCV 2025

Doppler-Aware LiDAR-RADAR Fusion for Weather-Robust 3D Detection ICCV 2025

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents ICCV 2025

Unbiased Missing-modality Multimodal Learning ICCV 2025

HERO: Human Reaction Generation from Videos ICCV 2025

VideoAuteur: Towards Long Narrative Video Generation ICCV 2025

Is CLIP ideal? No. Can we fix it? Yes! ICCV 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ICCV 2025

Unified Open-World Segmentation with Multi-Modal Prompts ICCV 2025

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking ICCV 2025

PixTalk: Controlling Photorealistic Image Processing and Editing with Language ICCV 2025

Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images ICCV 2025

Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation ICCV 2025

I2VControl: Disentangled and Unified Video Motion Synthesis Control ICCV 2025

HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID ICCV 2025

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction ICCV 2025

Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition ICCV 2025

Active Data Curation Effectively Distills Large-Scale Multimodal Models CVPR 2025