Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
EMNLP 2024
VLP: Vision Language Planning for Autonomous Driving
CVPR 2024
Independency Adversarial Learning for Cross-Modal Sound Separation
AAAI 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training
AAAI 2024
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
CVPR 2024
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
CVPR 2024
Domain Prompt Learning with Quaternion Networks
CVPR 2024
TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation
AAAI 2024
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
CVPR 2024
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
CVPR 2024
Image Captioning with Multi-Context Synthetic Data
AAAI 2024
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
CVPR 2024
Prompt-Driven Referring Image Segmentation with Instance Contrasting
CVPR 2024
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
CVPR 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
CVPR 2024
Open-Vocabulary Video Relation Extraction
AAAI 2024
Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
AAAI 2024
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
CVPR 2024
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
CVPR 2024
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
CVPR 2024
Vision-and-Language Navigation via Causal Learning
CVPR 2024
Hyperbolic Learning with Synthetic Captions for Open-World Detection
CVPR 2024
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
CVPR 2024
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
CVPR 2024
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
CVPR 2024