Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation
AAAI 2024
FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval
AAAI 2024
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
CVPR 2024
LAMM: Label Alignment for Multi-Modal Prompt Learning
AAAI 2024
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
AAAI 2024
Joint Demosaicing and Denoising for Spike Camera
AAAI 2024
Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles From Driving Scenes
WACV 2024
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
AAAI 2024
Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis
AAAI 2024
Annotation-Free Audio-Visual Segmentation
WACV 2024
See, Say, and Segment: Teaching LMMs to Overcome False Premises
CVPR 2024
Improving Audio-Visual Segmentation with Bidirectional Generation
AAAI 2024
Object Attribute Matters in Visual Question Answering
AAAI 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
CVPR 2024
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
CVPR 2024
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding
AAAI 2024
Open-Vocabulary Video Relation Extraction
AAAI 2024
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
CVPR 2024
Bi-directional Adapter for Multimodal Tracking
AAAI 2024
Cross-Constrained Progressive Inference for 3D Hand Pose Estimation with Dynamic Observer-Decision-Adjuster Networks
AAAI 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
NeurIPS 2024
Selectively Answering Visual Questions
ACL 2024
LiT: Unifying LiDAR "Languages" with LiDAR Translator
NeurIPS 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training
AAAI 2024