Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
From Sights to Insights: Towards Summarization of Multimodal Clinical Documents
ACL 2024
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
CVPR 2024
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
CVPR 2024
3D Feature Tracking via Event Camera
CVPR 2024
RGB-X Object Detection via Scene-Specific Fusion Modules
WACV 2024
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
CVPR 2024
SDMTR: A Brain-inspired Transformer for Relation Inference
AISTATS 2024
Complex Organ Mask Guided Radiology Report Generation
WACV 2024
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
CVPR 2024
SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection
AAAI 2024
Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control
AAAI 2024
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
AAAI 2024
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
AAAI 2024
Beyond Fusion: Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection
WACV 2024
Heterogeneous Test-Time Training for Multi-Modal Person Re-identification
AAAI 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
AAAI 2024
Open-Vocabulary Video Relation Extraction
AAAI 2024
Annotation-Free Audio-Visual Segmentation
WACV 2024
CoVR: Learning Composed Video Retrieval from Web Video Captions
AAAI 2024
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
AAAI 2024
The MERSA Dataset and a Transformer-Based Approach for Speech Emotion Recognition
ACL 2024
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
ACL 2024
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
CVPR 2024
MSU-4S - The Michigan State University Four Seasons Dataset
CVPR 2024
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
CVPR 2024