Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement
IJCAI 2021
On Pursuit of Designing Multi-modal Transformer for Video Grounding
EMNLP 2021
Towards Accurate Text-Based Image Captioning With Content Diversity Exploration
CVPR 2021
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces From Video Using Pose and Lighting Normalization
CVPR 2021
Intentonomy: A Dataset and Study Towards Human Intent Understanding
CVPR 2021
Deep RGB-D Saliency Detection With Depth-Sensitive Attention and Automatic Multi-Modal Fusion
CVPR 2021
GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition
CVPR 2021
Learning the Best Pooling Strategy for Visual Semantic Embedding
CVPR 2021
TransFill: Reference-Guided Image Inpainting by Merging Multiple Color and Spatial Transformations
CVPR 2021
Probabilistic Embeddings for Cross-Modal Retrieval
CVPR 2021
Reconsidering Representation Alignment for Multi-View Clustering
CVPR 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
CVPR 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning Over Traffic Events
CVPR 2021
Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
CVPR 2021
Connecting What To Say With Where To Look by Modeling Human Attention Traces
CVPR 2021
Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment
CVPR 2021
Spatial Feature Calibration and Temporal Fusion for Effective One-Stage Video Instance Segmentation
CVPR 2021
Sketch, Ground, and Refine: Top-Down Dense Video Captioning
CVPR 2021
Improving Sign Language Translation With Monolingual Data by Sign Back-Translation
CVPR 2021
Beyond Image to Depth: Improving Depth Prediction Using Echoes
CVPR 2021
Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?
CVPR 2021
Calibrated RGB-D Salient Object Detection
CVPR 2021
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
CVPR 2021
Visually Informed Binaural Audio Generation without Binaural Audios
CVPR 2021
Generative Context Pair Selection for Multi-hop Question Answering
EMNLP 2021