Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Locate Then Segment: A Strong Pipeline for Referring Image Segmentation
CVPR 2021
Siamese Natural Language Tracker: Tracking by Natural Language Descriptions With Siamese Trackers
CVPR 2021
SelfDoc: Self-Supervised Document Representation Learning
CVPR 2021
VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency
CVPR 2021
Dictionary-Guided Scene Text Recognition
CVPR 2021
Look Before You Speak: Visually Contextualized Utterances
CVPR 2021
There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge
CVPR 2021
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
CVPR 2021
Robust Audio-Visual Instance Discrimination
CVPR 2021
Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
CVPR 2021
TextOCR: Towards Large-Scale End-to-End Reasoning for Arbitrary-Shaped Scene Text
CVPR 2021
Counterfactual VQA: A Cause-Effect Look at Language Bias
CVPR 2021
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
CVPR 2021
Watching You: Global-Guided Reciprocal Learning for Video-Based Person Re-Identification
CVPR 2021
Deep Burst Super-Resolution
CVPR 2021
Multi-Perspective LSTM for Joint Visual Representation Learning
CVPR 2021
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
CVPR 2021
Learning Cross-Modal Retrieval With Noisy Labels
CVPR 2021
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
CVPR 2021
Discrete-Continuous Action Space Policy Gradient-Based Attention for Image-Text Matching
CVPR 2021
Embracing Uncertainty: Decoupling and De-Bias for Robust Temporal Grounding
CVPR 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
CVPR 2021
Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval
CVPR 2021
Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
CVPR 2021
Learning From the Master: Distilling Cross-Modal Advanced Knowledge for Lip Reading
CVPR 2021