Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Comprehending and Ordering Semantics for Image Captioning
CVPR 2022
Learning Program Representations for Food Images and Cooking Recipes
CVPR 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
CVPR 2022
ScanQA: 3D Question Answering for Spatial Scene Understanding
CVPR 2022
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
CVPR 2022
Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase
CVPR 2022
Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
CVPR 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
CVPR 2022
Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry
CVPR 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
CVPR 2022
Balanced Multimodal Learning via On-the-Fly Gradient Modulation
CVPR 2022
FLAVA: A Foundational Language and Vision Alignment Model
CVPR 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
CVPR 2022
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
CVPR 2022
Towards Implicit Text-Guided 3D Shape Generation
CVPR 2022
Cross Modal Retrieval With Querybank Normalisation
CVPR 2022
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
CVPR 2022
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
CVPR 2022
Sub-Word Level Lip Reading With Visual Attention
CVPR 2022
LiT: Zero-Shot Transfer With Locked-Image Text Tuning
CVPR 2022
End-to-End Generative Pretraining for Multimodal Video Captioning
CVPR 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
CVPR 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
INTERSPEECH 2022
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
INTERSPEECH 2022
Speaker recognition-assisted robust audio deepfake detection
INTERSPEECH 2022