Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection
CVPR 2021
Multimodal Contrastive Training for Visual Representation Learning
CVPR 2021
Structured Scene Memory for Vision-Language Navigation
CVPR 2021
Bridge To Answer: Structure-Aware Graph Interaction Network for Video Question Answering
CVPR 2021
Rich Context Aggregation With Reflection Prior for Glass Surface Detection
CVPR 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021
Boosting Video Representation Learning With Multi-Faceted Integration
CVPR 2021
Repetitive Activity Counting by Sight and Sound
CVPR 2021
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
CVPR 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback
CVPR 2021
HOTR: End-to-End Human-Object Interaction Detection With Transformers
CVPR 2021
POSEFusion: Pose-Guided Selective Fusion for Single-View Human Volumetric Capture
CVPR 2021
Attention Bottlenecks for Multimodal Fusion
NeurIPS 2021
Neural Dubber: Dubbing for Videos According to Scripts
NeurIPS 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
NeurIPS 2021
AutoGEL: An Automated Graph Neural Network with Explicit Link Information
NeurIPS 2021
Set Prediction in the Latent Space
NeurIPS 2021
Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering
NeurIPS 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
NeurIPS 2021
PolarStream: Streaming Object Detection and Segmentation with Polar Pillars
NeurIPS 2021
UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis
NeurIPS 2021
Point-of-Interest Type Prediction using Text and Images
EMNLP 2021
Finnish Dialect Identification: The Effect of Audio and Text
EMNLP 2021
Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy
EMNLP 2021