Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Textual Supervision for Visually Grounded Spoken Language Understanding
EMNLP 2020
Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization
EMNLP 2020
Using Visual Feature Space as a Pivot Across Languages
EMNLP 2020
Question Answering with Long Multiple-Span Answers
EMNLP 2020
Beyond Language: Learning Commonsense from Images for Reasoning
EMNLP 2020
Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
EMNLP 2020
Visuo-Linguistic Question Answering (VLQA) Challenge
EMNLP 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
EMNLP 2020
Language-Conditioned Feature Pyramids for Visual Selection Tasks
EMNLP 2020
Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP
EMNLP 2020
Diverse and Relevant Visual Storytelling with Scene Graph Embeddings
EMNLP 2020
Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition
EMNLP 2020
An Element-wise Visual-enhanced BiLSTM-CRF Model for Location Name Recognition
EMNLP 2020
“Did you really mean what you said?” : Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings
EMNLP 2020
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle
CVPR 2020
Universal Weighting Metric Learning for Cross-Modal Matching
CVPR 2020
PhraseCut: Language-Based Image Segmentation in the Wild
CVPR 2020
DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads
CVPR 2020
Cross-Modal Cross-Domain Moment Alignment Network for Person Search
CVPR 2020
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
CVPR 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
CVPR 2020
Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
CVPR 2020
Self-Supervised MultiModal Versatile Networks
NeurIPS 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
NeurIPS 2020