Multi-Modal Learning

3194 directly classified papers

Papers

Caption Enriched Samples for Improving Hateful Memes Detection EMNLP 2021

Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval EMNLP 2021

Visually Grounded Reasoning across Languages and Cultures EMNLP 2021

A Web Scale Entity Extraction System EMNLP 2021

Cross-Modal Retrieval Augmentation for Multi-Modal Classification EMNLP 2021

Generating Mammography Reports from Multi-view Mammograms with BERT EMNLP 2021

Compositional Networks Enable Systematic Generalization for Grounded Language Understanding EMNLP 2021

Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification EMNLP 2021

Entity-level Cross-modal Learning Improves Multi-modal Machine Translation EMNLP 2021

MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering EMNLP 2021

Progressive Transformer-Based Generation of Radiology Reports EMNLP 2021

Visual Cues and Error Correction for Translation Robustness EMNLP 2021

MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding EMNLP 2021

Image Retrieval for Arguments Using Stance-Aware Query Expansion EMNLP 2021

Empathetic Dialog Generation with Fine-Grained Intents EMNLP 2021

Coreference by Appearance: Visually Grounded Event Coreference Resolution EMNLP 2021

FaBULOUS: Fact-checking Based on Understanding of Language Over Unstructured and Structured information EMNLP 2021

Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models EMNLP 2021

Discriminative Multi-Modality Speech Recognition CVPR 2020

Learning Longterm Representations for Person Re-Identification Using Radio Signals CVPR 2020

Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension CVPR 2020

Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification CVPR 2020

Speech2Action: Cross-Modal Supervision for Action Recognition CVPR 2020

Violin: A Large-Scale Dataset for Video-and-Language Inference CVPR 2020

A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation CVPR 2020