Multi-Modal Learning

3194 directly classified papers

Papers

Detecting Attended Visual Targets in Video CVPR 2020

MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction CVPR 2020

ActBERT: Learning Global-Local Video-Text Representations CVPR 2020

STAViS: Spatio-Temporal AudioVisual Saliency Network CVPR 2020

Fashion Outfit Complementary Item Retrieval CVPR 2020

Multi-View Neural Human Rendering CVPR 2020

Learning to Have an Ear for Face Super-Resolution CVPR 2020

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection CVPR 2020

More Grounded Image Captioning by Distilling Image-Text Matching Model CVPR 2020

ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes CVPR 2020

Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention CVPR 2020

Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think! EMNLP 2020

Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction EMNLP 2020

Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis EMNLP 2020

Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos EMNLP 2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues EMNLP 2020

Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading EMNLP 2020

Quantifying Intimacy in Language EMNLP 2020

Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements EMNLP 2020

NwQM: A neural quality assessment framework for Wikipedia EMNLP 2020

VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles EMNLP 2020

Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation EMNLP 2020

ConceptBert: Concept-Aware Representation for Visual Question Answering EMNLP 2020

Robust and Interpretable Grounding of Spatial References with Relation Networks EMNLP 2020

Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays EMNLP 2020