Multi-Modal Learning

3194 directly classified papers

Papers

Adaptive Transformers for Learning Multimodal Representations ACL 2020

Let Me Choose: From Verbal Context to Font Selection ACL 2020

Cross-Modality Relevance for Reasoning on Language and Vision ACL 2020

Improving Image Captioning with Better Use of Caption ACL 2020

Multimodal Neural Graph Memory Networks for Visual Question Answering ACL 2020

Aligned Dual Channel Graph Convolutional Network for Visual Question Answering ACL 2020

Words Aren’t Enough, Their Order Matters: On the Robustness of Grounding Visual Referring Expressions ACL 2020

Knowledge Supports Visual Language Grounding: A Case Study on Colour Terms ACL 2020

Cross-modal Coherence Modeling for Caption Generation ACL 2020

Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification ACL 2020

Shaping Visual Representations with Language for Few-Shot Classification ACL 2020

Multimodal Transformer for Multimodal Machine Translation ACL 2020

Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention ACL 2020

Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer ACL 2020

A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation ACL 2020

Glyph2Vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs ACL 2020

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification CVPR 2020

Cross-Modal Deep Face Normals With Deactivable Skip Connections CVPR 2020

A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning CVPR 2020

TA-Student VQA: Multi-Agents Training by Self-Questioning CVPR 2020

In Defense of Grid Features for Visual Question Answering CVPR 2020

End-to-End Adversarial-Attention Network for Multi-Modal Clustering CVPR 2020

Multimodal Categorization of Crisis Events in Social Media CVPR 2020

Music Gesture for Visual Sound Separation CVPR 2020

Recognizing Objects From Any View With Object and Viewer-Centered Representations CVPR 2020