Multi-Modal Learning

3194 directly classified papers

Papers

E²(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition CVPR 2022

Integrative Few-Shot Learning for Classification and Segmentation CVPR 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing ACL 2022

Multimodal Dialogue Response Generation ACL 2022

Multimodal Context Carryover EMNLP 2022

Things not Written in Text: Exploring Spatial Commonsense from Visual Signals ACL 2022

Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis ACL 2022

Conditioned Masked Language and Image Modeling for Image-Text Dense Retrieval EMNLP 2022

RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval EMNLP 2022

Think Beyond Words: Exploring Context-Relevant Visual Commonsense for Diverse Dialogue Generation EMNLP 2022

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts ICML 2022

Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks ICML 2022

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework ICML 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation ICML 2022

Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization ICML 2022

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation EMNLP 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models EMNLP 2022

MCSE: Multimodal Contrastive Learning of Sentence Embeddings NAACL 2022

Dual-Channel Evidence Fusion for Fact Verification over Texts and Tables NAACL 2022

Imagination-Augmented Natural Language Understanding NAACL 2022

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling NAACL 2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction NAACL 2022

Multimodal Dialogue State Tracking NAACL 2022

CapOnImage: Context-driven Dense-Captioning on Image EMNLP 2022

Concadia: Towards Image-Based Text Generation with a Purpose EMNLP 2022