conftrace_

multimodal learning

4622 papers

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis AAAI 2021

YNU-HPCC at SemEval-2021 Task 6: Combining ALBERT and Text-CNN for Persuasion Detection in Texts and Images ACL 2021

LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding ACL 2021

Towards Visual Question Answering on Pathology Images ACL 2021

Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature ICCV 2021

Check It Again: Progressive Visual Question Answering via Visual Entailment ACL 2021

MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification AAAI 2021

VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization AAAI 2021

CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval AAAI 2021

SMIL: Multimodal Learning with Severely Missing Modality AAAI 2021

MultiMET: A Multimodal Dataset for Metaphor Understanding ACL 2021

Multimodal Knowledge Expansion ICCV 2021

Multimodal Item Categorization Fully Based on Transformer ACL 2021

Dual Compositional Learning in Interactive Image Retrieval AAAI 2021

Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach AAAI 2021

Audio-Visual Localization by Synthetic Acoustic Image Generation AAAI 2021

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation AAAI 2021

How to leverage the multimodal EHR data for better medical prediction? EMNLP 2021

WhyAct: Identifying Action Reasons in Lifestyle Vlogs EMNLP 2021

CLIPScore: A Reference-free Evaluation Metric for Image Captioning EMNLP 2021

Detecting Propaganda Techniques in Memes ACL 2021

VLGrammar: Grounded Grammar Induction of Vision and Language ICCV 2021

An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog EMNLP 2021

UniMF: A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems IJCAI 2021

Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition INTERSPEECH 2021