conftrace_

multimodal learning

4622 papers

Co-occurring keywords

large language model (12755)
vision-language model (2235)
visual question answering (1000)
video understanding (1647)
multi-modal learning (1276)
contrastive learning (3979)
representation learning (6174)
transfer learning (5442)
zero-shot learning (3637)
vision language model (752)

Papers

Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions CVPR 2021

SelfDoc: Self-Supervised Document Representation Learning CVPR 2021

There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge CVPR 2021

Counterfactual VQA: A Cause-Effect Look at Language Bias CVPR 2021

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling CVPR 2021

Repetitive Activity Counting by Sight and Sound CVPR 2021

Intentonomy: A Dataset and Study Towards Human Intent Understanding CVPR 2021

ArtEmis: Affective Language for Visual Art CVPR 2021

Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation CVPR 2021

Caption Enriched Samples for Improving Hateful Memes Detection EMNLP 2021

Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes ACL 2021

Does language help generalization in vision models? EMNLP 2021

Finnish Dialect Identification: The Effect of Audio and Text EMNLP 2021

Automated Generation of Accurate & Fluent Medical X-ray Reports EMNLP 2021

Multi-stage Pre-training over Simplified Multimodal Pre-training Models ACL 2021

Diversity and Consistency: Exploring Visual Question-Answer Pair Generation EMNLP 2021

Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries EMNLP 2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos NeurIPS 2021

A Large-Scale Chinese Multimodal NER Dataset with Speech Clues ACL 2021

LT3 at SemEval-2021 Task 6: Using Multi-Modal Compact Bilinear Pooling to Combine Visual and Textual Understanding in Memes ACL 2021

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media EMNLP 2021

Learning grounded word meaning representations on similarity graphs EMNLP 2021

Recognizing Multimodal Entailment ACL 2021

Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data ACL 2021

Competence-based Multimodal Curriculum Learning for Medical Report Generation ACL 2021