Multimodal Learning

1257 directly classified papers

Papers

Text-guided 3D Human Generation from 2D Collections EMNLP 2023

Sparse Black-Box Multimodal Attack for Vision-Language Adversary Generation EMNLP 2023

Revealing Single Frame Bias for Video-and-Language Learning ACL 2023

Measuring Progress in Fine-grained Vision-and-Language Understanding ACL 2023

Multi-modal Action Chain Abductive Reasoning ACL 2023

Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text ACL 2023

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering ACL 2023

XtremeCLIP: Extremely Parameter-efficient Tuning for Low-resource Vision Language Understanding ACL 2023

MultiQG-TI: Towards Question Generation from Multi-modal Sources ACL 2023

e-Health CSIRO at RadSum23: Adapting a Chest X-Ray Report Generator to Multimodal Radiology Report Summarisation ACL 2023

Incorporating Object-Level Visual Context for Multimodal Fine-Grained Entity Typing EMNLP 2023

Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow NeurIPS 2023

Vocabulary-free Image Classification NeurIPS 2023

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality NeurIPS 2023

Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective NeurIPS 2023

MultiCMET: A Novel Chinese Benchmark for Understanding Multimodal Metaphor EMNLP 2023

Visually Grounded Continual Language Learning with Selective Specialization EMNLP 2023

Cross-Modal Semantic Enhanced Interaction for Image-Sentence Retrieval WACV 2023

Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data AISTATS 2023

CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension EACL 2023

Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions EACL 2023

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks EACL 2023

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models ICML 2023

Continual Vision-Language Representation Learning with Off-Diagonal Information ICML 2023

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ICML 2023