conftrace_

multimodal learning

4645 papers

Co-occurring keywords

large language model (13587), vision-language model (2348), visual question answering (1017), video understanding (1658), multi-modal learning (1278), contrastive learning (4032), representation learning (6206), transfer learning (5449), zero-shot learning (3650), vision language model (767)

Papers

RealImpact: A Dataset of Impact Sound Fields for Real Objects CVPR 2023

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning CVPR 2023

Affection: Learning Affective Explanations for Real-World Visual Data CVPR 2023

Improving Zero-Shot Generalization and Robustness of Multi-Modal Models CVPR 2023

You Can Ground Earlier Than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos CVPR 2023

Fine-Grained Audible Video Description CVPR 2023

EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata CVPR 2023

Decoupled Multimodal Distilling for Emotion Recognition CVPR 2023

SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation CVPR 2023

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition CVPR 2023

AutoAD: Movie Description in Context CVPR 2023

Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space CVPR 2023

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning CVPR 2023

Image Manipulation via Multi-Hop Instructions - A New Dataset and Weakly-Supervised Neuro-Symbolic Approach EMNLP 2023

Predict and Use: Harnessing Predicted Gaze to Improve Multimodal Sarcasm Detection EMNLP 2023

Learning the Visualness of Text Using Large Vision-Language Models EMNLP 2023

Analyzing Modular Approaches for Visual Question Decomposition EMNLP 2023

A Framework for Vision-Language Warm-up Tasks in Multimodal Dialogue Models EMNLP 2023

Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining EMNLP 2023

Support or Refute: Analyzing the Stance of Evidence to Detect Out-of-Context Mis- and Disinformation EMNLP 2023

Hallucination Detection for Grounded Instruction Generation EMNLP 2023

Debiasing Multimodal Models via Causal Information Minimization EMNLP 2023

Retrieving Multimodal Information for Augmented Generation: A Survey EMNLP 2023

Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection EMNLP 2023

Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation EMNLP 2023