conftrace_

multimodal learning

4622 papers

Also known as

VLM, VLLM, MM, VLA, MLLMS, MLM, MML, MULLM, LMM, MLLM, MMT

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction EMNLP 2023

BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification EMNLP 2023

ALCAP: Alignment-Augmented Music Captioner EMNLP 2023

ORANGE: Text-video Retrieval via Watch-time-aware Heterogeneous Graph Contrastive Learning EMNLP 2023

An Empirical Study of Multimodal Model Merging EMNLP 2023

TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining EMNLP 2023

Semantists at ImageArg-2023: Exploring Cross-modal Contrastive and Ensemble Models for Multimodal Stance and Persuasiveness Classification EMNLP 2023

ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages EMNLP 2023

KnowComp Submission for WMT23 Sign Language Translation Task EMNLP 2023

Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining EMNLP 2023

IUST at ImageArg: The First Shared Task in Multimodal Argument Mining EMNLP 2023

Semi-supervised multimodal coreference resolution in image narrations EMNLP 2023

DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models EMNLP 2023

Natural Disaster Tweets Classification Using Multimodal Data EMNLP 2023

IC3: Image Captioning by Committee Consensus EMNLP 2023

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations EMNLP 2023

ChatEdit: Towards Multi-turn Interactive Facial Image Editing via Dialogue EMNLP 2023

FACTIFY3M: A benchmark for multimodal fact verification with explainability through 5W Question-Answering EMNLP 2023

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines EMNLP 2023

VKIE: The Application of Key Information Extraction on Video Text EMNLP 2023

ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories EMNLP 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions EMNLP 2023

Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path EMNLP 2023

Text-guided 3D Human Generation from 2D Collections EMNLP 2023

Controllable Chest X-Ray Report Generation from Longitudinal Representations EMNLP 2023