Multi-Modal Learning

1457 directly classified papers

Papers

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities EMNLP 2023

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder EMNLP 2023

Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining EMNLP 2023

IUST at ImageArg: The First Shared Task in Multimodal Argument Mining EMNLP 2023

TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining EMNLP 2023

A General Framework for Multimodal Argument Persuasiveness Classification of Tweets EMNLP 2023

SPLIT: Stance and Persuasion Prediction with Multi-modal on Image and Textual Information EMNLP 2023

Semantists at ImageArg-2023: Exploring Cross-modal Contrastive and Ensemble Models for Multimodal Stance and Persuasiveness Classification EMNLP 2023

A Critical Analysis of Document Out-of-Distribution Detection EMNLP 2023

Support or Refute: Analyzing the Stance of Evidence to Detect Out-of-Context Mis- and Disinformation EMNLP 2023

Language Anisotropic Cross-Lingual Model Editing ACL 2023

MultiQG-TI: Towards Question Generation from Multi-modal Sources ACL 2023

Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks ACL 2023

On the Difference of BERT-style and CLIP-style Text Encoders ACL 2023

Pay Attention to Implicit Attribute Values: A Multi-modal Generative Framework for AVE Task ACL 2023

Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding ACL 2023

DePlot: One-shot visual language reasoning by plot-to-table translation ACL 2023

Spontaneous gestures encoded by hand positions improve language models: An Information-Theoretic motivated study ACL 2023

Zero-shot Visual Question Answering with Language Model Feedback ACL 2023

Multilingual Multi-Figurative Language Detection ACL 2023

Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction ACL 2023

CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training ACL 2023

Retrieval-augmented Video Encoding for Instructional Captioning ACL 2023

Deeply Coupled Cross-Modal Prompt Learning ACL 2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment ACL 2023