Multi-Modal Learning

1457 directly classified papers

Papers

Visual Storytelling with Question-Answer Plans EMNLP 2023

An Empirical Study of Frame Selection for Text-to-Video Retrieval EMNLP 2023

VIPHY: Probing “Visible” Physical Commonsense Knowledge EMNLP 2023

Hierarchical Fusion for Online Multimodal Dialog Act Classification EMNLP 2023

SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations EMNLP 2023

Uncovering Limitations in Text-to-Image Generation: A Contrastive Approach with Structured Semantic Alignment EMNLP 2023

Unifying Text, Tables, and Images for Multimodal Question Answering EMNLP 2023

Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios EMNLP 2023

Learning to Follow Object-Centric Image Editing Instructions Faithfully EMNLP 2023

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models EMNLP 2023

Scaling Vision-Language Models with Sparse Mixture of Experts EMNLP 2023

Aspect-Category Enhanced Learning with a Neural Coherence Model for Implicit Sentiment Analysis EMNLP 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning EMNLP 2023

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue EMNLP 2023

Scene Graph Enhanced Pseudo-Labeling for Referring Expression Comprehension EMNLP 2023

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts EMNLP 2023

MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition EMNLP 2023

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning EMNLP 2023

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches EMNLP 2023

IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning EMNLP 2023

Sound of Story: Multi-modal Storytelling with Audio EMNLP 2023

Improving Multimodal Sentiment Analysis: Supervised Angular margin-based Contrastive Learning for Enhanced Fusion Representation EMNLP 2023

ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding EMNLP 2023

Does Listener Gaze in Face-to-Face Interaction Follow the Entropy Rate Constancy Principle: An Empirical Study EMNLP 2023

Incorporating Object-Level Visual Context for Multimodal Fine-Grained Entity Typing EMNLP 2023