Multi-Modal Learning

1457 directly classified papers

Papers

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data ACL 2022

Graph Neural Networks for Multiparallel Word Alignment ACL 2022

VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator ACL 2022

Combining Static and Contextualised Multilingual Embeddings ACL 2022

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation ACL 2022

Co-VQA : Answering by Interactive Sub Question Sequence ACL 2022

xGQA: Cross-Lingual Visual Question Answering ACL 2022

Selecting Stickers in Open-Domain Dialogue through Multitask Learning ACL 2022

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation ACL 2022

Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer ACL 2022

Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering ACL 2022

Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via CLIP ACL 2022

Poirot at CMCL 2022 Shared Task: Zero Shot Crosslingual Eye-Tracking Data Prediction using Multilingual Transformer Models ACL 2022

How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image ACL 2022

Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks ACL 2022

A Selective Summary of Where to Hide a Stolen Elephant: Leaps in Creative Writing with Multimodal Machine Intelligence ACL 2022

Multimodal Conversational AI: A Survey of Datasets and Approaches ACL 2022

Relation-aware Video Reading Comprehension for Temporal Language Grounding EMNLP 2021

Evidence Aware Neural Pornographic Text Identification for Child Protection AAAI 2021

Improving Multimodal fusion via Mutual Dependency Maximisation EMNLP 2021

Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text EMNLP 2021

Global Fusion Attention for Vision and Language Understanding (Student Abstract) AAAI 2021

Mutual-Learning Improves End-to-End Speech Translation EMNLP 2021

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization EMNLP 2021

Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments EMNLP 2021