Multi-Modal Learning

1457 directly classified papers

Papers

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant EMNLP 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training EMNLP 2022

DialogueGAT: A Graph Attention Network for Financial Risk Prediction by Modeling the Dialogues in Earnings Conference Calls EMNLP 2022

MovieUN: A Dataset for Movie Understanding and Narrating EMNLP 2022

DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents EMNLP 2022

Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task EMNLP 2022

Named Entity and Relation Extraction with Multi-Modal Retrieval EMNLP 2022

Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation EMNLP 2022

Lexi: Self-Supervised Learning of the UI Language EMNLP 2022

A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict EMNLP 2022

Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos EMNLP 2022

Detecting Euphemisms with Literal Descriptions and Visual Imagery EMNLP 2022

Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22) EMNLP 2022

CARETS: A Consistency And Robustness Evaluative Test Suite for VQA ACL 2022

MILIE: Modular & Iterative Multilingual Open Information Extraction ACL 2022

UniXcoder: Unified Cross-Modal Pre-training for Code Representation ACL 2022

Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas ACL 2022

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions ACL 2022

Multimodal Sarcasm Target Identification in Tweets ACL 2022

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena ACL 2022

Voxel-informed Language Grounding ACL 2022

Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge ACL 2022

Flexible Visual Grounding ACL 2022

M-SENA: An Integrated Platform for Multimodal Sentiment Analysis ACL 2022

QuickGraph: A Rapid Annotation Tool for Knowledge Graph Extraction from Technical Text ACL 2022