Multi-Modal Learning

1457 directly classified papers

Papers

Visually-Enhanced Phrase Understanding ACL 2023

Improved Visual Story Generation with Adaptive Context Modeling ACL 2023

Unified Language Representation for Question Answering over Text, Tables, and Images ACL 2023

Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark ACL 2023

Adversarial Textual Robustness on Visual Dialog ACL 2023

Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain ACL 2023

Evaluating pragmatic abilities of image captioners on A3DS ACL 2023

Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement ACL 2023

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis ACL 2023

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning ACL 2023

Multimodal Persona Based Generation of Comic Dialogs ACL 2023

A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation ACL 2023

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering ACL 2023

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning ACL 2023

Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning ACL 2023

BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency ACL 2023

lilGym: Natural Language Visual Reasoning with Reinforcement Learning ACL 2023

Translation-Enhanced Multilingual Text-to-Image Generation ACL 2023

Dynamic Regularization in UDA for Transformers in Multimodal Classification ACL 2023

End-to-end Knowledge Retrieval with Multi-modal Queries ACL 2023

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding ACL 2023

Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment ACL 2023

DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations ACL 2023

MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation ACL 2023

SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams ACL 2023