Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Visually-Enhanced Phrase Understanding
ACL 2023
Improved Visual Story Generation with Adaptive Context Modeling
ACL 2023
Unified Language Representation for Question Answering over Text, Tables, and Images
ACL 2023
Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark
ACL 2023
Adversarial Textual Robustness on Visual Dialog
ACL 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
ACL 2023
Evaluating pragmatic abilities of image captioners on A3DS
ACL 2023
Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement
ACL 2023
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
ACL 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning
ACL 2023
Multimodal Persona Based Generation of Comic Dialogs
ACL 2023
A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation
ACL 2023
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
ACL 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
ACL 2023
Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning
ACL 2023
BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency
ACL 2023
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
ACL 2023
Translation-Enhanced Multilingual Text-to-Image Generation
ACL 2023
Dynamic Regularization in UDA for Transformers in Multimodal Classification
ACL 2023
End-to-end Knowledge Retrieval with Multi-modal Queries
ACL 2023
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
ACL 2023
Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
ACL 2023
DualGATs: Dual Graph Attention Networks for Emotion Recognition in Conversations
ACL 2023
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
ACL 2023
SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams
ACL 2023