Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
EMNLP 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
EMNLP 2022
DialogueGAT: A Graph Attention Network for Financial Risk Prediction by Modeling the Dialogues in Earnings Conference Calls
EMNLP 2022
MovieUN: A Dataset for Movie Understanding and Narrating
EMNLP 2022
DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents
EMNLP 2022
Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task
EMNLP 2022
Named Entity and Relation Extraction with Multi-Modal Retrieval
EMNLP 2022
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
EMNLP 2022
Lexi: Self-Supervised Learning of the UI Language
EMNLP 2022
A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict
EMNLP 2022
Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos
EMNLP 2022
Detecting Euphemisms with Literal Descriptions and Visual Imagery
EMNLP 2022
Findings of the First WMT Shared Task on Sign Language Translation (WMT-SLT22)
EMNLP 2022
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
ACL 2022
MILIE: Modular & Iterative Multilingual Open Information Extraction
ACL 2022
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
ACL 2022
Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas
ACL 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
ACL 2022
Multimodal Sarcasm Target Identification in Tweets
ACL 2022
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
ACL 2022
Voxel-informed Language Grounding
ACL 2022
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
ACL 2022
Flexible Visual Grounding
ACL 2022
M-SENA: An Integrated Platform for Multimodal Sentiment Analysis
ACL 2022
QuickGraph: A Rapid Annotation Tool for Knowledge Graph Extraction from Technical Text
ACL 2022