Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
EMNLP 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
EMNLP 2023
Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining
EMNLP 2023
IUST at ImageArg: The First Shared Task in Multimodal Argument Mining
EMNLP 2023
TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining
EMNLP 2023
A General Framework for Multimodal Argument Persuasiveness Classification of Tweets
EMNLP 2023
SPLIT: Stance and Persuasion Prediction with Multi-modal on Image and Textual Information
EMNLP 2023
Semantists at ImageArg-2023: Exploring Cross-modal Contrastive and Ensemble Models for Multimodal Stance and Persuasiveness Classification
EMNLP 2023
A Critical Analysis of Document Out-of-Distribution Detection
EMNLP 2023
Support or Refute: Analyzing the Stance of Evidence to Detect Out-of-Context Mis- and Disinformation
EMNLP 2023
Language Anisotropic Cross-Lingual Model Editing
ACL 2023
MultiQG-TI: Towards Question Generation from Multi-modal Sources
ACL 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
ACL 2023
On the Difference of BERT-style and CLIP-style Text Encoders
ACL 2023
Pay Attention to Implicit Attribute Values: A Multi-modal Generative Framework for AVE Task
ACL 2023
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
ACL 2023
DePlot: One-shot visual language reasoning by plot-to-table translation
ACL 2023
Spontaneous gestures encoded by hand positions improve language models: An Information-Theoretic motivated study
ACL 2023
Zero-shot Visual Question Answering with Language Model Feedback
ACL 2023
Multilingual Multi-Figurative Language Detection
ACL 2023
Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction
ACL 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
ACL 2023
Retrieval-augmented Video Encoding for Instructional Captioning
ACL 2023
Deeply Coupled Cross-Modal Prompt Learning
ACL 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
ACL 2023