Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Visual Storytelling with Question-Answer Plans
EMNLP 2023
An Empirical Study of Frame Selection for Text-to-Video Retrieval
EMNLP 2023
VIPHY: Probing “Visible” Physical Commonsense Knowledge
EMNLP 2023
Hierarchical Fusion for Online Multimodal Dialog Act Classification
EMNLP 2023
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
EMNLP 2023
Uncovering Limitations in Text-to-Image Generation: A Contrastive Approach with Structured Semantic Alignment
EMNLP 2023
Unifying Text, Tables, and Images for Multimodal Question Answering
EMNLP 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios
EMNLP 2023
Learning to Follow Object-Centric Image Editing Instructions Faithfully
EMNLP 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
EMNLP 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
EMNLP 2023
Aspect-Category Enhanced Learning with a Neural Coherence Model for Implicit Sentiment Analysis
EMNLP 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
EMNLP 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
EMNLP 2023
Scene Graph Enhanced Pseudo-Labeling for Referring Expression Comprehension
EMNLP 2023
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts
EMNLP 2023
MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
EMNLP 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
EMNLP 2023
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
EMNLP 2023
IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning
EMNLP 2023
Sound of Story: Multi-modal Storytelling with Audio
EMNLP 2023
Improving Multimodal Sentiment Analysis: Supervised Angular margin-based Contrastive Learning for Enhanced Fusion Representation
EMNLP 2023
ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding
EMNLP 2023
Does Listener Gaze in Face-to-Face Interaction Follow the Entropy Rate Constancy Principle: An Empirical Study
EMNLP 2023
Incorporating Object-Level Visual Context for Multimodal Fine-Grained Entity Typing
EMNLP 2023