Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
EMNLP 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
EMNLP 2022
GHAN: Graph-Based Hierarchical Aggregation Network for Text-Video Retrieval
EMNLP 2022
Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement
EMNLP 2022
FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering
EMNLP 2022
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
EMNLP 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
EMNLP 2022
A Joint Learning Framework for Restaurant Survival Prediction and Explanation
EMNLP 2022
Mitigating Inconsistencies in Multimodal Sentiment Analysis under Uncertain Missing Modalities
EMNLP 2022
Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation
EMNLP 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
EMNLP 2022
Abstract Visual Reasoning with Tangram Shapes
EMNLP 2022
Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions
EMNLP 2022
Prompting for Multimodal Hateful Meme Classification
EMNLP 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
EMNLP 2022
Multi-VQG: Generating Engaging Questions for Multiple Images
EMNLP 2022
OTKGE: Multi-modal Knowledge Graph Embeddings via Optimal Transport
NIPS 2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval
NIPS 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
NIPS 2022
Behavior Transformers: Cloning $k$ modes with one stone
NIPS 2022
Audio-Driven Co-Speech Gesture Video Generation
NIPS 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
NIPS 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
NIPS 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
NIPS 2022
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
NIPS 2022