Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Visual Storytelling with Question-Answer Plans
EMNLP 2023
Mulan: A Multi-Level Alignment Model for Video Question Answering
EMNLP 2023
COSMOS: Catching Out-of-Context Image Misuse Using Self-Supervised Learning
AAAI 2023
Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation
AAAI 2023
MVCINN: Multi-View Diabetic Retinopathy Detection Using a Deep Cross-Interaction Neural Network
AAAI 2023
CF-ViT: A General Coarse-to-Fine Method for Vision Transformer
AAAI 2023
Learning Deep Hierarchical Features with Spatial Regularization for One-Class Facial Expression Recognition
AAAI 2023
Interactive Concept Bottleneck Models
AAAI 2023
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video
AAAI 2023
Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval
AAAI 2023
Video-Text Pre-training with Learned Regions for Retrieval
AAAI 2023
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
AAAI 2023
Cross-Modality Earth Mover’s Distance for Visible Thermal Person Re-identification
AAAI 2023
Learning Polysemantic Spoof Trace: A Multi-Modal Disentanglement Network for Face Anti-spoofing
AAAI 2023
CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention
AAAI 2023
Predict and Use: Harnessing Predicted Gaze to Improve Multimodal Sarcasm Detection
EMNLP 2023
Let’s Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
EMNLP 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
EMNLP 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI 2023
Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
AAAI 2023
Visually Grounded Commonsense Knowledge Acquisition
AAAI 2023
Improving the Cross-Lingual Generalisation in Visual Question Answering
AAAI 2023
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality
EMNLP 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
EMNLP 2023
Hallucination Detection for Grounded Instruction Generation
EMNLP 2023