Research Explorer
Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
ACL 2024
SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
CVPR 2024
Object Attribute Matters in Visual Question Answering
AAAI 2024
Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding
AAAI 2024
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
ACL 2024
Exploiting Auxiliary Caption for Video Grounding
AAAI 2024
Cross-spectral Gated-RGB Stereo Depth Estimation
CVPR 2024
Language-aware Visual Semantic Distillation for Video Question Answering
CVPR 2024
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
AAAI 2024
ActionIE: Action Extraction from Scientific Literature with Programming Languages
ACL 2024
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
CVPR 2024
Eliciting Better Multilingual Structured Reasoning from LLMs through Code
ACL 2024
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
CVPR 2024
OLIVE: Object Level In-Context Visual Embeddings
ACL 2024
Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval
WACV 2024
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
CVPR 2024
Controllable Text-to-Image Synthesis for Multi-Modality MR Images
WACV 2024
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
AAAI 2024
Detection-Based Intermediate Supervision for Visual Question Answering
AAAI 2024
FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association
WACV 2024
SURER: Structure-Adaptive Unified Graph Neural Network for Multi-View Clustering
AAAI 2024
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
WACV 2024
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
CVPR 2024
Reliable Conflictive Multi-View Learning
AAAI 2024
Context-aware Difference Distilling for Multi-change Captioning
ACL 2024