Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
EMNLP 2022
An Anchor-based Relative Position Embedding Method for Cross-Modal Tasks
EMNLP 2022
Character-centric Story Visualization via Visual Planning and Token Alignment
EMNLP 2022
Evaluating and Improving Factuality in Multimodal Abstractive Summarization
EMNLP 2022
Entity-centered Cross-document Relation Extraction
EMNLP 2022
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
EMNLP 2022
SHARE: a System for Hierarchical Assistive Recipe Editing
EMNLP 2022
Visual Consensus Modeling for Video-Text Retrieval
AAAI 2022
Comprehensive Regularization in a Bi-directional Predictive Network for Video Anomaly Detection
AAAI 2022
Explore Inter-contrast between Videos via Composition for Weakly Supervised Temporal Sentence Grounding
AAAI 2022
Exploiting Fine-Grained Face Forgery Clues via Progressive Enhancement Learning
AAAI 2022
Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-identification
AAAI 2022
MuMu: Cooperative Multitask Learning-Based Guided Multimodal Fusion
AAAI 2022
DarkVisionNet: Low-Light Imaging via RGB-NIR Fusion with Deep Inconsistency Prior
AAAI 2022
Cross-Modal Object Tracking: Modality-Aware Representations and a Unified Benchmark
AAAI 2022
Action-Aware Embedding Enhancement for Image-Text Retrieval
AAAI 2022
Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking
AAAI 2022
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations
AAAI 2022
Exploring Motion and Appearance Information for Temporal Sentence Grounding
AAAI 2022
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
AAAI 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI 2022
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
AAAI 2022
TVT: Three-Way Vision Transformer through Multi-Modal Hypersphere Learning for Zero-Shot Sketch-Based Image Retrieval
AAAI 2022
One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
AAAI 2022
Rethinking the Two-Stage Framework for Grounded Situation Recognition
AAAI 2022