Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment
ACL 2024
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
ACL 2024
Motion Deblurring via Spatial-Temporal Collaboration of Frames and Events
AAAI 2024
STDiff: Spatio-Temporal Diffusion for Continuous Stochastic Video Prediction
AAAI 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
CVPR 2024
CLIM: Contrastive Language-Image Mosaic for Region Representation
AAAI 2024
CaMML: Context-Aware Multimodal Learner for Large Models
ACL 2024
Direction-Aware Video Demoiréing with Temporal-Guided Bilateral Learning
AAAI 2024
Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species
WACV 2024
Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources
ACL 2024
Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
AAAI 2024
Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control
AAAI 2024
Learning to Segment Referred Objects from Narrated Egocentric Videos
CVPR 2024
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
AAAI 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
ACL 2024
Heterogeneous Test-Time Training for Multi-Modal Person Re-identification
AAAI 2024
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
CVPR 2024
Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models
ACL 2024
Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations
NAACL 2024
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
CVPR 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
Open-Vocabulary Video Relation Extraction
AAAI 2024
Exploring Chain-of-Thought for Multi-modal Metaphor Detection
ACL 2024
CoVR: Learning Composed Video Retrieval from Web Video Captions
AAAI 2024