Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025
Multi-subject Open-set Personalization in Video Generation
CVPR 2025
Multi-View Classification Using Hybrid Fusion and Mutual Distillation
WACV 2024
Multimodal Language Models Show Evidence of Embodied Simulation
COLING 2024
m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt
COLING 2024
PGVT: Pose-Guided Video Transformer for Fine-Grained Action Recognition
WACV 2024
M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets
COLING 2024
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
CVPR 2024
Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning
NIPS 2024
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping
NIPS 2024
Efficient Large Multi-modal Models via Visual Context Compression
NIPS 2024
Discriminative Probing and Tuning for Text-to-Image Generation
CVPR 2024
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
WACV 2024
All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation
NIPS 2024
Visually Guided Audio Source Separation With Meta Consistency Learning
WACV 2024
HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data
NIPS 2024
MUCS@LT-EDI-2024: Exploring Joint Representation for Memes Classification
EACL 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
NIPS 2024
MasonTigers@LT-EDI-2024: An Ensemble Approach Towards Detecting Homophobia and Transphobia in Social Media Comments
EACL 2024
ReMI: A Dataset for Reasoning with Multiple Images
NIPS 2024
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
AAAI 2024
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment
AAAI 2024
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
AAAI 2024
Open-Vocabulary Video Relation Extraction
AAAI 2024
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
AAAI 2024