Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
CPIQA: Climate Paper Image Question Answering Dataset for Retrieval-Augmented Generation with Context-based Query Expansion
ACL 2025
Experiential Semantic Information and Brain Alignment: Are Multimodal Models Better than Language Models?
ACL 2025
VideoAuteur: Towards Long Narrative Video Generation
ICCV 2025
Is CLIP ideal? No. Can we fix it? Yes!
ICCV 2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
EMNLP 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
Fired_from_NLP@DravidianLangTech 2025: A Multimodal Approach for Detecting Misogynistic Content in Tamil and Malayalam Memes
NAACL 2025
Unified Open-World Segmentation with Multi-Modal Prompts
ICCV 2025
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
EMNLP 2025
SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking
ICCV 2025
CUET_Novice@DravidianLangTech 2025: A Multimodal Transformer-Based Approach for Detecting Misogynistic Memes in Malayalam Language
NAACL 2025
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
ICCV 2025
Semi-Supervised Multimodal Classification Through Learning from Modal and Strategic Complementarities
AAAI 2025
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
ICCV 2025
teamiic@DravidianLangTech2025-NAACL 2025: Transformer-Based Multimodal Feature Fusion for Misogynistic Meme Detection in Low-Resource Dravidian Language
NAACL 2025
Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation
ICCV 2025
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
CVPR 2025
I2VControl: Disentangled and Unified Video Motion Synthesis Control
ICCV 2025
SemanticCuetSync@DravidianLangTech 2025: Multimodal Fusion for Hate Speech Detection - A Transformer Based Approach with Cross-Modal Attention
NAACL 2025
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
ICCV 2025
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
EMNLP 2025
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
ICCV 2025
One_by_zero@DravidianLangTech 2025: A Multimodal Approach for Misogyny Meme Detection in Malayalam Leveraging Visual and Textual Features
NAACL 2025
Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition
ICCV 2025
SparQLe: Speech Queries to Text Translation Through LLMs
ACL 2025