Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Stable Diffusion Models are Secretly Good at Visual In-Context Learning
ICCV 2025
CUET-NLP_MP@DravidianLangTech 2025: A Transformer-Based Approach for Bridging Text and Vision in Misogyny Meme Detection in Dravidian Languages
NAACL 2025
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
ICCV 2025
Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain
EMNLP 2025
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
ICCV 2025
Exploring Multimodal Language Models for Sustainability Disclosure Extraction: A Comparative Study
NAACL 2025
Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation
ICCV 2025
S²MILE: Semantic-and-Structure-Aware Music-Driven Lyric Generation
AAAI 2025
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
ICCV 2025
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
NAACL 2025
Steering Guidance for Personalized Text-to-Image Diffusion Models
ICCV 2025
Attention Bootstrapping for Multi-Modal Test-Time Adaptation
AAAI 2025
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
ICCV 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
NAACL 2025
Clink! Chop! Thud! - Learning Object Sounds from Real-World Interactions
ICCV 2025
Deep Submodular Optimization and LLM for Multimodal Content Extraction and Automatic Poster Generation from Long Document
AAAI 2025
Towards Accurate and Efficient 3D Object Detection for Autonomous Driving: A Mixture of Experts Computing System on Edge
ICCV 2025
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
NAACL 2025
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
ICCV 2025
Is Your Image a Good Storyteller?
AAAI 2025
Scaling Language-Free Visual Representation Learning
ICCV 2025
Survival Prediction in Lung Cancer through Multi-Modal Representation Learning
WACV 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
ICCV 2025
ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning
EMNLP 2025
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025