Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
TaxaBind: A Unified Embedding Space for Ecological Applications
WACV 2025
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
AAAI 2025
Multi-Facet Blending for Faceted Query-by-Example Retrieval
ACL 2025
A Character-Centric Creative Story Generation via Imagination
ACL 2025
Latency Robust Cooperative Perception using Asynchronous Feature Fusion
WACV 2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
EMNLP 2025
GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation
EMNLP 2025
Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain
EMNLP 2025
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
WACV 2025
Overcoming Heterogeneous Data in Federated Medical Vision-Language Pre-training: A Triple-Embedding Model Selector Approach
AAAI 2025
Adaptive Keyframe Sampling for Long Video Understanding
CVPR 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
CVPR 2025
Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference
WACV 2025
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
AAAI 2025
Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis
AAAI 2025
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
AAAI 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
WACV 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI 2025
MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading
AAAI 2025
Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
ACL 2025
VideoGameBunny: Towards Vision Assistants for Video Games
WACV 2025
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
ACL 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
ACL 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
CVPR 2025