Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
CVPR 2025
ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
ICCV 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
CVPR 2025
UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments
ICCV 2025
Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification
WACV 2025
Harnessing Input-Adaptive Inference for Efficient VLN
ICCV 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
ProbMED: A Probabilistic Framework for Medical Multimodal Binding
ICCV 2025
Active Data Curation Effectively Distills Large-Scale Multimodal Models
CVPR 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
CVPR 2025
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
CVPR 2025
Temporally Streaming Audio-Visual Synchronization for Real-World Videos
WACV 2025
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
EMNLP 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
CVPR 2025
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
CVPR 2025
Personalized Preference Fine-tuning of Diffusion Models
CVPR 2025
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
CVPR 2025
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
CVPR 2025
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
CVPR 2025
AIDE: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning
WACV 2025
GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation
EMNLP 2025
CoMMIT: Coordinated Multimodal Instruction Tuning
EMNLP 2025
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
ACL 2025
CEMTM: Contextual Embedding-based Multimodal Topic Modeling
EMNLP 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
CVPR 2025