Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-On
AAAI 2025
Semantic Segmentation on Raindrop Degraded Images Using Two-Stage Dual Teacher-Student Learning
AAAI 2025
Asymmetric Hierarchical Difference-aware Interaction Network for Event-guided Motion Deblurring
AAAI 2025
End-to-End Autonomous Driving Through V2X Cooperation
AAAI 2025
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
AAAI 2025
Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
ACL 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
CVPR 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
Auto Encoding Neural Process for Multi-interest Recommendation
AAAI 2025
Multi-Modal Recommendation Unlearning for Legal, Licensing, and Modality Constraints
AAAI 2025
Reverse Distribution Based Video Moment Retrieval for Effective Bias Elimination
AAAI 2025
Seeing Beyond Noise: Joint Graph Structure Evaluation and Denoising for Multimodal Recommendation
AAAI 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
ACL 2025
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
CVPR 2025
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
AAAI 2025
MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation
AAAI 2025
Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition
AAAI 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
CVPR 2025
CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information
AAAI 2025
MindPainter: Efficient Brain-Conditioned Painting of Natural Images via Cross-Modal Self-Supervised Learning
AAAI 2025
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
CVPR 2025
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
CVPR 2025
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
CVPR 2025
Beyond Coarse Labels: Fine-Grained Problem Augmentation and Multi-Dimensional Feedback for Emotional Support Conversation
EMNLP 2025