Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
CIOL at SemEval-2025 Task 11: Multilingual Pre-trained Model Fusion for Text-based Emotion Recognition
SEMEVAL 2025
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
ICCV 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
ICCV 2025
Capturing the Unseen: Vision-Free Facial Motion Capture Using Inertial Measurement Units
AAAI 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
ICCV 2025
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
AAAI 2025
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
ICCV 2025
Mixture of Multimodal Adapters for Sentiment Analysis
NAACL 2025
Boosting Multimodal Learning via Disentangled Gradient Learning
ICCV 2025
VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval
AAAI 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
ICCV 2025
Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding
NAACL 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
ICCV 2025
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-Learning
AAAI 2025
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
ICCV 2025
MatViX: Multimodal Information Extraction from Visually Rich Articles
NAACL 2025
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
ICCV 2025
Asymmetric Hierarchical Difference-aware Interaction Network for Event-guided Motion Deblurring
AAAI 2025
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
ICCV 2025
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding
NAACL 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
Probing Relative Interaction and Dynamic Calibration in Multi-modal Entity Alignment
ACL 2025
Multi-Facet Blending for Faceted Query-by-Example Retrieval
ACL 2025
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
ACL 2025