Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
EMNLP 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025
3D Part Segmentation via Geometric Aggregation of 2D Visual Features
WACV 2025
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI 2025
DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes
AAAI 2025
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction
EMNLP 2025
Feature Design for Bridging SAM and CLIP toward Referring Image Segmentation
WACV 2025
MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics
AAAI 2025
STAIR: Manipulating Collaborative and Multimodal Information for E-Commerce Recommendation
AAAI 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
EMNLP 2025
Adaptive Keyframe Sampling for Long Video Understanding
CVPR 2025
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
EMNLP 2025
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
EMNLP 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
CVPR 2025
AnimateAnything: Consistent and Controllable Animation for Video Generation
CVPR 2025
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
AAAI 2025
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
AAAI 2025
Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data
EMNLP 2025
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
CVPR 2025
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
AAAI 2025
Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval
AAAI 2025
MotionMap: Representing Multimodality in Human Pose Forecasting
CVPR 2025
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
CVPR 2025
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
CVPR 2025
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
AAAI 2025