Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
AAAI 2025
AoP-SAM: Automation of Prompts for Efficient Segmentation
AAAI 2025
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization
AAAI 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
AAAI 2025
SSLFusion: Scale and Space Aligned Latent Fusion Model for Multimodal 3D Object Detection
AAAI 2025
PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation
AAAI 2025
LPCG: A Self-conditional Architecture for Labeled Point Cloud Generation
AAAI 2025
L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection
AAAI 2025
Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control
AAAI 2025
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
AAAI 2025
Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities
AAAI 2025
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes
AAAI 2025
Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval
AAAI 2025
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
AAAI 2025
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
AAAI 2025
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
AAAI 2025
Promptable Representation Distribution Learning and Data Augmentation for Gigapixel Histopathology WSI Analysis
AAAI 2025
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI 2025
Overcoming Heterogeneous Data in Federated Medical Vision-Language Pre-training: A Triple-Embedding Model Selector Approach
AAAI 2025
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
AAAI 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI 2025
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
AAAI 2025
Text-Guided Nonverbal Enhancement Based on Modality-Invariant and -Specific Representations for Video Speaking Style Recognition
AAAI 2025