Research Explorer
Deep Learning › Learning Types › Multi-Modal Learning
3,194 papers directly classified under this topic
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding
AAAI 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
Bi-directional Adapter for Multimodal Tracking
AAAI 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval
AAAI 2024
CoVR: Learning Composed Video Retrieval from Web Video Captions
AAAI 2024
SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images
AAAI 2024
Audio Generation with Multiple Conditional Diffusion Model
AAAI 2024
Exploiting Polarized Material Cues for Robust Car Detection
AAAI 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
NIPS 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
NIPS 2024
Semantic-Aware Data Augmentation for Text-to-Image Synthesis
AAAI 2024
Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis
AAAI 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language
AAAI 2024
Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection
AAAI 2024
Dual-Prior Augmented Decoding Network for Long Tail Distribution in HOI Detection
AAAI 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
Generative-Based Fusion Mechanism for Multi-Modal Tracking
AAAI 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning
NIPS 2024
RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
NIPS 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
NIPS 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
AAAI 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
AAAI 2024