Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
CVPR 2024
Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching
AAAI 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
Semantic-Aware Video Representation for Few-Shot Action Recognition
WACV 2024
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
CVPR 2024
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
AAAI 2024
Prompt-Driven Referring Image Segmentation with Instance Contrasting
CVPR 2024
Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles From Driving Scenes
WACV 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
NIPS 2024
LiT: Unifying LiDAR "Languages" with LiDAR Translator
NIPS 2024
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NIPS 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
WACV 2024
Enhancing Neural Radiance Fields with Adaptive Multi-Exposure Fusion: A Bilevel Optimization Approach for Novel View Synthesis
AAAI 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
NIPS 2024
CoPL: Contextual Prompt Learning for Vision-Language Understanding
AAAI 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
NIPS 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
AAAI 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning
NIPS 2024
RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
NIPS 2024