Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
NIPS 2024
LiT: Unifying LiDAR "Languages" with LiDAR Translator
NIPS 2024
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
AAAI 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
Tell Me What Is Good about This Property: Leveraging Reviews for Segment-Personalized Image Collection Summarization
AAAI 2024
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NIPS 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
Data Roaming and Quality Assessment for Composed Image Retrieval
AAAI 2024
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
WACV 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching
AAAI 2024
Multimodal Ensembling for Zero-Shot Image Classification
AAAI 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
NIPS 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
NIPS 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training
AAAI 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
AAAI 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
Decouple Content and Motion for Conditional Image-to-Video Generation
AAAI 2024
TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification
AAAI 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning
NIPS 2024
Generative-Based Fusion Mechanism for Multi-Modal Tracking
AAAI 2024
RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
NIPS 2024