Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
LiT: Unifying LiDAR "Languages" with LiDAR Translator
NIPS 2024
CoVR: Learning Composed Video Retrieval from Web Video Captions
AAAI 2024
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
AAAI 2024
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NIPS 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
WACV 2024
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
AAAI 2024
Chain of Generation: Multi-Modal Gesture Synthesis via Cascaded Conditional Control
AAAI 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
NIPS 2024
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
AAAI 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
NIPS 2024
Motion Deblurring via Spatial-Temporal Collaboration of Frames and Events
AAAI 2024
Exploiting Auxiliary Caption for Video Grounding
AAAI 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
AAAI 2024
Object Attribute Matters in Visual Question Answering
AAAI 2024
STDiff: Spatio-Temporal Diffusion for Continuous Stochastic Video Prediction
AAAI 2024
CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model
AAAI 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
AAAI 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning
NIPS 2024