Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Physics-Regularized Multi-Modal Image Assimilation for Brain Tumor Localization
NIPS 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NIPS 2024
An eye for an ear: zero-shot audio description leveraging an image captioner with audio-visual token distribution matching
NIPS 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
NIPS 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
NIPS 2024
CLIP in Mirror: Disentangling text from visual images through reflection
NIPS 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
NIPS 2024
DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
WACV 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Unified Generative and Discriminative Training for Multi-modal Large Language Models
NIPS 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
NIPS 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
NIPS 2024
Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild
NIPS 2024
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
NIPS 2024
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
NIPS 2024
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
NIPS 2024
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
NIPS 2024
Unified Insights: Harnessing Multi-modal Data for Phenotype Imputation via View Decoupling
NIPS 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
NIPS 2024
FIRE: Food Image to REcipe Generation
WACV 2024
Egocentric Action Recognition by Capturing Hand-Object Contact and Object State
WACV 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
CVPR 2024
Speechworthy Instruction-tuned Language Models
EMNLP 2024
Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation
NIPS 2024