Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data
NIPS 2024
EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
NIPS 2024
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
NIPS 2024
HEST-1k: A Dataset For Spatial Transcriptomics and Histology Image Analysis
NIPS 2024
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
NIPS 2024
Calibrated Self-Rewarding Vision Language Models
NIPS 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
NIPS 2024
PLIP: Language-Image Pre-training for Person Representation Learning
NIPS 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
NIPS 2024
Physics-Regularized Multi-Modal Image Assimilation for Brain Tumor Localization
NIPS 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NIPS 2024
An eye for an ear: zero-shot audio description leveraging an image captioner with audio-visual token distribution matching
NIPS 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
NIPS 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
NIPS 2024
CLIP in Mirror: Disentangling text from visual images through reflection
NIPS 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
NIPS 2024
DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
WACV 2024
MolTC: Towards Molecular Relational Modeling In Language Models
ACL 2024
Open-World Human-Object Interaction Detection via Multi-modal Prompts
CVPR 2024
Unified Generative and Discriminative Training for Multi-modal Large Language Models
NIPS 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
NIPS 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
NIPS 2024
Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild
NIPS 2024
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
NIPS 2024