Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
NIPS 2024
Dense Connector for MLLMs
NIPS 2024
Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting
NIPS 2024
Learning Spatially-Aware Language and Audio Embeddings
NIPS 2024
An eye for an ear: zero-shot audio description leveraging an image captioner with audio-visual token distribution matching
NIPS 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NIPS 2024
Multi-Object Hallucination in Vision Language Models
NIPS 2024
PLIP: Language-Image Pre-training for Person Representation Learning
NIPS 2024
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
NIPS 2024
Grasp as You Say: Language-guided Dexterous Grasp Generation
NIPS 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
NIPS 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
NIPS 2024
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
NIPS 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
NIPS 2024
Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
NIPS 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
NIPS 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
NIPS 2024
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
NIPS 2024
ChatCam: Empowering Camera Control through Conversational AI
NIPS 2024
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
NIPS 2024
Vript: A Video Is Worth Thousands of Words
NIPS 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
NIPS 2024
Facilitating Multimodal Classification via Dynamically Learning Modality Gap
NIPS 2024
Boosting Vision-Language Models with Transduction
NIPS 2024
Terra: A Multimodal Spatio-Temporal Dataset Spanning the Earth
NIPS 2024