Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
CVPR 2025
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
ACL 2025
Visual Evidence Prompting Mitigates Hallucinations in Large Vision-Language Models
ACL 2025
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
ACL 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization
ACL 2025
Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts
ACL 2025
Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
ACL 2025
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
CVPR 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
ACL 2025
PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
ACL 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
ACL 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
ACL 2025
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
ACL 2025
Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
ACL 2025
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
ACL 2025
Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
ACL 2025
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
An Appraisal Theoretic Approach to Modelling Affect Flow in Conversation Corpora
ACL 2025
Applying the Character-Role Narrative Framework with LLMs to Investigate Environmental Narratives in Scientific Editorials and Tweets
ACL 2025
Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions
EMNLP 2025