Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
BQA: Body Language Question Answering Dataset for Video Large Language Models
ACL 2025
Grounded, or a Good Guesser? A Per-Question Balanced Dataset to Separate Blind from Grounded Models for Embodied Question Answering
ACL 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
ACL 2025
EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs
ACL 2025
Detecting and Mitigating Challenges in Zero-Shot Video Summarization with Video LLMs
ACL 2025
GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models
ACL 2025
Harnessing PDF Data for Improving Japanese Large Multimodal Models
ACL 2025
Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning
ACL 2025
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
ACL 2025
MANBench: Is Your Multimodal Model Smarter than Human?
ACL 2025
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
ACL 2025
SignAlignLM: Integrating Multimodal Sign Language Processing into Large Language Models
ACL 2025
MM-R3: On (In-)Consistency of Vision-Language Models (VLMs)
ACL 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
ACL 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
ACL 2025
VCD: A Dataset for Visual Commonsense Discovery in Images
ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
ACL 2025
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
ACL 2025
EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations
ACL 2025
Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation
ACL 2025
Testing Spatial Intuitions of Humans and Large Language and Multimodal Models in Analogies
ACL 2025
Strengths and Limitations of Word-Based Task Explainability in Vision Language Models: a Case Study on Biological Sex Biases in the Medical Domain
ACL 2025
DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models’ Understanding on Indian Culture
EMNLP 2025