Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Defining and Evaluating Visual Language Models’ Basic Spatial Abilities: A Perspective from Psychometrics
ACL 2025
SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation
ACL 2025
Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
ACL 2025
CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
ACL 2025
Can MLLMs Understand the Deep Implication Behind Chinese Images?
ACL 2025
EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language Model
ACL 2025
Can Vision-Language Models Evaluate Handwritten Math?
ACL 2025
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
ACL 2025
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
ChartLens: Fine-grained Visual Attribution in Charts
ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
ACL 2025
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
I0T: Embedding Standardization Method Towards Zero Modality Gap
ACL 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
ACL 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
ACL 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
ACL 2025
METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling
ACL 2025
VISA: Retrieval Augmented Generation with Visual Source Attribution
ACL 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
ACL 2025
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
ACL 2025
A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language Models
ACL 2025
CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG
ACL 2025
Scalable Vision Language Model Training via High Quality Data Curation
ACL 2025
What's in the Image? A Deep-Dive into the Vision of Vision Language Models
CVPR 2025