Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
EMNLP 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
EMNLP 2024
Benchmarking Vision Language Models for Cultural Understanding
EMNLP 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
EMNLP 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
EMNLP 2024
Unifying Multimodal Retrieval via Document Screenshot Embedding
EMNLP 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
EMNLP 2024
Divide and Conquer Radiology Report Generation via Observation Level Fine-grained Pretraining and Prompt Tuning
EMNLP 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
EMNLP 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
EMNLP 2024
VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models
EMNLP 2024
TroL: Traversal of Layers for Large Language and Vision Models
EMNLP 2024
UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks
EMNLP 2024
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
EMNLP 2024
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
EMNLP 2024
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
EMNLP 2024
Granular Privacy Control for Geolocation with Vision Language Models
EMNLP 2024
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
EMNLP 2024
In-Context Compositional Generalization for Large Vision-Language Models
EMNLP 2024
Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory
EMNLP 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
EMNLP 2024
GRIZAL: Generative Prior-guided Zero-Shot Temporal Action Localization
EMNLP 2024
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
EMNLP 2024
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
EMNLP 2024
Retrieval-enriched zero-shot image classification in low-resource domains
EMNLP 2024