Research Explorer
Vision-Language Models
685 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 3
2018: 1
2019: 7
2020: 12
2021: 26
2022: 57
2023: 94
2024: 235
2025: 248
Papers
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
CVPR 2025
Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking
AAAI 2025
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
CVPR 2025
VLA-Mark: A cross modal watermark for large vision-language alignment models
EMNLP 2025
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
EMNLP 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
CVPR 2025
Multilingual Pretraining for Pixel Language Models
EMNLP 2025
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
EMNLP 2025
Enhance Vision-Language Alignment with Noise
AAAI 2025
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
CVPR 2025
ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
EMNLP 2025
VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
EMNLP 2025
MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision-Language Models
AAAI 2025
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
CVPR 2025
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
CVPR 2025
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR
CVPR 2025
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
AAAI 2025
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
CVPR 2025
Functionality Understanding and Segmentation in 3D Scenes
CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI 2025
Dual Semantic Guidance for Open Vocabulary Semantic Segmentation
CVPR 2025
Exploring Spatial Schema Intuitions in Large Language and Vision Models
ACL 2024
Multi-modal Concept Alignment Pre-training for Generative Medical Visual Question Answering
ACL 2024