conftrace
Vision-Language Models
159 papers
Papers per year
2016: 1
2021: 1
2023: 1
2024: 7
2025: 3
2026: 146
Papers
Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
ACL 2026
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation Agents
ACL 2026
Benchmarking Deflection and Hallucination in Large Vision-Language Models
ACL 2026
MagicBench: Diagnosing Visual Agency Loss and Semantic Dependency in Multimodal LLMs
ACL 2026
Selective Test-Time Debiasing for CLIP via Reward Gating
ACL 2026
Latent Attention Denoising: A Training-Free Energy-Based Framework for Mitigating Hallucinations in Vision-Language Models
ACL 2026
Don’t Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
ACL 2026
Think before Go: Hierarchical Reasoning for Image-goal Navigation
ACL 2026
Zero-shot Jianzi Recognition as Structured Visual Information Extraction in Open Compositional Symbolic Systems
ACL 2026
DEFT: Demystifying VLN Failures via a Unified Dual-View Explainability Framework for LLM-based Agents
ACL 2026
MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
ACL 2026
Beyond the Panorama: Training-Free Hierarchical Perception-Reasoning for Fine-Grained Vision in MLLMs
ACL 2026
Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training
ACL 2026
MirrorCAPTCHA: Wild CAPTCHA, Wild Distribution, Wild Web-based Platform Meet Multimodal LLM Agents
ACL 2026
InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning
ACL 2026
Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality
ACL 2026
Diagnosing Spatial Consistency across Perspectives and Viewpoints in Large Vision-Language Models
ACL 2026
Evaluating Visual Narrative Coherence in Story Visualization via Diversified Storylines
ACL 2026
KG-ViP: Bridging Knowledge Grounding and Visual Perception in Multi-modal LLMs for Visual Question Answering
ACL 2026
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
ACL 2026
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain
ACL 2026
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
ACL 2026
The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning
ACL 2026
DiVE: Decoupling Intra-layer Visual Evidence for Mitigating Hallucinations in Large Vision-Language Models
ACL 2026
When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
ACL 2026