visual perception

82 papers

Co-occurring keywords

multimodal learning (4622); vision-language model (2235); multimodal large language model (865); benchmark evaluation (1539); depth estimation (1540); visual reasoning (479); diffusion model (3720); computer vision (735); image segmentation (962); large multimodal model (176)

Papers

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding CVPR 2025

Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking ACL 2025

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs CVPR 2024

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models NIPS 2024

VCoder: Versatile Vision Encoders for Multimodal Large Language Models CVPR 2024

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models CVPR 2024

OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following ACL 2024

MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans? EMNLP 2024

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object CVPR 2024

Visual Perception by Large Language Model’s Weights NIPS 2024

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models NIPS 2024

VMamba: Visual State Space Model NIPS 2024

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective ACL 2024

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception ACL 2024

PerceptionGPT: Effectively Fusing Visual Perception into LLM CVPR 2024

Exploring CLIP for Assessing the Look and Feel of Images AAAI 2023

Learning to See Physical Properties with Active Sensing Motor Policies CORL 2023

Unleashing Text-to-Image Diffusion Models for Visual Perception ICCV 2023

Egocentric Planning for Scalable Embodied Task Achievement NIPS 2023

Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? EMNLP 2023

How hard are computer vision datasets? Calibrating dataset difficulty to viewing time NIPS 2023

Just Noticeable Visual Redundancy Forecasting: A Deep Multimodal-Driven Approach AAAI 2023

Did you see that? Exploring the role of vision in the development of consonant feature contrasts in children with cochlear implants INTERSPEECH 2023

Revisiting Weakly Supervised Pre-Training of Visual Perception Models CVPR 2022

Robustness Certification of Visual Perception Models via Camera Motion Smoothing CORL 2022