Scene Understanding

1887 directly classified papers

Papers

Vision-Language Models Struggle to Align Entities across Modalities ACL 2025

Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios WACV 2025

Thermal Polarimetric Multi-view Stereo ICCV 2025

Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models ICCV 2025

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models ICCV 2025

Planar Affine Rectification from Local Change of Scale and Orientation ICCV 2025

A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition ICCV 2025

HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics CVPR 2025

Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling ICCV 2025

End-to-End Entity-Predicate Association Reasoning for Dynamic Scene Graph Generation ICCV 2025

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition ICCV 2025

UAVScenes: A Multi-Modal Dataset for UAVs ICCV 2025

Leveraging Panoptic Scene Graph for Evaluating Fine-Grained Text-to-Image Generation ICCV 2025

Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description ICCV 2025

Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens ICCV 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ACL 2025

Scene Coordinate Reconstruction Priors ICCV 2025

Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels ICCV 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference IJCNLP 2025

INTERCHART: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information IJCNLP 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation CVPR 2025

The Confidence Paradox: Can LLM Know When It’s Wrong? IJCNLP 2025

FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images ICCV 2025

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling CVPR 2025

RayZer: A Self-supervised Large View Synthesis Model ICCV 2025