scene understanding

825 papers

Co-occurring keywords

semantic segmentation (3179), 3d reconstruction (2253), 3d vision (1098), object detection (2759), depth estimation (1540), autonomous driving (1142), point cloud (1479), multimodal learning (4622), vision-language model (2235), 3d scene understanding (177)

Papers

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models ICCV 2025

Object-aware Sound Source Localization via Audio-Visual Scene Understanding CVPR 2025

CAPSTONE: Composable Attribute‐Prompted Scene Translation for Zero‐Shot Vision–Language Reasoning EMNLP 2025

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis CVPR 2025

Vision-Language Embodiment for Monocular Depth Estimation CVPR 2025

HORP: Human-Object Relation Priors Guided HOI Detection CVPR 2025

Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions ICCV 2025

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving ICCV 2025

Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation ICCV 2025

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining ICCV 2025

Dynamic Group Detection using VLM-augmented Temporal Groupness Graph ICCV 2025

DRAWER: Digital Reconstruction and Articulation With Environment Realism CVPR 2025

A Dataset for Semantic Segmentation in the Presence of Unknowns CVPR 2025

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space ICCV 2025

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks ICCV 2025

GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts WACV 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces CVPR 2025

Scene Map-based Prompt Tuning for Navigation Instruction Generation CVPR 2025

PanSt3R: Multi-view Consistent Panoptic Segmentation ICCV 2025

NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM ACL 2025

LT3SD: Latent Trees for 3D Scene Diffusion CVPR 2025

GaussRender: Learning 3D Occupancy with Gaussian Rendering ICCV 2025

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding ICCV 2025

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users ACL 2025

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? ICCV 2025