scene understanding

825 papers

Co-occurring keywords

semantic segmentation (3179), 3d reconstruction (2253), 3d vision (1098), object detection (2759), depth estimation (1540), autonomous driving (1142), point cloud (1479), multimodal learning (4622), vision-language model (2235), 3d scene understanding (177)

Papers

Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation WACV 2025

Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset AAAI 2025

TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction ICCV 2025

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation CVPR 2025

GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts WACV 2025

Memory-Augmented Re-Completion for 3D Semantic Scene Completion AAAI 2025

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios AAAI 2025

RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS ICCV 2025

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer CVPR 2025

Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models ICCV 2025

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining ICCV 2025

Dynamic Group Detection using VLM-augmented Temporal Groupness Graph ICCV 2025

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks ICCV 2025

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis CVPR 2025

A Dataset for Semantic Segmentation in the Presence of Unknowns CVPR 2025

Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions ICCV 2025

Functionality Understanding and Segmentation in 3D Scenes CVPR 2025

OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding ICCV 2025

Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space ICCV 2025

3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation ICCV 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation CVPR 2025

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering CVPR 2025

DDS: Decoupled Dynamic Scene-Graph Generation Network WACV 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval ICCV 2025

GaussRender: Learning 3D Occupancy with Gaussian Rendering ICCV 2025