scene understanding

825 papers

Co-occurring keywords

semantic segmentation (3179), 3d reconstruction (2253), 3d vision (1098), object detection (2759), depth estimation (1540), autonomous driving (1142), point cloud (1479), multimodal learning (4622), vision-language model (2235), 3d scene understanding (177)

Papers

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction CVPR 2024

Synergistic Global-space Camera and Human Reconstruction from Videos CVPR 2024

Improving Vision-and-Language Reasoning via Spatial Relations Modeling WACV 2024

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations CVPR 2024

Towards Learning a Generalist Model for Embodied Navigation CVPR 2024

HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment CVPR 2024

360+x: A Panoptic Multi-modal Scene Understanding Dataset CVPR 2024

AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings CVPR 2024

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation CVPR 2024

Revisiting Single Image Reflection Removal In the Wild CVPR 2024

3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces CVPR 2024

When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach CVPR 2024

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups CVPR 2024

A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals CVPR 2024

Multimodal Sense-Informed Forecasting of 3D Human Motions CVPR 2024

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning CVPR 2024

Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency CVPR 2024

Single-View Scene Point Cloud Human Grasp Generation CVPR 2024

F3Loc: Fusion and Filtering for Floorplan Localization CVPR 2024

U3DS3: Unsupervised 3D Semantic Scene Segmentation WACV 2024

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception CVPR 2024

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance CVPR 2024

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting CVPR 2024

TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding WACV 2024

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries CVPR 2024