Papers
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao, Ruibing Hou, Zejie Tian et al.
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei, Bo Wang, Robby T. Tan
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon, Minjong Lee, Sangdon Park et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation
Chenzhong Gao, Wei Li, Desheng Weng
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
Yiwen Chen, Hieu T. Nguyen, Vikram Voleti et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Marc Pollefeys, Daniel Barath et al.
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho et al.
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Ziyue Wang, Yurui Dong, Fuwen Luo et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui Chang, Jiahui Liu, Zhengzhe Liu et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D. Paul, Johannes Seiffarth, David Rügamer et al.
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Fatima Saad, Ziad Al-Halah
HPSv3: Towards Wide-Spectrum Human Preference Score
Yuhang Ma, Xiaoshi Wu, Keqiang Sun et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei, Guangting Wang, Xiaoxiao Ma et al.
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins et al.
Human-Object Interaction from Human-Level Instructions
Zhen Wu, Jiaman Li, Pei Xu et al.
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, Pulkit Gera, Xilong Zhou et al.
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu, Yunfan Ye, Fan Zhang et al.
Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Changwoon Choi, Jeongjun Kim, Geonho Cha et al.
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen et al.