Papers
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Yida Wang, Xueyang Zhang, Kun Zhan et al.
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou, Zhanning Gao, Zhili Chen et al.
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image
Shuang Xu, Zixiang Zhao, Haowen Bai et al.
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao, Ruibing Hou, Zejie Tian et al.
HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation
Qinqian Lei, Bo Wang, Robby T. Tan
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon, Minjong Lee, Sangdon Park et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
HOMO-Feature: Cross-Arbitrary-Modal Image Matching with Homomorphism of Organized Major Orientation
Chenzhong Gao, Wei Li, Desheng Weng
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
Yiwen Chen, Hieu T. Nguyen, Vikram Voleti et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Marc Pollefeys, Daniel Barath et al.
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho et al.
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Ziyue Wang, Yurui Dong, Fuwen Luo et al.
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Yujian Lee, Peng Gao, Yongqi Xu et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui Chang, Jiahui Liu, Zhengzhe Liu et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D. Paul, Johannes Seiffarth, David Rügamer et al.
How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes
Mahnoor Fatima Saad, Ziad Al-Halah
HPSv3: Towards Wide-Spectrum Human Preference Score
Yuhang Ma, Xiaoshi Wu, Keqiang Sun et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei, Guangting Wang, Xiaoxiao Ma et al.
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes
Mai Su, Zhongtao Wang, Huishan Au et al.
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins et al.
Human-Object Interaction from Human-Level Instructions
Zhen Wu, Jiaman Li, Pei Xu et al.