Papers
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.
When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection
Bo-Lun Huang, Zi-Xiang Ni, Feng-Kai Huang et al.
When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu et al.
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo, Yingying Zhang, Xue Yang et al.
When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
Hanqing Liu, Shouwei Ruan, Yao Huang et al.
When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection
Hongliang Zhou, Yongxiang Liu, Canyu Mo et al.
When Schrodinger Bridge Meets Real-World Image Dehazing with Unpaired Training
Yunwei Lan, Zhigao Cui, Xin Luo et al.
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye, Honglin Lin, Leyan Ou et al.
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul H. Paik, Debashis Sen
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo, Jie Hu, Yansong Qu et al.
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyun Jae Oh, Mincheol Park et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Wenting Chen et al.