Papers
When Schrodinger Bridge Meets Real-World Image Dehazing with Unpaired Training
Yunwei Lan, Zhigao Cui, Xin Luo et al.
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye, Honglin Lin, Leyan Ou et al.
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
Wide2Long: Learning Lens Compression and Perspective Adjustment for Wide-Angle to Telephoto Translation
Soumyadipta Banerjee, Jiaul H. Paik, Debashis Sen
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo, Jie Hu, Yansong Qu et al.
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyun Jae Oh, Mincheol Park et al.
WIPES: Wavelet-based Visual Primitives
Wenhao Zhang, Hao Zhu, Delong Wu et al.
WIR3D: Visually-Informed and Geometry-Aware 3D Shape Abstraction
Richard Liu, Daniel Fu, Noah Tan et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan, Hong-Xing Yu, Sirui Chen et al.
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Wenting Chen et al.
X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu, Yuanhao Cai, Ruyi Zha et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma, Qirong Peng, Xu Guo et al.
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo, Thao Nguyen, Xun Huang et al.
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang et al.
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
Yuedong Tan, Zongwei Wu, Yuqian Fu et al.