Papers
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction
Stephane Da Silva Martins, Emanuel Aldea, Sylvie Le Hégarat-Mascle
ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models
Sibo Dong, Ismail Shaheen, Maggie Shen et al.
Visual Detector Compression via Location-Aware Discriminant Analysis
Qizhen Lan, Jung Im Choi, Qing Tian
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
Kailai Feng, Yabo Zhang, Haodong Yu et al.
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
Donglin Huang, Yongyuan Li, Tianhang Liu et al.
VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning
Vivek Madhavaram, Vartika Sengar, Arkadipta De et al.
VLMDiff: Leveraging Vision-Language Models for Multi-Class Anomaly Detection with Diffusion
Samet Hicsonmez, Abd El Rahman Shabayek, Djamila Aouada
VLMs Guided Interpretable Decision Making in Autonomous Driving
Xin Hu, Taotao Jing, Renran Tian et al.
VOCAL: Visual Odometry via ContrAstive Learning
Chi-Yao Huang, Zeel Bhatt, Yezhou Yang
VRAgent: Self-Refining Agent for Zero-Shot Multimodal Video Retrieval
Ketul Shah, Pankaj Nathani, Rama Chellappa et al.
WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion
Sajjad Pakdamansavoji, Yintao Ma, Amir Rasouli et al.
WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields
Sadra Safadoust, Fabio Tosi, Fatma Güney et al.
What Happens When: Learning Temporal Orders of Events in Videos
Daechul Ahn, Yura Choi, Hyeonbeom Choi et al.
Where is the Watermark? Interpretable Watermark Detection at the Block Level
Maria Bulychev, Neil G. Marchant, Benjamin I. P. Rubinstein
WiSAR3D - Aerial LiDAR Dataset for 3D Object Detection
Oren Shrout, Ori Nizan, Yizhak Ben-Shabat et al.
WiSE-OD: Benchmarking Robustness in Infrared Object Detection
Heitor R. Medeiros, Atif Belal, Masih Aminbeidokhti et al.
WorkZone3D: A Multimodal Dataset for 3D Work Zone Perception in Autonomous Driving
Shounak Sural, Nishad Sahu, Ragunathan Rajkumar
WSSSP-Net: Weakly Supervised Semantic Segmentation Plugin Network for Face Anti-Spoofing
Krzysztof Galus, Piotr Syga, Piotr Kawa
WWE-UIE: A Wavelet & White Balance Efficient Network for Underwater Image Enhancement
Ching-Heng Cheng, Jen-Wei Lee, Chia-Ming Lee et al.
X-JEPA: A Novel Joint Learning Cross-Modal Predictive Alignment Framework for Remote Sensing Image Retrieval
Shabnam Choudhury, Yash Salunkhe, Vaibhav Rajan et al.
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
Logan Lawrence, Oindrila Saha, Megan Wei et al.
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data
Elia Bonetto, Aamir Ahmad
Zero-LEAD: Source-Free Universal Domain Adaptation for Abdominal Multi-Organ Segmentation
Ahmed El-Sayed, Marwan Torki
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
Yan-Bo Lin, Kevin Lin, Zhengyuan Yang et al.
Zero-Shot Coreset Selection via Iterative Subspace Sampling
Brent A. Griffin, Jacob Marks, Jason J. Corso