Papers
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng, Xiang Zhang, Zirui Wang et al.
YOLOE: Real-Time Seeing Anything
Ao Wang, Lihao Liu, Hui Chen et al.
You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
Shanshan Yan, Zexi Li, Chao Wu et al.
Your Text Encoder Can Be An Object-Level Watermarking Controller
Naresh Kumar Devulapally, Mingzhen Huang, Vishal Asnani et al.
You Share Beliefs, I Adapt: Progressive Heterogeneous Collaborative Perception
Hao Si, Ehsan Javanmardi, Manabu Tsukada
You Think, You ACT: The New Task of Arbitrary Text to Motion Generation
Runqi Wang, Caoyuan Ma, Guopeng Li et al.
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
Bingchen Gong, Diego Gomez, Abdullah Hamdi et al.
Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation
Wenliang Zhong, Rob Barton, Weizhi An et al.
Zero-Shot Compositional Video Learning with Coding Rate Reduction
Heeseok Jung, Jun-Hyeon Bak, Yujin Jeong et al.
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar, Sachidanand VS, R. Venkatesh Babu
Zero-shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
ZeroStereo: Zero-shot Stereo Matching from Single Images
Xianqi Wang, Hao Yang, Gangwei Xu et al.
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Ziming Yu, Pan Zhou, Sike Wang et al.
ZFusion: Efficient Deep Compositional Zero-shot Learning for Blind Image Super-Resolution with Generative Diffusion Prior
Alireza Esmaeilzehi, Hossein Zaredar, Yapeng Tian et al.
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim, Chanyong Shin, Joonhyun Jeong et al.
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
Yefei He, Feng Chen, Jing Liu et al.
ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models
Hyun Jun Yook, Ga San Jhun, Jae Hyun Cho et al.
2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision
Cheng-Kun Yang, Min-Hung Chen, Yung-Yu Chuang et al.
2D3D-MATR: 2D-3D Matching Transformer for Detection-Free Registration Between Images and Point Clouds
Minhao Li, Zheng Qin, Zhirui Gao et al.
360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking
Huajian Huang, Yinzhe Xu, Yingshu Chen et al.
3D-aware Blending with Generative NeRFs
Hyunsu Kim, Gayoung Lee, Yunjey Choi et al.
3D-Aware Generative Model for Improved Side-View Image Synthesis
Kyungmin Jo, Wonjoon Jin, Jaegul Choo et al.
3D-aware Image Generation using 2D Diffusion Models
Jianfeng Xiang, Jiaolong Yang, Binbin Huang et al.