Papers
SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning
Zhi Chen, Zecheng Zhao, Jingcai Guo et al.
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du, Zhineng Chen, Hongtao Xie et al.
SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
Zhentao Tan, Ben Xue, Jian Jia et al.
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Jongsuk Kim, Jaeyoung Lee, Gyojin Han et al.
SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
Wenkun He, Yun Liu, Ruitao Liu et al.
Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong, Jegyeong Cho, Youngho Yoon et al.
SynCity: Training-Free Generation of 3D Worlds
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin et al.
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He, Cheng Luo, Xiaole Xian et al.
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang, Kejiang Chen, Zehua Ma et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang et al.
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Huy Nguyen, Rosanne Liu et al.
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu et al.
TAD-E2E: A Large-scale End-to-end Autonomous Driving Dataset
Chang Liu, Mingxu Zhu, Zheyuan Zhang et al.
TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity
Yuzhuo Chen, Zehua Ma, Han Fang et al.
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.
Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening
Zihan Cao, Yu Zhong, Liang-Jian Deng
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Bowen Wang, Zhouqiang Jiang, Yasuaki Susumu et al.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus, Carl Doersch, Yi Yang et al.
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li et al.