Papers
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
Yucheng Mao, Boyang Wang, Nilesh Kulkarni et al.
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models
Jie Ren, Kangrui Chen, Yingqian Cui et al.
SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons
Yuanyou Xu, Zongxin Yang, Yi Yang
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs
Junsheng Wang, Nieqing Cao, Yan Ding et al.
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury et al.
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain et al.
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
Mohd Hozaifa Khan, Ravi Kiran Sarvadevabhatla
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
Sketchy Bounding-box Supervision for 3D Instance Segmentation
Qian Deng, Le Hui, Jin Xie et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang, Pengpeng Zeng et al.
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu, Jiangwei Lao, Deyi Ji et al.
SLADE: Shielding against Dual Exploits in Large Vision-Language Models
Md Zarif Hossain, Ahmed Imteaj
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen, Guoan Wang, Yuanfeng Ji et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Lingjing, Yujia Zheng et al.
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu, Bolun Zheng, Wei Zhu et al.
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.