Papers
4,428 papers found
Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling
Alexander Prutsch, David Schinagl, Horst Possegger
StreetView-Waste: A Multi-Task Dataset for Urban Waste Management
Diogo J. Paulo, João Martins, Hugo Proença et al.
STRinGS: Selective Text Refinement in Gaussian Splatting
Abhinav Raundhal, Gaurav Behera, P. J. Narayanan et al.
Stroke Modeling Enables Vectorized Character Generation with Large Vectorized Glyph Model
Xinyue Zhang, Haolong Li, Jiawei Ma et al.
Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-free Open-Vocabulary Semantic Segmentation
Qiming Huang, Hao Ai, Jianbo Jiao
Structured Context Learning for Generic Event Boundary Detection
Xin Gu, Congcong Li, Xinyao Wang et al.
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu, Yunze Liu, Miao Liu et al.
Style-Friendly SNR Sampler for Style-Driven Generation
Jooyoung Choi, Chaehun Shin, Yeongtak Oh et al.
Subspace-Guided Knowledge Distillation for Efficient Model Transfer
Zeeshan Hayder, Ali Cheraghian, Lars Petersson et al.
SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities
Dung Thuy Nguyen, Quang Nguyen, Preston K. Robinette et al.
Sun-E: Dataset and Benchmark for Event-Based Sun Sensing
Sydney Dolan, Alessandro Golkar
SuperRivolution: Fine-Scale Rivers from Coarse Temporal Satellite Imagery
Rangel Daroya, Subhransu Maji
SurfDist: Interpretable Three-Dimensional Instance Segmentation Using Curved Surface Patches
Jackson Borchardt, Saul Kato
Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering using Gaussian Surfels
Idris O. Sunmola, Zhenjun Zhao, Samuel Schmidgall et al.
SurgXBench: Explainable Vision-Language Model Benchmark for Surgery
Jiajun Cheng, Xianwu Zhao, Sainan Liu et al.
SVD-Det: A Lightweight Framework for Video Forgery Detection Using Semantic and Visual Defect Cues
Tsung-Shan Yang, Tianyu Zhang, Feng Qian et al.
SVS-GAN for Semantic Synthesis of Traffic Videos for Autonomous Driving
Khaled M. Seyam, Julian Wiederer, Markus Braun et al.
SymNet: A Multi-Task Network for Joint Radio Map Reconstruction and Transmitter Localization
Lyuzhou Ye, Thanh Dat Le, Yan Huang
SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding
Phyo Thet Yee, Dimitrios Kollias, Sudeepta Mishra et al.
SynPlay: Large-Scale Synthetic Human Data with Real-World Diversity for Aerial-View Perception
Jinsub Yim, Hyungtae Lee, Sungmin Eum et al.
Synthesizing Compositional Videos from Text Description
Prajwal Singh, Kuldeep Kulkarni, Shanmuganathan Raman et al.
Systematic Analysis of the Unintentional CSAM-Generation-Potential of Text-to-Image Models
Nicolas Göller, Martin Steinebach
T2LF: LLM-Guided Multimodal Diffusion for Text-to-Light Field Synthesis
Soyoung Yoon, Namhyuk Ahn, In Kyu Park
T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
Yubin Chen, Xuyang Guo, Zhenmei Shi et al.
Tables Decoded: DELTA for Structure, TARQA for Understanding
Jahanvi Rajput, Dhruv Kudale, Saikiran Kasturi et al.