Papers
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
Yuheng Feng, Changsong Wen, Zelin Peng et al.
Rethinking Correspondence-based Category-Level Object Pose Estimation
Huan Ren, Wenfei Yang, Shifeng Zhang et al.
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
Zichong Meng, Yiming Xie, Xiaogang Peng et al.
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages
Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.
Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection
Yifan Chang, Junjie Huang, Xiaofeng Wang et al.
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
Huakai Lai, Guoxin Xiong, Huayu Mai et al.
Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification
Haobin Zhong, Shuai He, Anlong Ming et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
Tengyu Ma, Long Ma, Ziye Li et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
Dubing Chen, Huan Zheng, Jin Fang et al.
Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye, Zihan Wang, Haosen Sun et al.
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
Keyizhi Xu, Chi Zhang, Zhan Chen et al.
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks
Cheng Lei, Ao Li, Hu Yao et al.
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel C.J. Scholman et al.
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Min Ren et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
Reversing Flow for Image Restoration
Haina Qin, Wenyang Luo, Libin Wang et al.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
Shaofei Huang, Rui Ling, Tianrui Hui et al.