Papers
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu, Ziyue Wang, Can Zhang et al.
Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement
Xinjie Li, Ziyi Chen, Xinlu Yu et al.
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Qingqing Zhao, Yao Lu, Moo Jin Kim et al.
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li, Xingxuan Zhang, Hao Zou et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner
Weiyu Li, Jiarui Liu, Hongyu Yan et al.
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
Hanxi Liu, Yifang Men, Zhouhui Lian
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
CroCoDL: Cross-device Collaborative Dataset for Localization
Hermann Blum, Alessandro Mercurio, Joshua O'Reilly et al.
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee, Jijun Jiang, Yiran Xu et al.
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
Ziyang Zhou, Pinghui Wang, Zi Liang et al.
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen, Yang Liu, Binglin Chen et al.
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
Saad Lahlali, Sandra Kara, Hejer Ammar et al.
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei, Chenyu Lin, Yu Qiu et al.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.
Cross-Rejective Open-Set SAR Image Registration
Shasha Mao, Shiming Lu, Zhaolong Du et al.
CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
Thomas Walker, Salvatore Esposito, Daniel Rebain et al.
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.
CryptoFace: End-to-End Encrypted Face Recognition
Wei Ao, Vishnu Naresh Boddeti