Papers
8,506 papers found
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li, Yanqing Liu, Haoqin Tu et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
Open-Vocabulary Octree-Graph for 3D Scene Understanding
Zhigang Wang, Yifei Su, Chenhui Li et al.
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng, Zihao Wang, Shaofei Cai et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun Yuan, Yaling Shen et al.
Optical Model-Driven Sharpness Mapping for Autofocus in Small Depth-of-Field and Severe Defocus Scenarios
Chen-Liang Fan, Mingpei Cao, Chih Chien Hung et al.
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao, Wang Lu, Jie Ji et al.
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Caoshuo Li, Zengmao Ding, Xiaobin Hu et al.
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan, Xinchen Yan, Vincent Casser et al.
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jian Liu et al.
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu, Diankun Zhang, Zongchuang Zhao et al.
OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu, Guanyu Chen, Ye Li et al.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps
Chong Cheng, Sicheng Yu, Zijian Wang et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su, Mengying Xie, Nieqing Cao et al.
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang, Zhihao Wang, Xi Zhang et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.
OV-SCAN: Semantically Consistent Alignment for Novel Object Discovery in Open-Vocabulary 3D Object Detection
Adrian Chow, Evelien Riddell, Yimu Wang et al.
PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency
Haotian Wang, Aoran Xiao, Xiaoqin Zhang et al.
PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou, Xiaoyu Zhang, Yongchuan Tang
PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction
Jiahui Ren, Mochu Xiang, Jiajun Zhu et al.