Papers
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
Peng-Hao Hsu, Ke Zhang, Fu-En Wang et al.
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Ziqi Liu et al.
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang et al.
OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes
Fan Pei, Jinchen Bai, Xiang Feng et al.
Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection
Zhaoyang Li, Zhu Teng, Baopeng Zhang et al.
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li, Yanqing Liu, Haoqin Tu et al.
Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration
Ting Lei, Shaofeng Yin, Qingchao Chen et al.
Open-Vocabulary Octree-Graph for 3D Scene Understanding
Zhigang Wang, Yifei Su, Chenhui Li et al.
Open-World Skill Discovery from Unsegmented Demonstration Videos
Jingwen Deng, Zihao Wang, Shaofei Cai et al.
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu, Kun Yuan, Yaling Shen et al.
Optical Model-Driven Sharpness Mapping for Autofocus in Small Depth-of-Field and Severe Defocus Scenarios
Chen-Liang Fan, Mingpei Cao, Chih Chien Hung et al.
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao, Wang Lu, Jie Ji et al.
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Caoshuo Li, Zengmao Ding, Xiaobin Hu et al.
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan, Xinchen Yan, Vincent Casser et al.
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
Jinhong Wang, Shuo Tong, Jian Liu et al.
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu, Diankun Zhang, Zongchuang Zhao et al.
OURO: A Self-Bootstrapped Framework for Enhancing Multimodal Scene Understanding
Tianrun Xu, Guanyu Chen, Ye Li et al.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps
Chong Cheng, Sicheng Yu, Zijian Wang et al.
Outlier-Aware Post-Training Quantization for Image Super-Resolution
Hailing Wang, Jianglin Lu, Yitian Zhang et al.
OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance
Mingquan Zhou, Chen He, Ruiping Wang et al.
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su, Mengying Xie, Nieqing Cao et al.
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang, Zhihao Wang, Xi Zhang et al.
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng, Jiaqi Mao, Minghao Lai et al.