Papers
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du, Jiayang Zhang, Guoshun Nan et al.
From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning
Sen Wang, Shao Zeng, Tianjun Gu et al.
From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han et al.
From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning
Yexin Huang, Yongbin Lin, Lishengsa Yue et al.
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao, Bin Zhu, Jingjing Chen et al.
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa F. Polanía, Yi Yang et al.
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco, Rahul Ramesh, Randall Balestriero et al.
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning
Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.
From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong, Lihe Ding, Xiao Chen et al.
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva, Andrew Zisserman
From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition
Ling Lo, Kelvin C.K. Chan, Wen-Huang Cheng et al.
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo, Liangbing Zhao, Sayak Paul et al.
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.
From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras
Youngho Kim, Hoonhee Cho, Kuk-Jin Yoon
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.
FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention
Xuan Ju, Weicai Ye, Quande Liu et al.
Function-centric Bayesian Network for Zero-Shot Object Goal Navigation
Sixian Zhang, Xinyao Yu, Xinhang Song et al.
Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li, Qiang Wang, Ke Yan et al.
Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues
Chen Chen, Kangcheng Bin, Ting Hu et al.
FusionPhys: A Flexible Framework for Fusing Complementary Sensing Modalities in Remote Physiological Measurement
Chenhang Ying, Huiyu Yang, Jieyi Ge et al.
Future-Aware Interaction Network For Motion Forecasting
Shijie Li, Chunyu Liu, Xun Xu et al.
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
Qiusheng Huang, Xiaohui Zhong, Xu Fan et al.
Fuzzy Contrastive Decoding to Alleviate Object Hallucination in Large Vision-Language Models
Jieun Kim, Jinmyeong Kim, Yoonji Kim et al.