Papers
18,421 papers found
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed, Vishnu Boddeti
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia, Josechu Guerrero, Ruben Martinez-Cantin
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei, Shaojie Lyu, Gong Chen et al.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Yuxin Peng
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
Zhilv Yi, Xiao Lu, Hong Ding et al.
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
Xinyi Zhang, Naiqi Li, Angela Dai
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang, Binxiao Huang, Yulun Zhang et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Docopilot: Improving Multimodal Models for Document-Level Understanding
Yuchen Duan, Zhe Chen, Yusong Hu et al.
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen, Dannong Xu, Junjie Fei et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization
Shunxin Wang, Raymond Veldhuis, Nicola Strisciuglio
Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data
Wenxin Su, Song Tang, Xiaofeng Liu et al.
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen, Zelin Peng, Yu Huang et al.