Jifeng Dai
89 papers · 2013–2026 · 7 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (11)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (6)
Conference Loyalist (34)
Dynamic Duo (46)
Grand Slam
Deep Specialist (16)
Topic Evolution
Mega-Team (38)
Triple Crown
Prolific Year (14)
Trend Setter
Conference Pioneer
Unstoppable (13)
Century Club (88)
Keyword Collector (299)
Conferences
CVPR (34)
NIPS (15)
ICCV (14)
ICLR (12)
ECCV (10)
ICML (3)
AAAI (1)
Keywords
object detection (14)
semantic segmentation (10)
vision-language model (9)
convolutional neural network (9)
multimodal large language model (6)
multimodal learning (5)
foundation model (5)
multi-task learning (5)
multi-modal learning (5)
large language model (5)
self-supervised learning (4)
visual representation (4)
instance segmentation (4)
deformable convolution (4)
representation learning (3)
optical flow (3)
weakly supervised learning (3)
transfer learning (3)
image generation (3)
visual question answering (3)
Papers
Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy
AAAI 2026
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
CVPR 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
ICCV 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
ICCV 2025
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
CVPR 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
ICML 2025
CoMemo: LVLMs Need Image Context with Image Memory
ICML 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
ICLR 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
NIPS 2024
DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model
NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
CVPR 2024
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics
NIPS 2024
Parameter-Inverted Image Pyramid Networks
NIPS 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
ICLR 2024
Distilling Knowledge from Large-Scale Image Models for Object Detection
ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ECCV 2024
Needle In A Multimodal Haystack
NIPS 2024
Vision Transformer Adapter for Dense Predictions
ICLR 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NIPS 2023
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
CVPR 2023
Siamese Image Modeling for Self-Supervised Vision Representation Learning
CVPR 2023
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical Prior
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
Planning-Oriented Autonomous Driving
CVPR 2023
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
CVPR 2023
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
ICCV 2023
MCMAE: Masked Convolution Meets Masked Autoencoders
NIPS 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022
FlowFormer: A Transformer Architecture for Optical Flow
ECCV 2022
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
ECCV 2022
Frozen CLIP Models Are Efficient Video Learners
ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework
CVPR 2022
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
CVPR 2022
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
CVPR 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
NIPS 2022
Fast Convergence of DETR With Spatially Modulated Co-Attention
ICCV 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection
ICLR 2021
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation
ICLR 2021
Unsupervised Object Detection With LIDAR Clues
CVPR 2021
Influence Selection for Active Learning
ICCV 2021
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
ICCV 2021
Exploring Cross-Image Pixel Contrast for Semantic Segmentation
ICCV 2021
Searching Parameterized AP Loss for Object Detection
NIPS 2021
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
ECCV 2020
Hierarchical Human Parsing With Typed Part-Relation Reasoning
CVPR 2020
Resolution Adaptive Networks for Efficient Inference
CVPR 2020
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation
ICLR 2020
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
ICCV 2019
Deformable ConvNets V2: More Deformable, Better Results
CVPR 2019
Towards High Performance Video Object Detection
CVPR 2018
Relation Networks for Object Detection
CVPR 2018
Learning Region Features for Object Detection
ECCV 2018
Deep Feature Flow for Video Recognition
CVPR 2017
Deformable Convolutional Networks
ICCV 2017
Flow-Guided Feature Aggregation for Video Object Detection
ICCV 2017
Fully Convolutional Instance-Aware Semantic Segmentation
CVPR 2017
R-FCN: Object Detection via Region-based Fully Convolutional Networks
NIPS 2016
Instance-Aware Semantic Segmentation via Multi-Task Network Cascades
CVPR 2016
ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation
CVPR 2016
Convolutional Feature Masking for Joint Object and Stuff Segmentation
CVPR 2015
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
ICCV 2015
Unsupervised Learning of Dictionaries of Hierarchical Compositional Models
CVPR 2014
Cosegmentation and Cosketch by Unsupervised Learning
ICCV 2013