Lewei Lu
33 papers · 2020–2025 · 7 top CS/AI conferences
Achievements
Academic Marathon (5)
Interdisciplinary Bridge
Conference Polyglot (7)
Keyword Pioneer
Cross-Pollinator (12)
Renaissance Researcher (6)
Taxonomy Completionist (49)
Mega-Team (38)
Grand Slam
Dynamic Duo (24)
Prolific Year (15)
Century Club (33)
Keyword Collector (135)
Trend Setter
Conferences
CVPR (15)
ICLR (6)
NIPS (5)
ICCV (3)
ECCV (2)
AAAI (1)
ICML (1)
Keywords
vision-language model (5)
multimodal large language model (5)
large language model (3)
autonomous driving (3)
object detection (3)
image generation (3)
semantic segmentation (3)
multimodal document (2)
contrastive learning (2)
video processing (2)
multi-task learning (2)
visual representation (2)
object localization (2)
vision foundation model (2)
convolutional neural network (2)
knowledge distillation (2)
multimodal learning (2)
visual question answering (2)
multi-modal learning (2)
deformable convolution (2)
Papers
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
ICML 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
CVPR 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
Spatial Preference Rewarding for MLLMs Spatial Understanding
ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
Weakly Supervised Monocular 3D Detection with a Single-View Image
CVPR 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
NIPS 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
Needle In A Multimodal Haystack
NIPS 2024
Modeling Continuous Motion for 3D Point Cloud Object Tracking
AAAI 2024
Parameter-Inverted Image Pyramid Networks
NIPS 2024
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
ICLR 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
ICLR 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection
CVPR 2023
Planning-Oriented Autonomous Driving
CVPR 2023
Scene as Occupancy
ICCV 2023
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
CVPR 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
ICCV 2021
Deformable DETR: Deformable Transformers for End-to-End Object Detection
ICLR 2021
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020