Wenhai Wang

61 papers · 2018–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (69) 🔬 Deep Specialist (16) 🧬 Topic Evolution 👥 Mega-Team (38) 👑 Triple Crown 🤝 Dynamic Duo (28) 🏆 Grand Slam 💎 Century Club (58) 🔥 Unstoppable (8) 📈 Trend Setter 🗃️ Keyword Collector (209) ⚡ Prolific Year (9)

Conferences

CVPR (13) NIPS (10) ECCV (9) ICCV (6) AAAI (5) ICLR (5) ACL (4) IJCAI (4) ICML (3) EMNLP (1) NAACL (1)

Top co-authors

Jifeng Dai (28) Yu Qiao (24) Tong Lu (20) Xizhou Zhu (19) Zhe Chen (18) Ping Luo (16) Lewei Lu (14) Enze Xie (14) Weiyun Wang (9) Hongsheng Li (7)

Research topics

Optimization & Theory (1)

Keywords

semantic segmentation (10) vision-language model (8) object detection (7) convolutional neural network (5) multimodal large language model (5) multi-modal learning (4) multimodal learning (4) multi-task learning (3) foundation model (3) vision transformer (3) instance segmentation (3) image generation (3) visual question answering (3) large language model (3) diffusion model (2) zero-shot learning (2) image processing (2) image segmentation (2) deformable convolution (2) contrastive learning (2)

Papers

EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models AAAI 2026
Selective Knowledge Distillation: Fusing LLM Semantic Strengths with DNN Efficiency for Binary Code Similarity Detection ACL 2026
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment ACL 2026
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models CVPR 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding CVPR 2025
Diffuse&Refine: Intrinsic Knowledge Generation and Aggregation for Incremental Object Detection IJCAI 2025
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost ICML 2025
CoMemo: LVLMs Need Image Context with Image Memory ICML 2025
UltraModel: A Modeling Paradigm for Industrial Objects IJCAI 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area AAAI 2025
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting AAAI 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference ACL 2025
Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models ACL 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ICLR 2025
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ICLR 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ICCV 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding CVPR 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning NIPS 2024
Needle In A Multimodal Haystack NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks NIPS 2024
AVSegFormer: Audio-Visual Segmentation with Transformer AAAI 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications CVPR 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World ECCV 2024
Distilling Knowledge from Large-Scale Image Models for Object Detection ECCV 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World ICLR 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis ICML 2024
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization NAACL 2024
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought NIPS 2023
FB-BEV: BEV Representation from Forward-Backward View Transformations ICCV 2023
Vision Transformer Adapter for Dense Predictions ICLR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions CVPR 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks NIPS 2023
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection NIPS 2023
CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code EMNLP 2023
Planning-Oriented Autonomous Driving CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks CVPR 2023
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization AAAI 2022
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition ECCV 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers CVPR 2022
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers ECCV 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs NIPS 2022
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers NIPS 2021
DetCo: Unsupervised Contrastive Learning for Object Detection ICCV 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions ICCV 2021
Segmenting Transparent Objects in the Wild with Transformer IJCAI 2021
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection CVPR 2021
Segmenting Transparent Objects in the Wild ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation CVPR 2020
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection NIPS 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation ECCV 2020
Scene Text Image Super-resolution in the Wild ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting ECCV 2020
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network ICCV 2019
Selective Kernel Networks CVPR 2019
Shape Robust Text Detection With Progressive Scale Expansion Network CVPR 2019
Mixed Link Networks IJCAI 2018