Wenhai Wang
61 papers · 2018–2026 · 11 top CS/AI conferences
Achievements
Conference Polyglot (11)
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (12)
Renaissance Researcher (7)
Taxonomy Completionist (69)
Deep Specialist (16)
Topic Evolution
Mega-Team (38)
Triple Crown
Dynamic Duo (28)
Grand Slam
Century Club (58)
Unstoppable (8)
Trend Setter
Keyword Collector (209)
Prolific Year (9)
Conferences
CVPR (13)
NIPS (10)
ECCV (9)
ICCV (6)
AAAI (5)
ICLR (5)
ACL (4)
IJCAI (4)
ICML (3)
EMNLP (1)
NAACL (1)
Keywords
semantic segmentation (10)
vision-language model (8)
object detection (7)
convolutional neural network (5)
multimodal large language model (5)
multi-modal learning (4)
multimodal learning (4)
multi-task learning (3)
foundation model (3)
vision transformer (3)
instance segmentation (3)
image generation (3)
visual question answering (3)
large language model (3)
diffusion model (2)
zero-shot learning (2)
image processing (2)
image segmentation (2)
deformable convolution (2)
contrastive learning (2)
Papers
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models
AAAI 2026
Selective Knowledge Distillation: Fusing LLM Semantic Strengths with DNN Efficiency for Binary Code Similarity Detection
ACL 2026
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
ACL 2026
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
Diffuse&Refine: Intrinsic Knowledge Generation and Aggregation for Incremental Object Detection
IJCAI 2025
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
ICML 2025
CoMemo: LVLMs Need Image Context with Image Memory
ICML 2025
UltraModel: A Modeling Paradigm for Industrial Objects
IJCAI 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
AAAI 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
Sticking to the Mean: Detecting Sticky Tokens in Text Embedding Models
ACL 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NIPS 2024
Needle In A Multimodal Haystack
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
Distilling Knowledge from Large-Scale Image Models for Object Detection
ECCV 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization
NAACL 2024
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NIPS 2023
FB-BEV: BEV Representation from Forward-Backward View Transformations
ICCV 2023
Vision Transformer Adapter for Dense Predictions
ICLR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NIPS 2023
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
NIPS 2023
CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code
EMNLP 2023
Planning-Oriented Autonomous Driving
CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
AAAI 2022
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
ECCV 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
NIPS 2022
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
NIPS 2021
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021
Segmenting Transparent Objects in the Wild with Transformer
IJCAI 2021
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
CVPR 2021
Segmenting Transparent Objects in the Wild
ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation
CVPR 2020
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
NIPS 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
Scene Text Image Super-resolution in the wild
ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network
ICCV 2019
Selective Kernel Networks
CVPR 2019
Shape Robust Text Detection With Progressive Scale Expansion Network
CVPR 2019
Mixed Link Networks
IJCAI 2018