Wenwei Zhang

44 papers · 2019–2025 · 11 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (11) 🗺️ Taxonomy Completionist (10) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (28) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (24) 🧬 Topic Evolution 💎 Century Club (44) ⚡ Prolific Year (7) ❓ The Questioner (5) 🔥 Unstoppable (7) 🗃️ Keyword Collector (178)

Conferences

CVPR (9) ACL (7) NIPS (7) ICCV (6) ECCV (4) ICLR (4) EMNLP (2) ICML (2) AAAI (1) NAACL (1) WACV (1)

Top co-authors

Kai Chen (28) Dahua Lin (17) Chen Change Loy (14) Jiangmiao Pang (12) Chengqi Lyu (8) Ziwei Liu (6) Size Wu (6) Songyang Zhang (6) Wentao Liu (5) Tai Wang (5)

Research topics

Mathematics (1)

Keywords

large language model (13) instance segmentation (5) vision-language model (5) benchmark evaluation (4) semantic segmentation (4) object detection (4) multimodal learning (4) instruction following (3) reinforcement learning (3) image segmentation (3) 3d scene understanding (3) evaluation benchmark (3) vision language model (3) reward model (3) point cloud (3) kernel learning (2) video segmentation (2) instruction tuning (2) multi-modal learning (2) visual grounding (2)

Papers

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
Are Your LLMs Capable of Stable Reasoning? · ACL 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs · ICLR 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation · ICCV 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives · ICCV 2025
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities · ICCV 2025
Training Language Models to Critique With Multi-agent Feedback · EMNLP 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward · EMNLP 2025
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding · WACV 2025
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher · ICLR 2025
F-LMM: Grounding Frozen Large Multimodal Models · CVPR 2025
OMG-Seg: Is One Model Good Enough For All Segmentation? · CVPR 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI · CVPR 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities · ECCV 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models · NIPS 2024
CriticEval: Evaluating Large-scale Language Model as Critic · NIPS 2024
CLIM: Contrastive Language-Image Mosaic for Region Representation · AAAI 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models · ACL 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step · ACL 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark · ACL 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models · ACL 2024
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation · ACL 2024
4D Contrastive Superflows are Dense 3D Representation Learners · ECCV 2024
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction · ICLR 2024
Unified Human-Scene Interaction via Prompted Chain-of-Contacts · ICLR 2024
Can AI Assistants Know What They Don’t Know? · ICML 2024
Fake Alignment: Are LLMs Really Aligned Well? · NAACL 2024
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training · CVPR 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions · ICCV 2023
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation · ICCV 2023
Aligning Bag of Regions for Open-Vocabulary Object Detection · CVPR 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models · NIPS 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation · NIPS 2023
Dense Distinct Query for End-to-End Object Detection · CVPR 2023
Dense Siamese Network for Dense Unsupervised Learning · ECCV 2022
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation · CVPR 2022
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
K-Net: Towards Unified Image Segmentation · NIPS 2021
Side-Aware Boundary Localization for More Precise Object Detection · ECCV 2020
EcoNAS: Finding Proxies for Economical Neural Architecture Search · CVPR 2020
More Information Supervised Probabilistic Deep Face Embedding Learning · ICML 2020
Robust Multi-Modality Multi-Object Tracking · ICCV 2019