Wenqi Shao

55 papers · 2019–2026 · 9 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (17) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (9)

🏃 Academic Marathon (6) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (41) 👥 Mega-Team (22) 🔬 Deep Specialist (13) 🧬 Topic Evolution ❓ The Questioner ⚡ Prolific Year (16) 🗃️ Keyword Collector (212) 💎 Century Club (53) 📈 Trend Setter 🔥 Unstoppable (7)

Conferences

ICLR (11) ICCV (10) CVPR (9) ICML (8) NIPS (6) ACL (5) AAAI (3) ECCV (2) IJCAI (1)

Top co-authors

Ping Luo (41) Kaipeng Zhang (25) Yu Qiao (21) Peng Gao (8) Xiaogang Wang (8) Zhaoyang Zhang (7) Yao Mu (7) Yue Yang (7) Fanqing Meng (6) Quanfeng Lu (6)

Keywords

large language model (8) convolutional neural network (5) benchmark evaluation (5) vision-language model (5) visual question answering (4) model compression (4) multimodal large language model (3) multimodal learning (3) agent system (2) diffusion model (2) image reconstruction (2) multi-modal learning (2) preference alignment (2) multimodal reasoning (2) domain adaptation (2) batch normalization (2) knowledge distillation (2) object detection (2) multitask learning (2) neural network (2)

Papers

D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies AAAI 2026
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models AAAI 2026
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices ICCV 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation ICCV 2025
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity ICCV 2025
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation ICCV 2025
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space ICCV 2025
Cross-Subject Mind Decoding from Inaccurate Representations ICCV 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ICML 2025
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation ICLR 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ACL 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping ICLR 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ACL 2025
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model ACL 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models CVPR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation CVPR 2025
Distilling Monocular Foundation Model for Fine-grained Depth Completion CVPR 2025
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation CVPR 2025
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data CVPR 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts IJCAI 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ICLR 2025
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM ICCV 2025
Position: Towards Implicit Prompt For Text-To-Image Models ICML 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models ICML 2024
Needle In A Multimodal Haystack NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models NIPS 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning ACL 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model CVPR 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM CVPR 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models ECCV 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models ICLR 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ICML 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis ICML 2024
Beyond One-to-One: Rethinking the Referring Image Segmentation ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers ICCV 2023
Real-Time Controllable Denoising for Image and Video CVPR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving ICLR 2023
Foundation Model is Efficient Multimodal Multitask Model Selector NIPS 2023
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space ECCV 2022
Dynamic Token Normalization improves Vision Transformers ICLR 2022
What Makes for End-to-End Object Detection? ICML 2021
Rethinking the Pruning Criteria for Convolutional Neural Network NIPS 2021
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution ICML 2021
Channel Equilibrium Networks for Learning Deep Representation ICML 2020
Towards Understanding Regularization in Batch Normalization ICLR 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks ICCV 2019
SSN: Learning Sparse Switchable Normalization via SparsestMax CVPR 2019