Wenqi Shao
55 papers · 2019–2026 · 9 top CS/AI conferences
Achievements
Taxonomy Completionist (17)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (9)
Academic Marathon (6)
Grand Slam
Triple Crown
Dynamic Duo (41)
Mega-Team (22)
Deep Specialist (13)
Topic Evolution
The Questioner
Prolific Year (16)
Keyword Collector (212)
Century Club (53)
Trend Setter
Unstoppable (7)
Conferences
ICLR (11)
ICCV (10)
CVPR (9)
ICML (8)
NIPS (6)
ACL (5)
AAAI (3)
ECCV (2)
IJCAI (1)
Keywords
large language model (8)
convolutional neural network (5)
benchmark evaluation (5)
vision-language model (5)
visual question answering (4)
model compression (4)
multimodal large language model (3)
multimodal learning (3)
agent system (2)
diffusion model (2)
image reconstruction (2)
multi-modal learning (2)
preference alignment (2)
multimodal reasoning (2)
domain adaptation (2)
batch normalization (2)
knowledge distillation (2)
object detection (2)
multitask learning (2)
neural network (2)
Papers
D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies
AAAI 2026
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
AAAI 2026
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
ICCV 2025
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation
ICCV 2025
Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space
ICCV 2025
Cross-Subject Mind Decoding from Inaccurate Representations
ICCV 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
ACL 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
ACL 2025
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
ACL 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
CVPR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
Distilling Monocular Foundation Model for Fine-grained Depth Completion
CVPR 2025
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
CVPR 2025
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
CVPR 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
IJCAI 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
ICCV 2025
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
Needle In A Multimodal Haystack
NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
NIPS 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
Beyond One-to-One: Rethinking the Referring Image Segmentation
ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
Real-Time Controllable Denoising for Image and Video
CVPR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
ICLR 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
NIPS 2023
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
Dynamic Token Normalization Improves Vision Transformers
ICLR 2022
What Makes for End-to-End Object Detection?
ICML 2021
Rethinking the Pruning Criteria for Convolutional Neural Network
NIPS 2021
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
ICML 2021
Channel Equilibrium Networks for Learning Deep Representation
ICML 2020
Towards Understanding Regularization in Batch Normalization
ICLR 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks
ICCV 2019
SSN: Learning Sparse Switchable Normalization via SparsestMax
CVPR 2019