Kaipeng Zhang

40 papers · 2017–2026 · 11 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (11) 🗺️ Taxonomy Completionist (12) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (8) 🐝 Cross-Pollinator (12) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (24) 👥 Mega-Team (22) 🔬 Deep Specialist (13) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (17) 💎 Century Club (38) 🗃️ Keyword Collector (145)

Conferences

ICCV (7) NIPS (7) ICLR (6) ICML (5) ACL (4) AAAI (3) CVPR (3) IJCAI (2) ECCV (1) EMNLP (1) NAACL (1)

Top co-authors

Wenqi Shao (25) Yu Qiao (20) Ping Luo (19) Peng Gao (8) Yue Yang (8) Chuanhao Li (7) Yuqi Lin (6) Fanqing Meng (6) Mengzhao Chen (5) Shuo Liu (5)

Keywords

large language model (7) vision-language model (6) multimodal large language model (4) benchmark evaluation (3) multimodal learning (3) semantic segmentation (2) image classification (2) multimodal reasoning (2) image generation (2) instruction tuning (2) model compression (2) reasoning evaluation (2) text-to-image generation (2) agent system (2) multitask learning (2) evaluation benchmark (2) visual question answering (2) attention mechanism (2) cognitive modeling (1) transfer learning (1)

Papers

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models AAAI 2026
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences ACL 2026
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation ICCV 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ACL 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation CVPR 2025
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles EMNLP 2025
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices ICCV 2025
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity ICCV 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation ICCV 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges ICCV 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ICLR 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping ICLR 2025
ZipAR: Parallel Autoregressive Image Generation through Spatial Locality ICML 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ICML 2025
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts IJCAI 2025
Needle In A Multimodal Haystack NIPS 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model CVPR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models ICML 2024
OneLLM: One Framework to Align All Modalities with Language CVPR 2024
Position: Towards Implicit Prompt For Text-To-Image Models ICML 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ICML 2024
T3M: Text Guided 3D Human Motion Synthesis from Speech NAACL 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models ICLR 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT NIPS 2024
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training AAAI 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning ACL 2024
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ICLR 2024
Foundation Model is Efficient Multimodal Multitask Model Selector NIPS 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers ICCV 2023
RaMLP: Vision MLP via Region-aware Mixing IJCAI 2023
Neural Routing by Memory NIPS 2021
Super-Identity Convolutional Neural Network for Face Hallucination ECCV 2018
Detecting Faces Using Inside Cascaded Contextual CNN ICCV 2017