Kaipeng Zhang
40 papers · 2017–2026 · 11 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (11)
Taxonomy Completionist (12)
Interdisciplinary Bridge
Academic Marathon (8)
Cross-Pollinator (12)
Grand Slam
Triple Crown
Dynamic Duo (24)
Mega-Team (22)
Deep Specialist (13)
Trend Setter
Conference Pioneer
Prolific Year (17)
Century Club (38)
Keyword Collector (145)
Conferences
ICCV (7)
NIPS (7)
ICLR (6)
ICML (5)
ACL (4)
AAAI (3)
CVPR (3)
IJCAI (2)
ECCV (1)
EMNLP (1)
NAACL (1)
Keywords
large language model (7)
vision-language model (6)
multimodal large language model (4)
benchmark evaluation (3)
multimodal learning (3)
semantic segmentation (2)
image classification (2)
multimodal reasoning (2)
image generation (2)
instruction tuning (2)
model compression (2)
reasoning evaluation (2)
text-to-image generation (2)
agent system (2)
multitask learning (2)
evaluation benchmark (2)
visual question answering (2)
attention mechanism (2)
cognitive modeling (1)
transfer learning (1)
Papers
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
AAAI 2026
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
ACL 2026
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
ACL 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
EMNLP 2025
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
ICCV 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation
ICCV 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
ICCV 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025
ZipAR: Parallel Autoregressive Image Generation through Spatial Locality
ICML 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
IJCAI 2025
Needle In A Multimodal Haystack
NIPS 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
T3M: Text Guided 3D Human Motion Synthesis from Speech
NAACL 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NIPS 2024
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP without Training
AAAI 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
AAAI 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
ICLR 2024
Foundation Model is Efficient Multimodal Multitask Model Selector
NIPS 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
RaMLP: Vision MLP via Region-aware Mixing
IJCAI 2023
Neural Routing by Memory
NIPS 2021
Super-Identity Convolutional Neural Network for Face Hallucination
ECCV 2018
Detecting Faces Using Inside Cascaded Contextual CNN
ICCV 2017