Zhe Chen

61 papers · 2015–2026 · 12 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (12) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12) 🤝 Dynamic Duo (18) 👥 Mega-Team (38) 🔬 Deep Specialist (12) 🧬 Topic Evolution 📈 Trend Setter 🗃️ Keyword Collector (276) ⚡ Prolific Year (11) 🔥 Unstoppable (6) 🚀 Conference Pioneer 💎 Century Club (56)

Conferences

AAAI (18) CVPR (10) ICLR (6) EMNLP (5) NIPS (5) ACL (4) IJCAI (4) ECCV (3) ICCV (3) COLING (1) MICCAI (1) NAACL (1)

Top co-authors

Wenhai Wang (18) Jifeng Dai (15) Yu Qiao (13) Tong Lu (13) Xizhou Zhu (11) Lewei Lu (10) Yu Wang (8) Dacheng Tao (8) Yanfeng Wang (8) Weiyun Wang (7)

Keywords

large language model (11) vision-language model (9) semantic segmentation (6) multimodal learning (6) multimodal large language model (4) object detection (4) multi-modal learning (3) multi-agent path finding (3) visual question answering (3) self-supervised learning (3) path planning (3) contrastive learning (3) representation learning (2) transformer architecture (2) question answering (2) video understanding (2) zero-shot learning (2) motion planning (2) medical imaging (2) weakly supervised learning (2)

Papers

Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills AAAI 2026
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents AAAI 2026
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI AAAI 2026
Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs ACL 2026
MedS³: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision AAAI 2026
ReactGPT: Understanding of Chemical Reactions via In-Context Tuning AAAI 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding CVPR 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models CVPR 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding CVPR 2025
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations IJCAI 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ICLR 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area AAAI 2025
SLARD: A Chinese Superior Legal Article Retrieval Dataset COLING 2025
LSDC: An Efficient and Effective Large-Scale Data Compression Method for Supervised Fine-tuning of Large Language Models NAACL 2025
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction EMNLP 2025
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis AAAI 2025
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP AAAI 2025
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding AAAI 2025
Concurrent Planning and Execution in Lifelong Multi-Agent Path Finding with Delay Probabilities AAAI 2025
SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ICLR 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models EMNLP 2025
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications ACL 2025
EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving Knowledge ACL 2025
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation EMNLP 2024
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception AAAI 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World ICLR 2024
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation ICLR 2024
SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection AAAI 2024
AVSegFormer: Audio-Visual Segmentation with Transformer AAAI 2024
Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding AAAI 2024
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset ACL 2024
Polyp-Mamba: Polyp Segmentation with Visual Mamba MICCAI 2024
Needle In A Multimodal Haystack NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks NIPS 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks CVPR 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World ECCV 2024
Mixed-domain Language Modeling for Processing Long Legal Documents EMNLP 2023
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation NIPS 2023
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose CVPR 2023
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions CVPR 2023
Syllogistic Reasoning for Legal Judgment Analysis EMNLP 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks NIPS 2023
OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction ICCV 2023
Vision Transformer Adapter for Dense Predictions ICLR 2023
Graph Propagation Transformer for Graph Representation Learning IJCAI 2023
SASA: Semantics-Augmented Set Abstraction for Point-Based 3D Object Detection AAAI 2022
Contrastive Boundary Learning for Point Cloud Segmentation CVPR 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization AAAI 2022
MAPF-LNS2: Fast Repairing for Multi-Agent Path Finding via Large Neighborhood Search AAAI 2022
Recurrent Glimpse-Based Decoder for Detection With Transformer CVPR 2022
Anytime Multi-Agent Path Finding via Large Neighborhood Search IJCAI 2021
Symmetry Breaking for k-Robust Multi-Agent Path Finding AAAI 2021
Invertible Neural BRDF for Object Inverse Rendering ECCV 2020
TextFuseNet: Scene Text Detection with Richer Fused Features IJCAI 2020
Context Refinement for Object Detection ECCV 2018
MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking CVPR 2015