Zhe Chen
61 papers · 2015–2026 · 12 top CS/AI conferences
Achievements
Academic Marathon (10)
Conference Polyglot (12)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (12)
Dynamic Duo (18)
Mega-Team (38)
Deep Specialist (12)
Topic Evolution
Trend Setter
Keyword Collector (276)
Prolific Year (11)
Unstoppable (6)
Conference Pioneer
Century Club (56)
Conferences
AAAI (18)
CVPR (10)
ICLR (6)
EMNLP (5)
NIPS (5)
ACL (4)
IJCAI (4)
ECCV (3)
ICCV (3)
COLING (1)
MICCAI (1)
NAACL (1)
Keywords
large language model (11)
vision-language model (9)
semantic segmentation (6)
multimodal learning (6)
multimodal large language model (4)
object detection (4)
multi-modal learning (3)
multi-agent path finding (3)
visual question answering (3)
self-supervised learning (3)
path planning (3)
contrastive learning (3)
representation learning (2)
transformer architecture (2)
question answering (2)
video understanding (2)
zero-shot learning (2)
motion planning (2)
medical imaging (2)
weakly supervised learning (2)
Papers
Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills
AAAI 2026
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents
AAAI 2026
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI
AAAI 2026
Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs
ACL 2026
MedS³: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision
AAAI 2026
ReactGPT: Understanding of Chemical Reactions via In-Context Tuning
AAAI 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
IJCAI 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
SLARD: A Chinese Superior Legal Article Retrieval Dataset
COLING 2025
LSDC: An Efficient and Effective Large-Scale Data Compression Method for Supervised Fine-tuning of Large Language Models
NAACL 2025
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
EMNLP 2025
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
AAAI 2025
Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP
AAAI 2025
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
AAAI 2025
Concurrent Planning and Execution in Lifelong Multi-Agent Path Finding with Delay Probabilities
AAAI 2025
SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians
ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
EMNLP 2025
Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications
ACL 2025
EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving Knowledge
ACL 2025
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
EMNLP 2024
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
AAAI 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
ICLR 2024
SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection
AAAI 2024
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024
Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding
AAAI 2024
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
ACL 2024
Polyp-Mamba: Polyp Segmentation with Visual Mamba
MICCAI 2024
Needle In A Multimodal Haystack
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
Mixed-domain Language Modeling for Processing Long Legal Documents
EMNLP 2023
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation
NIPS 2023
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
Syllogistic Reasoning for Legal Judgment Analysis
EMNLP 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NIPS 2023
OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision
ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
Vision Transformer Adapter for Dense Predictions
ICLR 2023
Graph Propagation Transformer for Graph Representation Learning
IJCAI 2023
SASA: Semantics-Augmented Set Abstraction for Point-Based 3D Object Detection
AAAI 2022
Contrastive Boundary Learning for Point Cloud Segmentation
CVPR 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
AAAI 2022
MAPF-LNS2: Fast Repairing for Multi-Agent Path Finding via Large Neighborhood Search
AAAI 2022
Recurrent Glimpse-Based Decoder for Detection With Transformer
CVPR 2022
Anytime Multi-Agent Path Finding via Large Neighborhood Search
IJCAI 2021
Symmetry Breaking for k-Robust Multi-Agent Path Finding
AAAI 2021
Invertible Neural BRDF for Object Inverse Rendering
ECCV 2020
TextFuseNet: Scene Text Detection with Richer Fused Features
IJCAI 2020
Context Refinement for Object Detection
ECCV 2018
MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking
CVPR 2015