Zhen Li

108 papers · 2011–2026 · 19 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (19)

🏃 Academic Marathon (15) 🏠 Conference Loyalist (26) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (31) 👥 Mega-Team (20) 🔬 Deep Specialist (23) 🧬 Topic Evolution 🏆 Keyword Champion (2) 🚀 Conference Pioneer ⚡ Prolific Year (17) 💎 Century Club (104) 📈 Trend Setter 🔥 Unstoppable (12) 🗃️ Keyword Collector (456)

Conferences

CVPR (26) AAAI (18) ICCV (13) NIPS (10) ACL (6) IJCAI (6) ECCV (5) EMNLP (5) MICCAI (4) COLING (3) NAACL (2) ICML (2) ICLR (2) JMLR (1) MIDL (1) ACML (1) NSDI (1) SEMEVAL (1) WACV (1)

Top co-authors

Shuguang Cui (32) Xu Yan (16) Ruimao Zhang (13) Chaoda Zheng (10) Chuanhao Li (8) Guanbin Li (7) Ming-Ming Cheng (7) Yuwei Wu (7) Sheng Wang (7) Jiantao Gao (7)

Research topics

Education (1)

Keywords

point cloud (12) multimodal learning (9) semantic segmentation (9) contrastive learning (8) knowledge distillation (7) diffusion model (6) self-supervised learning (6) 3d object detection (6) neural network (5) 3d vision (5) autonomous driving (5) compositional generalization (5) attention mechanism (5) large language model (5) multi-modal learning (5) visual question answering (5) vision-language model (4) text generation (4) scene understanding (4) metric learning (3)

Papers

Composition-Incremental Learning for Compositional Generalization · AAAI 2026
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models · WACV 2026
Cancer Survival Prediction by Cyclic Generation and Multi-grained Alignment · AAAI 2026
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models · AAAI 2026
DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving · AAAI 2026
A General Framework to Enhance Fine-tuning-based LLM Unlearning · ACL 2025
Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models · ACL 2025
Multi-Sourced Compositional Generalization in Visual Question Answering · IJCAI 2025
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding · ICLR 2025
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction · ICCV 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework · ICCV 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning · ICCV 2025
DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models · EMNLP 2025
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving · CVPR 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering · CVPR 2025
Cervical-RG: Automated Cervical Cancer Report Generation from 3D Multi-sequence MRI via CoT-guided Hierarchical Experts · MICCAI 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering · AAAI 2025
Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning · AAAI 2025
Consistency of Compositional Generalization Across Multiple Levels · AAAI 2025
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation · CVPR 2025
Empowering Large Language Models with 3D Situation Awareness · CVPR 2025
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs · CVPR 2025
Sign2Vis: Automated Data Visualization from Sign Language · ACL 2025
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges · EMNLP 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge · NIPS 2024
Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection · NIPS 2024
CrossBind: Collaborative Cross-Modal Identification of Protein Nucleic-Acid-Binding Residues · AAAI 2024
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer · AAAI 2024
WeakPCSOD: Overcoming the Bias of Box Annotations for Weakly Supervised Point Cloud Salient Object Detection · AAAI 2024
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation · AAAI 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding · CVPR 2024
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding · CVPR 2024
MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection · ECCV 2024
Compositional Substitutivity of Visual Reasoning for Visual Question Answering · ECCV 2024
In-Context Compositional Generalization for Large Vision-Language Models · EMNLP 2024
DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation · ICLR 2024
Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding · ICML 2024
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy · MICCAI 2024
Multilevel Causality Learning for Multi-label Gastric Atrophy Diagnosis · MICCAI 2024
Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development · MICCAI 2024
RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels · ICCV 2023
MMTN: Multi-Modal Memory Transformer Network for Image-Report Consistent Medical Report Generation · AAAI 2023
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation · CVPR 2023
Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes · CVPR 2023
DNF: Decouple and Feedback Network for Seeing in the Dark · CVPR 2023
Learning Transformation-Predictive Representations for Detection and Description of Local Features · CVPR 2023
Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency · MIDL 2023
PersLEARN: Research Training through the Lens of Perspective Cultivation · ACL 2023
FAA: Fine-grained Attention Alignment for Cascade Document Ranking · ACL 2023
CowClip: Reducing CTR Prediction Model Training Time from 12 Hours to 10 Minutes on 1 GPU · AAAI 2023
Fair-CDA: Continuous and Directional Augmentation for Group Fairness · AAAI 2023
Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation · NIPS 2023
Composable Text Controls in Latent Space with ODEs · EMNLP 2023
Geometry-Aware Network for Domain Adaptive Semantic Segmentation · AAAI 2023
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness · NIPS 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training · ICCV 2023
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection · ICCV 2023
LATR: 3D Lane Detection from Monocular Images with Transformer · ICCV 2023
SRFormer: Permuted Self-Attention for Single Image Super-Resolution · ICCV 2023
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains · CVPR 2023
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language · CVPR 2023
BEV@DC: Bird's-Eye View Assisted Training for Depth Completion · CVPR 2023
Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation · AAAI 2022
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds · CVPR 2022
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation · NIPS 2022
An Error Analysis of Generative Adversarial Networks for Learning Distributions · JMLR 2022
Reciprocal Learning of Knowledge Retriever and Response Ranker for Knowledge-Grounded Conversations · COLING 2022
Contextual Bandits with Knapsacks for a Conversion Model · NIPS 2022
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds · ECCV 2022
Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency · ECCV 2022
CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection · NAACL 2022
Towards an End-to-End Framework for Flow-Guided Video Inpainting · CVPR 2022
FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation · EMNLP 2022
X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning · CVPR 2022
Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation · NAACL 2022
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning · NIPS 2022
Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis · NIPS 2022
Graph Enhanced Contrastive Learning for Radiology Findings Summarization · ACL 2022
PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images · CVPR 2022
Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion · AAAI 2021
Shallow Feature Matters for Weakly Supervised Object Localization · CVPR 2021
Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds · ICCV 2021
Free-Form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud · ICCV 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds Through Instance Multi-Level Contextual Referring · ICCV 2021
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution · CVPR 2021
PSSM-Distil: Protein Secondary Structure Prediction (PSSP) on Low-Quality PSSM by Knowledge Distillation with Contrastive Learning · AAAI 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision · ICML 2021
Local Representation is Not Enough: Soft Point-Wise Transformer for Descriptor and Detector of Local Features · IJCAI 2021
PointLIE: Locally Invertible Embedding for Point Cloud Sampling and Recovery · IJCAI 2021
Adaptive Residue-wise Profile Fusion for Low Homologous Protein Secondary Structure Prediction Using External Knowledge · IJCAI 2021
PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling · CVPR 2020
Exemplar Normalization for Learning Deep Representation · CVPR 2020
CN-HIT-MI.T at SemEval-2020 Task 8: Memotion Analysis Based on BERT · SEMEVAL 2020
Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism · COLING 2020
CN-HIT-MI.T at SemEval-2020 Task 8: Memotion Analysis Based on BERT · COLING 2020
Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation · ECCV 2020
BARNet: Bilinear Attention Network with Adaptive Receptive Fields for Surgical Instrument Segmentation · IJCAI 2020
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels · ICCV 2019
Feedback Network for Image Super-Resolution · CVPR 2019
Deep Neural Nets with Interpolating Function as Output Activation · NIPS 2018
Tux²: Distributed Graph Computation for Machine Learning · NSDI 2017
Learning Deep Semantic Embeddings for Cross-Modal Retrieval · ACML 2017
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference · ICCV 2017
Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks · IJCAI 2016
Blockout: Dynamic Model Selection for Hierarchical Deep Networks · CVPR 2016
Learning Semantic Relationships for Better Action Retrieval in Images · CVPR 2015
Learning Locally-Adaptive Decision Functions for Person Verification · CVPR 2013
Learning to Search Efficiently in High Dimensions · NIPS 2011