Jianhua Han

37 papers · 2017–2025 · 10 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (12) 🧭 Keyword Pioneer 🏃 Academic Marathon (8) 🌍 Conference Polyglot (10) 🌈 Renaissance Researcher (5)

🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (48) 🧭 Keyword Pioneer 🤝 Dynamic Duo (36) 🏆 Keyword Champion (2) 👥 Mega-Team (30) ⚡ Prolific Year (14) 💎 Century Club (37) 🗃️ Keyword Collector (139)

Conferences

CVPR (9) ECCV (8) NIPS (5) AAAI (4) ICLR (4) ICCV (3) ACL (1) EMNLP (1) IJCAI (1) WACV (1)

Top co-authors

Hang Xu (36) Xiaodan Liang (22) Wei Zhang (14) Zhenguo Li (11) Chunjing Xu (8) Lanqing Hong (8) Runhui Huang (7) Chunwei Wang (7) Kai Chen (6) Dit-Yan Yeung (5)

Keywords

object detection (6) multimodal learning (5) large language model (5) vision-language model (5) zero-shot detection (4) autonomous driving (4) contrastive learning (4) semantic segmentation (3) image generation (3) zero-shot learning (3) zero-shot classification (3) transfer learning (3) cross-modal alignment (2) vision language model (2) multimodal large language model (2) multi-modal learning (2) lane detection (2) video understanding (2) self-supervised learning (2) multi-task learning (2)

Papers

DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception WACV 2025
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models CVPR 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions CVPR 2025
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance ICCV 2025
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model ICLR 2025
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation ACL 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation NIPS 2024
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM NIPS 2024
Implicit Concept Removal of Diffusion Models ECCV 2024
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving ECCV 2024
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance ECCV 2024
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion ECCV 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model ECCV 2024
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction ICLR 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder NIPS 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models CVPR 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection CVPR 2024
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis ICLR 2024
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images AAAI 2024
DetGPT: Detect What You Need via Reasoning EMNLP 2023
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training ICCV 2023
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts ICLR 2023
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving CVPR 2023
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data CVPR 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining CVPR 2023
NLIP: Noise-Robust Language-Image Pre-training AAAI 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment CVPR 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability ICCV 2023
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving ECCV 2022
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing AAAI 2022
ONCE-3DLanes: Building Monocular 3D Lane Detection CVPR 2022
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection NIPS 2022
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding ECCV 2022
Generative Negative Text Replay for Continual Vision-Language Pretraining ECCV 2022
Laneformer: Object-Aware Row-Column Transformers for Lane Detection AAAI 2022
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving NIPS 2022
Aggregating Crowd Wisdoms with Label-aware Autoencoders IJCAI 2017