Hang Xu

137 papers · 2018–2026 · 13 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌍 Conference Polyglot (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12)

🧭 Keyword Pioneer 🌈 Renaissance Researcher (9) 🌍 Conference Polyglot (13) 🏠 Conference Loyalist (20) 🤝 Dynamic Duo (61) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (30) 🔬 Deep Specialist (27) 🧬 Topic Evolution 🏆 Keyword Champion (3) 📈 Trend Setter 🗃️ Keyword Collector (444) ⚡ Prolific Year (27) 🔥 Unstoppable (8) 💎 Century Club (134) ❓ The Questioner (2)

Conferences

CVPR (29) ICCV (24) ECCV (20) AAAI (19) NIPS (16) ICLR (14) IJCAI (4) EMNLP (3) ACL (2) ICML (2) WACV (2) NAACL (1) RSS (1)

Top co-authors

Xiaodan Liang (61) Zhenguo Li (45) Jianhua Han (36) Wei Zhang (33) Lanqing Hong (15) Chunjing Xu (14) Guansong Lu (12) Lewei Yao (11) Runhui Huang (10) Songcen Xu (10)

Keywords

object detection (23) contrastive learning (13) neural architecture search (11) diffusion model (10) semantic segmentation (9) transfer learning (8) vision-language model (7) multimodal learning (7) autonomous driving (6) image generation (6) self-supervised learning (6) point cloud (6) domain adaptation (5) zero-shot classification (5) multi-task learning (5) 3d object detection (5) cross-modal alignment (5) large language model (5) cross-modal learning (4) knowledge distillation (4)

Papers

2D-CrossScan Mamba: Enhancing State Space Models with Spatially Consistent Multi-Path 2D Information Propagation · AAAI 2026
Deep (Predictive) Discounted Counterfactual Regret Minimization · AAAI 2026
MoEG-HOI: Mixture of Expert Groups for One-Stage Hand-Object Interaction Motion Generation with Hand-Finger-Joint Semantic Guidance · AAAI 2026
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise · ICLR 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting · ICLR 2025
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors · ICCV 2025
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning · ICCV 2025
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment · ICCV 2025
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance · ICCV 2025
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model · ICLR 2025
Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data · NAACL 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions · CVPR 2025
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models · CVPR 2025
ACE: Anti-Editing Concept Erasure in Text-to-Image Models · CVPR 2025
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution · CVPR 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation · CVPR 2025
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception · WACV 2025
Online Competitive Information Gathering for Partially Observable Trajectory Games · RSS 2025
LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement · AAAI 2024
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images · AAAI 2024
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion · ECCV 2024
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation · ACL 2024
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance · ECCV 2024
JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation · ECCV 2024
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving · ECCV 2024
Implicit Concept Removal of Diffusion Models · ECCV 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing · ECCV 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation · ECCV 2024
Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent · IJCAI 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation · NIPS 2024
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior · CVPR 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models · CVPR 2024
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models · CVPR 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection · CVPR 2024
Rethinking Boundary Discontinuity Problem for Oriented Object Detection · CVPR 2024
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution · CVPR 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation · NIPS 2024
TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes · NIPS 2024
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM · NIPS 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder · NIPS 2024
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis · ICLR 2024
Dynamic Discounted Counterfactual Regret Minimization · ICLR 2024
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction · ICLR 2024
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields · ICLR 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model · ECCV 2024
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining · CVPR 2023
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping · NIPS 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection · NIPS 2023
NLIP: Noise-Robust Language-Image Pre-training · AAAI 2023
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation · AAAI 2023
Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation · AAAI 2023
Mixed Autoencoder for Self-Supervised Visual Representation Learning · CVPR 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment · CVPR 2023
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection · CVPR 2023
Gaussian Label Distribution Learning for Spherical Image Object Detection · CVPR 2023
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving · CVPR 2023
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data · CVPR 2023
DetGPT: Detect What You Need via Reasoning · EMNLP 2023
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection · ICCV 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation · ICCV 2023
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach · ICCV 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability · ICCV 2023
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training · ICCV 2023
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration · ICCV 2023
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval · ICCV 2023
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment · ICCV 2023
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images · ICCV 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency · ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving · ICLR 2023
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts · ICLR 2023
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning · ICLR 2023
SLAMB: Accelerated Large Batch Training with Sparse Communication · ICML 2023
Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods · IJCAI 2023
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents? · IJCAI 2023
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation · CVPR 2022
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration · ACL 2022
AutoBERT-Zero: Evolving BERT Backbone from Scratch · AAAI 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation · EMNLP 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark · NIPS 2022
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving · NIPS 2022
AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms · AAAI 2022
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing · AAAI 2022
Laneformer: Object-Aware Row-Column Transformers for Lane Detection · AAAI 2022
Unbiased IoU for Spherical Image Object Detection · AAAI 2022
FILIP: Fine-grained Interactive Language-Image Pre-Training · ICLR 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph · ICLR 2022
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection · NIPS 2022
ONCE-3DLanes: Building Monocular 3D Lane Detection · CVPR 2022
DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction · ECCV 2022
Point2Seq: Detecting 3D Objects As Sequences · CVPR 2022
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search · CVPR 2022
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism · CVPR 2022
PANDORA: A Panoramic Detection Dataset for Object with Orientation · ECCV 2022
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection · ECCV 2022
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding · ECCV 2022
Learning Ego 3D Representation As Ray Tracing · ECCV 2022
Generative Negative Text Replay for Continual Vision-Language Pretraining · ECCV 2022
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving · ECCV 2022
RCLane: Relay Chain Prediction for Lane Detection · ECCV 2022
Driver Anomaly Detection: A Dataset and Contrastive Learning Approach · WACV 2021
SOFT: Softmax-free Transformer with Linear Complexity · NIPS 2021
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection · ICCV 2021
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation · ICCV 2021
DetCo: Unsupervised Contrastive Learning for Object Detection · ICCV 2021
Voxel Transformer for 3D Object Detection · ICCV 2021
Adversarial Robustness for Unsupervised Domain Adaptation · ICCV 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining · ICCV 2021
MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving · ICCV 2021
NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models · ICCV 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention · ICML 2021
Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search · ICLR 2021
DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning · NIPS 2021
Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation · AAAI 2021
Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation · CVPR 2021
Effective Sparsification of Neural Networks With Global Sparsity Constraint · CVPR 2021
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search · CVPR 2021
How to Save your Annotation Cost for Panoptic Segmentation? · AAAI 2021
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection · ICCV 2021
C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing · ICCV 2021
Segmenting Transparent Objects in the Wild with Transformer · IJCAI 2021
Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training · NIPS 2021
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation · EMNLP 2021
EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement · AAAI 2020
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation · NIPS 2020
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS · NIPS 2020
SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection · AAAI 2020
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search · ECCV 2020
AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling · ECCV 2020
JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image · ECCV 2020
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection · CVPR 2020
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending · ECCV 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN · AAAI 2020
ElixirNet: Relation-Aware Network Architecture Adaptation for Medical Lesion Detection · AAAI 2020
Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection · CVPR 2019
Spatial-Aware Graph Relation Network for Large-Scale Object Detection · CVPR 2019
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification · ICCV 2019
Hybrid Knowledge Routed Modules for Large-scale Object Detection · NIPS 2018