Hang Xu
137 papers · 2018–2026 · 13 top CS/AI conferences
Achievements
🏃 Academic Marathon (7)
🌍 Conference Polyglot (13)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (12)
🌈 Renaissance Researcher (9)
🏠 Conference Loyalist (20)
🤝 Dynamic Duo (61)
👑 Triple Crown
🏆 Grand Slam
👥 Mega-Team (30)
🔬 Deep Specialist (27)
🧬 Topic Evolution
🏆 Keyword Champion (3)
📈 Trend Setter
🗃️ Keyword Collector (444)
⚡ Prolific Year (27)
🔥 Unstoppable (8)
💎 Century Club (134)
❓ The Questioner (2)
Conferences
CVPR (29)
ICCV (24)
ECCV (20)
AAAI (19)
NIPS (16)
ICLR (14)
IJCAI (4)
EMNLP (3)
ACL (2)
ICML (2)
WACV (2)
NAACL (1)
RSS (1)
Keywords
object detection (23)
contrastive learning (13)
neural architecture search (11)
diffusion model (10)
semantic segmentation (9)
transfer learning (8)
vision-language model (7)
multimodal learning (7)
autonomous driving (6)
image generation (6)
self-supervised learning (6)
point cloud (6)
domain adaptation (5)
zero-shot classification (5)
multi-task learning (5)
3d object detection (5)
cross-modal alignment (5)
large language model (5)
cross-modal learning (4)
knowledge distillation (4)
Papers
2D-CrossScan Mamba: Enhancing State Space Models with Spatially Consistent Multi-Path 2D Information Propagation
AAAI 2026
Deep (Predictive) Discounted Counterfactual Regret Minimization
AAAI 2026
MoEG-HOI: Mixture of Expert Groups for One-Stage Hand-Object Interaction Motion Generation with Hand-Finger-Joint Semantic Guidance
AAAI 2026
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
ICLR 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
ICLR 2025
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
ICCV 2025
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
ICCV 2025
FreeDNA: Endowing Domain Adaptation of Diffusion-Based Dense Prediction with Training-Free Domain Noise Alignment
ICCV 2025
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
ICCV 2025
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
ICLR 2025
Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data
NAACL 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
CVPR 2025
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution
CVPR 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
CVPR 2025
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception
WACV 2025
Online Competitive Information Gathering for Partially Observable Trajectory Games
RSS 2025
LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement
AAAI 2024
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
AAAI 2024
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
ECCV 2024
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation
ACL 2024
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
ECCV 2024
JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
ECCV 2024
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
ECCV 2024
Implicit Concept Removal of Diffusion Models
ECCV 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
ECCV 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
ECCV 2024
Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
IJCAI 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
NIPS 2024
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
CVPR 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
CVPR 2024
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
CVPR 2024
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
CVPR 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
NIPS 2024
TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes
NIPS 2024
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM
NIPS 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
NIPS 2024
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
ICLR 2024
Dynamic Discounted Counterfactual Regret Minimization
ICLR 2024
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction
ICLR 2024
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
ICLR 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
ECCV 2024
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
CVPR 2023
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
NIPS 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
NIPS 2023
NLIP: Noise-Robust Language-Image Pre-training
AAAI 2023
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
AAAI 2023
Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
AAAI 2023
Mixed Autoencoder for Self-Supervised Visual Representation Learning
CVPR 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
CVPR 2023
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
CVPR 2023
Gaussian Label Distribution Learning for Spherical Image Object Detection
CVPR 2023
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
CVPR 2023
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
CVPR 2023
DetGPT: Detect What You Need via Reasoning
EMNLP 2023
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
ICCV 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
ICCV 2023
Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
ICCV 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
ICCV 2023
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training
ICCV 2023
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration
ICCV 2023
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval
ICCV 2023
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
ICCV 2023
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
ICCV 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
ICLR 2023
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts
ICLR 2023
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
ICLR 2023
SLAMB: Accelerated Large Batch Training with Sparse Communication
ICML 2023
Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods
IJCAI 2023
Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?
IJCAI 2023
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation
CVPR 2022
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
ACL 2022
AutoBERT-Zero: Evolving BERT Backbone from Scratch
AAAI 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
EMNLP 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NIPS 2022
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
NIPS 2022
AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms
AAAI 2022
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing
AAAI 2022
Laneformer: Object-Aware Row-Column Transformers for Lane Detection
AAAI 2022
Unbiased IoU for Spherical Image Object Detection
AAAI 2022
FILIP: Fine-grained Interactive Language-Image Pre-Training
ICLR 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph
ICLR 2022
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
NIPS 2022
ONCE-3DLanes: Building Monocular 3D Lane Detection
CVPR 2022
DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction
ECCV 2022
Point2Seq: Detecting 3D Objects As Sequences
CVPR 2022
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
CVPR 2022
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
CVPR 2022
PANDORA: A Panoramic Detection Dataset for Object with Orientation
ECCV 2022
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
ECCV 2022
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
ECCV 2022
Learning Ego 3D Representation As Ray Tracing
ECCV 2022
Generative Negative Text Replay for Continual Vision-Language Pretraining
ECCV 2022
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
RCLane: Relay Chain Prediction for Lane Detection
ECCV 2022
Driver Anomaly Detection: A Dataset and Contrastive Learning Approach
WACV 2021
SOFT: Softmax-free Transformer with Linear Complexity
NIPS 2021
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
ICCV 2021
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-Guided Feature Imitation
ICCV 2021
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021
Voxel Transformer for 3D Object Detection
ICCV 2021
Adversarial Robustness for Unsupervised Domain Adaptation
ICCV 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining
ICCV 2021
MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving
ICCV 2021
NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models
ICCV 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
ICML 2021
Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search
ICLR 2021
DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning
NIPS 2021
Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation
AAAI 2021
Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation
CVPR 2021
Effective Sparsification of Neural Networks With Global Sparsity Constraint
CVPR 2021
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
CVPR 2021
How to Save your Annotation Cost for Panoptic Segmentation?
AAAI 2021
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
ICCV 2021
C3-SemiSeg: Contrastive Semi-Supervised Segmentation via Cross-Set Learning and Dynamic Class-Balancing
ICCV 2021
Segmenting Transparent Objects in the Wild with Transformer
IJCAI 2021
Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training
NIPS 2021
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
EMNLP 2021
EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement
AAAI 2020
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
NIPS 2020
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
NIPS 2020
SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection
AAAI 2020
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
ECCV 2020
AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling
ECCV 2020
JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image
ECCV 2020
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection
CVPR 2020
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
ECCV 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
AAAI 2020
ElixirNet: Relation-Aware Network Architecture Adaptation for Medical Lesion Detection
AAAI 2020
Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection
CVPR 2019
Spatial-Aware Graph Relation Network for Large-Scale Object Detection
CVPR 2019
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
ICCV 2019
Hybrid Knowledge Routed Modules for Large-scale Object Detection
NIPS 2018