Hao Tang

142 papers · 2009–2026 · 15 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (19) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (30) 🔬 Deep Specialist (21) 👑 Triple Crown 🧬 Topic Evolution 🏆 Keyword Champion (2) 🏆 Grand Slam 🤝 Dynamic Duo (16) 📈 Trend Setter ❓ The Questioner 🚀 Conference Pioneer ⚡ Prolific Year (28) 🔥 Unstoppable (10) 🗃️ Keyword Collector (50) 💎 Century Club (135)

Conferences

CVPR (30) AAAI (20) ECCV (16) INTERSPEECH (16) NIPS (12) WACV (10) IJCAI (9) ICCV (8) ACL (6) ICLR (5) ICML (3) NAACL (3) CORL (2) EMNLP (1) MICCAI (1)

Top co-authors

Yanzhi Wang (16) Nicu Sebe (13) Luc Van Gool (9) Yan Yan (9) Geng Yuan (9) Xiaohui Xie (8) Pu Zhao (7) Xuan Shen (7) Peiyan Dong (7) Zhenglun Kong (7)

Research topics

Recognition (1) Optimization (1)

Keywords

self-supervised learning (13) attention mechanism (9) vision transformer (9) diffusion model (8) image generation (7) generative adversarial network (7) representation learning (7) semantic segmentation (7) model compression (7) medical image segmentation (6) few-shot learning (6) neural network (5) 3d reconstruction (5) unsupervised learning (5) vision-language model (5) convolutional neural network (5) graph neural network (5) transformer architecture (4) generative model (4) contrastive learning (4)

Papers

TR-DQ: Time-Rotation Diffusion Quantization · AAAI 2026
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models · AAAI 2026
FourierPET: Deep Fourier-based Unrolled Network for Low-count PET Reconstruction · AAAI 2026
Fine-Grained Image Retrieval via Dual-Vision Adaptation · AAAI 2026
ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation · AAAI 2026
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging · AAAI 2026
IMAGGarment+: Efficient Attribute-Wise Diffusion for Garment Generation · AAAI 2026
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation · CORL 2025
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation · CORL 2025
SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation · WACV 2025
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · NAACL 2025
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks · NAACL 2025
Marker-less Head Pose Tracking for Image-guided Cerebral Artery Navigation · MICCAI 2025
Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning · IJCAI 2025
In-Context Meta LoRA Generation · IJCAI 2025
ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling · AAAI 2025
A Training-free Synthetic Data Selection Method for Semantic Segmentation · AAAI 2025
Stable-Hair: Real-World Hair Transfer via Diffusion Model · AAAI 2025
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition · AAAI 2025
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment · AAAI 2025
Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution · IJCAI 2025
OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection · IJCAI 2025
FairSMOE: Mitigating Multi-Attribute Fairness Problem with Sparse Mixture-of-Experts · IJCAI 2025
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention · ICML 2025
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning · ICLR 2025
Combining Induction and Transduction for Abstract Reasoning · ICLR 2025
Similarity Memory Prior is All You Need for Medical Image Segmentation · ICCV 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding · ICCV 2025
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation · ICCV 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass · CVPR 2025
MambaIC: State Space Models for High-Performance Learned Image Compression · CVPR 2025
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models · CVPR 2025
DiffFNO: Diffusion Fourier Neural Operator · CVPR 2025
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model · CVPR 2025
Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation · WACV 2025
Distilling ODE Solvers of Diffusion Models into Smaller Steps · CVPR 2024
Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection · NIPS 2024
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment · NIPS 2024
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff · NIPS 2024
Delving into Multimodal Prompting for Fine-Grained Visual Classification · AAAI 2024
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model · AAAI 2024
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency · CVPR 2024
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization · CVPR 2024
Towards Robust 3D Pose Transfer with Adversarial Learning · CVPR 2024
On the Faithfulness of Vision Transformer Explanations · CVPR 2024
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud · CVPR 2024
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation · CVPR 2024
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer · CVPR 2024
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy · CVPR 2024
Motion Mamba: Efficient and Long Sequence Motion Generation · ECCV 2024
Dataset Growth · ECCV 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface · ECCV 2024
SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis · ECCV 2024
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences · ECCV 2024
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion · ECCV 2024
ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation · ECCV 2024
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance · ECCV 2024
InstructGIE: Towards Generalizable Image Editing · ECCV 2024
Efficient-3Dim: Learning a Generalizable Single-image Novel-view Synthesizer in One Day · ICLR 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations · INTERSPEECH 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models · INTERSPEECH 2024
Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation · WACV 2024
Bipartite Graph Diffusion Model for Human Interaction Generation · WACV 2024
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces · INTERSPEECH 2023
Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased · ICLR 2023
LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer · NIPS 2023
Attributable and Scalable Opinion Summarization · ACL 2023
SpeedDETR: Speed-aware Transformers for End-to-end Object Detection · ICML 2023
From Perception to Programs: Regularize, Overparameterize, and Amortize · ICML 2023
Towards Real-Time Segmentation on the Edge · AAAI 2023
RZCR: Zero-shot Character Recognition via Radical-based Reasoning · IJCAI 2023
Data Level Lottery Ticket Hypothesis for Vision Transformers · IJCAI 2023
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training · AAAI 2023
HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception · NIPS 2023
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer · CVPR 2023
HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design · AAAI 2023
DE-net: Dynamic Text-Guided Image Editing Adversarial Networks · AAAI 2023
Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge · CVPR 2023
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network · CVPR 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis · CVPR 2023
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration · CVPR 2023
SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders · CVPR 2023
Graph Transformer GANs for Graph-Constrained House Generation · CVPR 2023
PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile · NIPS 2023
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling · INTERSPEECH 2023
Diffeomorphic Image Registration With Neural Velocity Field · WACV 2023
Few-Shot Medical Image Segmentation With Cycle-Resemblance Attention · WACV 2023
Representation Recovering for Self-Supervised Pre-Training on Medical Images · WACV 2023
EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset · NIPS 2023
Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations · NIPS 2023
Does Graph Distillation See Like Vision Dataset Counterpart? · NIPS 2023
Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification · ICCV 2023
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation · ICCV 2023
Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis · ICLR 2023
Phonetic Analysis of Self-supervised Representations of English Speech · INTERSPEECH 2022
Autoregressive Co-Training for Learning Discrete Speech Representation · INTERSPEECH 2022
Hierarchical Sketch Induction for Paraphrase Generation · ACL 2022
Towards Interpretable Video Super-Resolution via Alternating Optimization · ECCV 2022
Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution · ECCV 2022
Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation · ECCV 2022
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow · CVPR 2022
Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling · CVPR 2022
AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation · WACV 2022
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis · CVPR 2022
Learning To Restore 3D Face From In-the-Wild Degraded Images · CVPR 2022
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer · AAAI 2022
Real-Time Portrait Stylization on the Edge · IJCAI 2022
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech · INTERSPEECH 2022
Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking · AAAI 2022
PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation · ECCV 2022
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning · ECCV 2022
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation · CVPR 2022
3D-Aware Semantic-Guided Generative Model for Human Synthesis · ECCV 2022
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model · CVPR 2022
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction · ICCV 2021
On the Difficulty of Segmenting Words with Attention · EMNLP 2021
Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer · ICCV 2021
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation · ICCV 2021
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention · WACV 2021
Spatial Context-Aware Self-Attention Model for Multi-Organ Segmentation · WACV 2021
Vector-Quantized Autoregressive Predictive Coding · INTERSPEECH 2020
AMR Parsing with Latent Structural Information · ACL 2020
Dependency Graph Enhanced Dual-transformer Structure for Aspect-based Sentiment Classification · ACL 2020
XingGAN for Person Image Generation · ECCV 2020
Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation · CVPR 2020
Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs · NIPS 2020
Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals · NIPS 2020
Belief Propagation Neural Networks · NIPS 2020
A Deep Residual Network for Large-Scale Acoustic Scene Analysis · INTERSPEECH 2019
An Unsupervised Autoregressive Model for Speech Representation Learning · INTERSPEECH 2019
Relating Simple Sentence Representations in Deep Neural Networks and the Brain · ACL 2019
Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation · CVPR 2019
VoiceID Loss: Speech Enhancement for Speaker Verification · INTERSPEECH 2019
Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation · CVPR 2018
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition · INTERSPEECH 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition · INTERSPEECH 2018
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition · INTERSPEECH 2017
A Novel Feature Matching Strategy for Large Scale Image Retrieval · IJCAI 2016
Efficient Segmental Cascades for Speech Recognition · INTERSPEECH 2016
Triphone State-Tying via Deep Canonical Correlation Analysis · INTERSPEECH 2016
Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach · ACL 2012
Spherical Discriminant Analysis in Semi-supervised Speaker Clustering · NAACL 2009