Hao Tang
142 papers · 2009–2026 · 15 top CS/AI conferences
Achievements
Taxonomy Completionist (19)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (7)
Hot Topic Early Bird
Conference Loyalist (30)
Deep Specialist (21)
Triple Crown
Topic Evolution
Keyword Champion (2)
Grand Slam
Dynamic Duo (16)
Trend Setter
The Questioner
Conference Pioneer
Prolific Year (28)
Unstoppable (10)
Keyword Collector (50)
Century Club (135)
Conferences
CVPR (30)
AAAI (20)
ECCV (16)
INTERSPEECH (16)
NIPS (12)
WACV (10)
IJCAI (9)
ICCV (8)
ACL (6)
ICLR (5)
ICML (3)
NAACL (3)
CORL (2)
EMNLP (1)
MICCAI (1)
Research topics
Keywords
self-supervised learning (13)
attention mechanism (9)
vision transformer (9)
diffusion model (8)
image generation (7)
generative adversarial network (7)
representation learning (7)
semantic segmentation (7)
model compression (7)
medical image segmentation (6)
few-shot learning (6)
neural network (5)
3d reconstruction (5)
unsupervised learning (5)
vision-language model (5)
convolutional neural network (5)
graph neural network (5)
transformer architecture (4)
generative model (4)
contrastive learning (4)
Papers
TR-DQ: Time-Rotation Diffusion Quantization
AAAI 2026
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
AAAI 2026
FourierPET: Deep Fourier-based Unrolled Network for Low-count PET Reconstruction
AAAI 2026
Fine-Grained Image Retrieval via Dual-Vision Adaptation
AAAI 2026
ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation
AAAI 2026
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging
AAAI 2026
IMAGGarment+: Efficient Attribute-Wise Diffusion for Garment Generation
AAAI 2026
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
CORL 2025
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
CORL 2025
SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation
WACV 2025
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
NAACL 2025
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
NAACL 2025
Marker-less Head Pose Tracking for Image-guided Cerebral Artery Navigation
MICCAI 2025
Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning
IJCAI 2025
In-Context Meta LoRA Generation
IJCAI 2025
ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling
AAAI 2025
A Training-free Synthetic Data Selection Method for Semantic Segmentation
AAAI 2025
Stable-Hair: Real-World Hair Transfer via Diffusion Model
AAAI 2025
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
AAAI 2025
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
AAAI 2025
Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution
IJCAI 2025
OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection
IJCAI 2025
FairSMOE: Mitigating Multi-Attribute Fairness Problem with Sparse Mixture-of-Experts
IJCAI 2025
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
ICML 2025
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
ICLR 2025
Combining Induction and Transduction for Abstract Reasoning
ICLR 2025
Similarity Memory Prior is All You Need for Medical Image Segmentation
ICCV 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
ICCV 2025
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
ICCV 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
CVPR 2025
MambaIC: State Space Models for High-Performance Learned Image Compression
CVPR 2025
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
CVPR 2025
DiffFNO: Diffusion Fourier Neural Operator
CVPR 2025
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
CVPR 2025
Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation
WACV 2025
Distilling ODE Solvers of Diffusion Models into Smaller Steps
CVPR 2024
Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection
NIPS 2024
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
NIPS 2024
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
NIPS 2024
Delving into Multimodal Prompting for Fine-Grained Visual Classification
AAAI 2024
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model
AAAI 2024
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
CVPR 2024
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
CVPR 2024
Towards Robust 3D Pose Transfer with Adversarial Learning
CVPR 2024
On the Faithfulness of Vision Transformer Explanations
CVPR 2024
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
CVPR 2024
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
CVPR 2024
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
CVPR 2024
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy
CVPR 2024
Motion Mamba: Efficient and Long Sequence Motion Generation
ECCV 2024
Dataset Growth
ECCV 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
ECCV 2024
SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
ECCV 2024
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
ECCV 2024
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
ECCV 2024
ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
ECCV 2024
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
ECCV 2024
InstructGIE: Towards Generalizable Image Editing
ECCV 2024
Efficient-3Dim: Learning a Generalizable Single-image Novel-view Synthesizer in One Day
ICLR 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
INTERSPEECH 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
INTERSPEECH 2024
Mining and Unifying Heterogeneous Contrastive Relations for Weakly-Supervised Actor-Action Segmentation
WACV 2024
Bipartite Graph Diffusion Model for Human Interaction Generation
WACV 2024
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
INTERSPEECH 2023
Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased
ICLR 2023
LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer
NIPS 2023
Attributable and Scalable Opinion Summarization
ACL 2023
SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
ICML 2023
From Perception to Programs: Regularize, Overparameterize, and Amortize
ICML 2023
Towards Real-Time Segmentation on the Edge
AAAI 2023
RZCR: Zero-shot Character Recognition via Radical-based Reasoning
IJCAI 2023
Data Level Lottery Ticket Hypothesis for Vision Transformers
IJCAI 2023
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
AAAI 2023
HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception
NIPS 2023
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
CVPR 2023
HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design
AAAI 2023
DE-net: Dynamic Text-Guided Image Editing Adversarial Networks
AAAI 2023
Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge
CVPR 2023
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
CVPR 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
CVPR 2023
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
CVPR 2023
SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders
CVPR 2023
Graph Transformer GANs for Graph-Constrained House Generation
CVPR 2023
PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile
NIPS 2023
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling
INTERSPEECH 2023
Diffeomorphic Image Registration With Neural Velocity Field
WACV 2023
Few-Shot Medical Image Segmentation With Cycle-Resemblance Attention
WACV 2023
Representation Recovering for Self-Supervised Pre-Training on Medical Images
WACV 2023
EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
NIPS 2023
Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations
NIPS 2023
Does Graph Distillation See Like Vision Dataset Counterpart?
NIPS 2023
Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification
ICCV 2023
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation
ICCV 2023
Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis
ICLR 2023
Phonetic Analysis of Self-supervised Representations of English Speech
INTERSPEECH 2022
Autoregressive Co-Training for Learning Discrete Speech Representation
INTERSPEECH 2022
Hierarchical Sketch Induction for Paraphrase Generation
ACL 2022
Towards Interpretable Video Super-Resolution via Alternating Optimization
ECCV 2022
Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution
ECCV 2022
Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation
ECCV 2022
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow
CVPR 2022
Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling
CVPR 2022
AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
WACV 2022
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
CVPR 2022
Learning To Restore 3D Face From In-the-Wild Degraded Images
CVPR 2022
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer
AAAI 2022
Real-Time Portrait Stylization on the Edge
IJCAI 2022
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech
INTERSPEECH 2022
Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking
AAAI 2022
PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation
ECCV 2022
SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning
ECCV 2022
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
CVPR 2022
3D-Aware Semantic-Guided Generative Model for Human Synthesis
ECCV 2022
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
CVPR 2022
Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction
ICCV 2021
On the Difficulty of Segmenting Words with Attention
EMNLP 2021
Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer
ICCV 2021
Recurrent Mask Refinement for Few-Shot Medical Image Segmentation
ICCV 2021
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
WACV 2021
Spatial Context-Aware Self-Attention Model for Multi-Organ Segmentation
WACV 2021
Vector-Quantized Autoregressive Predictive Coding
INTERSPEECH 2020
AMR Parsing with Latent Structural Information
ACL 2020
Dependency Graph Enhanced Dual-transformer Structure for Aspect-based Sentiment Classification
ACL 2020
XingGAN for Person Image Generation
ECCV 2020
Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
CVPR 2020
Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs
NIPS 2020
Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals
NIPS 2020
Belief Propagation Neural Networks
NIPS 2020
A Deep Residual Network for Large-Scale Acoustic Scene Analysis
INTERSPEECH 2019
An Unsupervised Autoregressive Model for Speech Representation Learning
INTERSPEECH 2019
Relating Simple Sentence Representations in Deep Neural Networks and the Brain
ACL 2019
Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation
CVPR 2019
VoiceID Loss: Speech Enhancement for Speaker Verification
INTERSPEECH 2019
Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation
CVPR 2018
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
INTERSPEECH 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
INTERSPEECH 2018
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition
INTERSPEECH 2017
A Novel Feature Matching Strategy for Large Scale Image Retrieval
IJCAI 2016
Efficient Segmental Cascades for Speech Recognition
INTERSPEECH 2016
Triphone State-Tying via Deep Canonical Correlation Analysis
INTERSPEECH 2016
Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
ACL 2012
Spherical Discriminant Analysis in Semi-supervised Speaker Clustering
NAACL 2009