Haifeng Huang

23 papers · 2020–2025 · 11 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (5) 🌍 Conference Polyglot (11) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (9) 🤝 Dynamic Duo (11) 🔬 Deep Specialist (11) 🧬 Topic Evolution 🏆 Keyword Champion (3) 🗃️ Keyword Collector (117) 🚀 Conference Pioneer 🔥 Unstoppable (6) 💎 Century Club (23) ⚡ Prolific Year (6)

Conferences

NIPS (5) AAAI (3) CVPR (3) ACL (2) EMNLP (2) ICCV (2) IJCAI (2) ICML (1) INTERSPEECH (1) MICCAI (1) NAACL (1)

Top co-authors

Zhou Zhao (11) Zehan Wang (10) Yang Zhao (9) Xize Cheng (6) Ziang Zhang (5) Chao Lu (5) Jun Chen (4) Tao Jin (4) Tai WANG (4) Yilun Chen (4)

Keywords

multi-modal learning (5) visual grounding (4) vision-language model (4) representation learning (4) large language model (4) 3d visual grounding (3) scene understanding (3) electronic medical record (3) attention mechanism (3) contrastive learning (2) medical diagnosis (2) robotic manipulation (2) graph neural network (2) cross-modal alignment (2) semantic alignment (2) visual question answering (2) point cloud (2) 3d vision (2) question answering (2) policy generalization (2)

Papers

Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception NAACL 2025
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine AAAI 2025
Improving Retrieval Augmented Language Model with Self-Reasoning AAAI 2025
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation CVPR 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors CVPR 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language CVPR 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning ICCV 2025
Extending Multi-modal Contrastive Representations NIPS 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion ICML 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations NIPS 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers NIPS 2024
Unified Audio Visual Cues for Target Speaker Extraction INTERSPEECH 2024
A Refer-and-Ground Multimodal Large Language Model for Biomedicine MICCAI 2024
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding ICCV 2023
Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations ACL 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding EMNLP 2023
Connecting Multi-modal Contrastive Representations NIPS 2023
A Speaker-Aware Co-Attention Framework for Medical Dialogue Information Extraction EMNLP 2022
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization NIPS 2022
A Novel Sequence-to-Subgraph Framework for Diagnosis Classification IJCAI 2021
Towards Interpretable Clinical Diagnosis with Bayesian Network Ensembles Stacked on Entity-Aware CNNs ACL 2020
The Graph-based Mutual Attentive Network for Automatic Diagnosis IJCAI 2020
Generative Adversarial Regularized Mutual Information Policy Gradient Framework for Automatic Diagnosis AAAI 2020