Heng Wang

51 papers · 2012–2026 · 15 top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🌍 Conference Polyglot (15) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (14)

🐝 Cross-Pollinator (12) 🗺️ Taxonomy Completionist (87) 🏆 Keyword Champion (3) 🔬 Deep Specialist (11) 👥 Mega-Team (22) 🧬 Topic Evolution 🏆 Grand Slam 🗃️ Keyword Collector (221) 📈 Trend Setter 💎 Century Club (49) ❓ The Questioner (4) 🔥 Unstoppable (9) ⚡ Prolific Year (5) 🚀 Conference Pioneer

Conferences

CVPR (9) EMNLP (7) AAAI (6) ICCV (6) WACV (5) ICLR (4) NIPS (3) ACL (2) ECCV (2) ICML (2) COLING (1) IJCAI (1) IJCNLP (1) INTERSPEECH (1) RSS (1)

Top co-authors

Du Tran (7) Matt Feiszli (7) Linjie Yang (7) Weidong Cai (7) Lorenzo Torresani (6) Zhaoxuan Tan (5) Shangbin Feng (5) Yi Yang (5) Minnan Luo (4) Chaoyi Zhang (4)

Keywords

action recognition (7) video classification (6) large language model (6) optical flow (4) zero-shot learning (3) multimodal learning (3) contrastive learning (3) 3d convolutional network (3) graph neural network (3) natural language (2) image generation (2) temporal modeling (2) graph attention (2) node classification (2) visual reasoning (2) deep reinforcement learning (2) knowledge distillation (2) video generation (2) 3d reconstruction (2) video recognition (2)

Papers

Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs · AAAI 2026
Gotta Hear Them All: Towards Sound Source Aware Audio Generation · AAAI 2026
LVM-Lite: Training Large Vision Models with Efficient Sequential Modeling · WACV 2026
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object · CVPR 2025
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation · WACV 2025
Autoregressive Pretraining with Mamba in Vision · ICLR 2025
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos · ICLR 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities · ACL 2025
GL-GAN: Perceiving and Integrating Global and Local Styles for Handwritten Text Generation with Mamba · COLING 2025
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents · EMNLP 2025
Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models · EMNLP 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos · WACV 2025
BannerAgency: Advertising Banner Design with Multimodal LLM Agents · EMNLP 2025
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles · ICML 2025
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing · ICLR 2025
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens · CVPR 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training · AAAI 2024
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models · AAAI 2024
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection · ACL 2024
Video Recognition in Portrait Mode · CVPR 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models · EMNLP 2024
Can LLM Graph Reasoning Generalize beyond Pattern Memorization? · EMNLP 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters · NIPS 2024
One Is All: Bridging the Gap between Neural Radiance Fields Architectures with Progressive Volume Distillation · AAAI 2023
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? · ICCV 2023
PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters · CVPR 2023
Revisit Finetuning strategy for Few-Shot Learning to Transfer the Emdeddings · ICLR 2023
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition · CVPR 2023
PointNeuron: 3D Neuron Reconstruction via Geometry and Topology Learning of Point Clouds · WACV 2023
Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks · EMNLP 2023
Can Language Models Solve Graph Problems in Natural Language? · NIPS 2023
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity · CVPR 2022
TwiBot-22: Towards Graph-Based Twitter Bot Detection · NIPS 2022
A speech enhancement method for long-range speech acquisition task · INTERSPEECH 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds · IJCAI 2022
Is Space-Time Attention All You Need for Video Understanding? · ICML 2021
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories · CVPR 2021
Interactive Prototype Learning for Egocentric Action Recognition · ICCV 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition · ICCV 2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation · ICCV 2021
Devon: Deformable Volume Network for Learning Optical Flow · WACV 2020
FASTER Recurrent Networks for Efficient Video Classification · AAAI 2020
Proposal-based Video Completion · ECCV 2020
Video Modeling With Correlation Networks · CVPR 2020
Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning · EMNLP 2019
Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning · IJCNLP 2019
Video Classification With Channel-Separated Convolutional Networks · ICCV 2019
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset · ECCV 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition · CVPR 2018
Action Recognition with Improved Trajectories · ICCV 2013
On the Structure of Nonlinearities in Pose Graph SLAM · RSS 2012