Heng Wang
51 papers · 2012–2026 · 15 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (15)
Renaissance Researcher (5)
Interdisciplinary Bridge
Academic Marathon (14)
Cross-Pollinator (12)
Taxonomy Completionist (87)
Keyword Champion (3)
Deep Specialist (11)
Mega-Team (22)
Topic Evolution
Grand Slam
Keyword Collector (221)
Trend Setter
Century Club (49)
The Questioner (4)
Unstoppable (9)
Prolific Year (5)
Conference Pioneer
Conferences
CVPR (9)
EMNLP (7)
AAAI (6)
ICCV (6)
WACV (5)
ICLR (4)
NIPS (3)
ACL (2)
ECCV (2)
ICML (2)
COLING (1)
IJCAI (1)
IJCNLP (1)
INTERSPEECH (1)
RSS (1)
Keywords
action recognition (7)
video classification (6)
large language model (6)
optical flow (4)
zero-shot learning (3)
multimodal learning (3)
contrastive learning (3)
3d convolutional network (3)
graph neural network (3)
natural language (2)
image generation (2)
temporal modeling (2)
graph attention (2)
node classification (2)
visual reasoning (2)
deep reinforcement learning (2)
knowledge distillation (2)
video generation (2)
3d reconstruction (2)
video recognition (2)
Papers
Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs
AAAI 2026
Gotta Hear Them All: Towards Sound Source Aware Audio Generation
AAAI 2026
LVM-Lite: Training Large Vision Models with Efficient Sequential Modeling
WACV 2026
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
CVPR 2025
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
WACV 2025
Autoregressive Pretraining with Mamba in Vision
ICLR 2025
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
ICLR 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
ACL 2025
GL-GAN: Perceiving and Integrating Global and Local Styles for Handwritten Text Generation with Mamba
COLING 2025
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
EMNLP 2025
Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models
EMNLP 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
WACV 2025
BannerAgency: Advertising Banner Design with Multimodal LLM Agents
EMNLP 2025
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
ICML 2025
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
ICLR 2025
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
CVPR 2024
Stitching Segments and Sentences towards Generalization in Video-Text Pre-training
AAAI 2024
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
AAAI 2024
DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection
ACL 2024
Video Recognition in Portrait Mode
CVPR 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
EMNLP 2024
Can LLM Graph Reasoning Generalize beyond Pattern Memorization?
EMNLP 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
NIPS 2024
One Is All: Bridging the Gap between Neural Radiance Fields Architectures with Progressive Volume Distillation
AAAI 2023
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
ICCV 2023
PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters
CVPR 2023
Revisit Finetuning strategy for Few-Shot Learning to Transfer the Emdeddings
ICLR 2023
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition
CVPR 2023
PointNeuron: 3D Neuron Reconstruction via Geometry and Topology Learning of Point Clouds
WACV 2023
Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks
EMNLP 2023
Can Language Models Solve Graph Problems in Natural Language?
NIPS 2023
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
CVPR 2022
TwiBot-22: Towards Graph-Based Twitter Bot Detection
NIPS 2022
A speech enhancement method for long-range speech acquisition task
INTERSPEECH 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
IJCAI 2022
Is Space-Time Attention All You Need for Video Understanding?
ICML 2021
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
CVPR 2021
Interactive Prototype Learning for Egocentric Action Recognition
ICCV 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition
ICCV 2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
ICCV 2021
Devon: Deformable Volume Network for Learning Optical Flow
WACV 2020
FASTER Recurrent Networks for Efficient Video Classification
AAAI 2020
Proposal-based Video Completion
ECCV 2020
Video Modeling With Correlation Networks
CVPR 2020
Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning
EMNLP 2019
Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning
IJCNLP 2019
Video Classification With Channel-Separated Convolutional Networks
ICCV 2019
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset
ECCV 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition
CVPR 2018
Action Recognition with Improved Trajectories
ICCV 2013
On the Structure of Nonlinearities in Pose Graph SLAM
RSS 2012