Yansong Tang

62 papers · 2018–2025 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (91) 🏠 Conference Loyalist (25) 🔬 Deep Specialist (18) 🤝 Dynamic Duo (24) ⚡ Prolific Year (22) 💎 Century Club (62) 🗃️ Keyword Collector (267)

Conferences

CVPR (25) ICCV (11) ECCV (8) NIPS (8) AAAI (4) ICLR (4) ACL (1) IJCAI (1)

Top co-authors

Jiwen Lu (24) Jie Zhou (19) Yong Liu (10) Ziwei Wang (7) Philip H.S. Torr (6) Lei Chen (6) Yitong Wang (6) Wenliang Zhao (6) Xiu Li (5) Yixuan Zhu (5)

Keywords

semantic segmentation (10) vision-language model (10) video understanding (8) diffusion model (6) model compression (5) large language model (4) vision transformer (4) 3d reconstruction (4) zero-shot learning (4) representation learning (4) object detection (4) multimodal learning (4) image segmentation (3) contrastive learning (3) action recognition (3) open-vocabulary segmentation (3) referring image segmentation (3) transfer learning (3) multi-task learning (2) post-training quantization (2)

Papers

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion · ICCV 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct · ICLR 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis · AAAI 2025
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control · ACL 2025
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching · ICLR 2025
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning · ICLR 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams · ICCV 2025
KV-Edit: Training-Free Image Editing for Precise Background Preservation · ICCV 2025
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation · ICCV 2025
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation · ICCV 2025
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation · ICCV 2025
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction · ICCV 2025
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes · CVPR 2025
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models · CVPR 2025
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model · CVPR 2025
VoCo-LLaMA: Towards Vision Compression with Large Language Models · CVPR 2025
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing · CVPR 2025
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction · CVPR 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild · CVPR 2024
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer · CVPR 2024
Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution · ECCV 2024
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation · NIPS 2024
GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling · NIPS 2024
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models · ECCV 2024
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model · ECCV 2024
Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation · ECCV 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation · ECCV 2024
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models · ECCV 2024
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena · NIPS 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models · NIPS 2024
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery · CVPR 2024
Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding · AAAI 2024
CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding · AAAI 2024
Open-Vocabulary Segmentation with Semantic-Assisted Calibration · CVPR 2024
FlowIE: Efficient Image Enhancement via Rectified Flow · CVPR 2024
Segment and Caption Anything · CVPR 2024
Universal Segmentation at Arbitrary Granularity with Language Instruction · CVPR 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression · CVPR 2024
Towards Accurate Post-training Quantization for Diffusion Models · CVPR 2024
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation · IJCAI 2023
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory · NIPS 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation · NIPS 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation · AAAI 2023
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment · CVPR 2023
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction · CVPR 2023
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation · ICCV 2023
Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning · ICCV 2023
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation · ICCV 2023
Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer · ICCV 2023
GAIN: On the Generalization of Instructional Action Understanding · ICLR 2023
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer · ECCV 2022
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset · CVPR 2022
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting · CVPR 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation · CVPR 2022
Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning · CVPR 2022
BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion · CVPR 2022
Global Spectral Filter Memory Network for Video Object Segmentation · ECCV 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions · NIPS 2022
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression · NIPS 2022
Uncertainty-Aware Score Distribution Learning for Action Quality Assessment · CVPR 2020
COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis · CVPR 2019
Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition · CVPR 2018