Yansong Tang
62 papers · 2018–2025 · 8 top CS/AI conferences
Achievements
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (8)
Cross-Pollinator (12)
Renaissance Researcher (7)
Taxonomy Completionist (91)
Interdisciplinary Bridge
Conference Loyalist (25)
Deep Specialist (18)
Dynamic Duo (24)
Prolific Year (22)
Century Club (62)
Keyword Collector (267)
Conferences
CVPR (25)
ICCV (11)
ECCV (8)
NIPS (8)
AAAI (4)
ICLR (4)
ACL (1)
IJCAI (1)
Top co-authors
Keywords
semantic segmentation (10)
vision-language model (10)
video understanding (8)
diffusion model (6)
model compression (5)
large language model (4)
vision transformer (4)
3d reconstruction (4)
zero-shot learning (4)
representation learning (4)
object detection (4)
multimodal learning (4)
image segmentation (3)
contrastive learning (3)
action recognition (3)
open-vocabulary segmentation (3)
referring image segmentation (3)
transfer learning (3)
multi-task learning (2)
post-training quantization (2)
Papers
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
ICCV 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
ICLR 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control
ACL 2025
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
ICLR 2025
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
ICLR 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
ICCV 2025
KV-Edit: Training-Free Image Editing for Precise Background Preservation
ICCV 2025
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
ICCV 2025
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
ICCV 2025
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
ICCV 2025
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
ICCV 2025
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
CVPR 2025
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
CVPR 2025
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
CVPR 2025
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
CVPR 2025
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
CVPR 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
CVPR 2024
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
CVPR 2024
Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
ECCV 2024
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation
NIPS 2024
GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
NIPS 2024
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
ECCV 2024
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
ECCV 2024
Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
ECCV 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
ECCV 2024
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
ECCV 2024
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
NIPS 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models
NIPS 2024
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
CVPR 2024
Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding
AAAI 2024
CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding
AAAI 2024
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
CVPR 2024
FlowIE: Efficient Image Enhancement via Rectified Flow
CVPR 2024
Segment and Caption Anything
CVPR 2024
Universal Segmentation at Arbitrary Granularity with Language Instruction
CVPR 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024
Towards Accurate Post-training Quantization for Diffusion Models
CVPR 2024
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
IJCAI 2023
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
NIPS 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
NIPS 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI 2023
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
CVPR 2023
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
CVPR 2023
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
ICCV 2023
Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning
ICCV 2023
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
ICCV 2023
Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer
ICCV 2023
GAIN: On the Generalization of Instructional Action Understanding
ICLR 2023
ScalableViT: Rethinking the Context-Oriented Generalization of Vision Transformer
ECCV 2022
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset
CVPR 2022
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting
CVPR 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
CVPR 2022
Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning
CVPR 2022
BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion
CVPR 2022
Global Spectral Filter Memory Network for Video Object Segmentation
ECCV 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
NIPS 2022
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
NIPS 2022
Uncertainty-Aware Score Distribution Learning for Action Quality Assessment
CVPR 2020
COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis
CVPR 2019
Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition
CVPR 2018