Siyuan Huang
82 papers · 2017–2025 · 14 top CS/AI conferences
Achievements
Academic Marathon (8)
Conference Polyglot (14)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (8)
Renaissance Researcher (11)
Taxonomy Completionist (115)
Conference Loyalist (21)
Dynamic Duo (32)
Triple Crown
Mega-Team (34)
Keyword Champion (5)
Grand Slam
Deep Specialist (19)
Prolific Year (5)
Conference Pioneer
Unstoppable (9)
Keyword Collector (316)
Trend Setter
Century Club (82)
Conferences
CVPR (21)
ICCV (15)
NIPS (9)
ECCV (8)
ICLR (8)
AAAI (4)
ACL (4)
CORL (4)
ICML (3)
EMNLP (2)
IJCAI (1)
IJCNLP (1)
RSS (1)
WACV (1)
Research topics
Keywords
diffusion model (6)
3d scene understanding (5)
visual grounding (4)
video understanding (4)
imitation learning (4)
robotic manipulation (4)
symbolic reasoning (4)
scene understanding (4)
contrastive learning (4)
3d reconstruction (4)
zero-shot learning (3)
scene reconstruction (3)
multimodal learning (3)
sim-to-real transfer (3)
embodied ai (3)
3d vision (3)
human-object interaction (3)
motion synthesis (3)
object detection (3)
question answering (3)
Papers
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation
CORL 2025
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models
CORL 2025
CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks
CORL 2025
Gumbel Reranking: Differentiable End-to-End Reranker Optimization
ACL 2025
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
ACL 2025
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
CVPR 2025
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
CVPR 2025
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
CVPR 2025
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
CVPR 2025
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
CVPR 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
Dynamic Motion Blending for Versatile Motion Editing
CVPR 2025
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
CVPR 2025
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
CVPR 2025
Training LLMs to be Better Text Embedders through Bidirectional Reconstruction
EMNLP 2025
PrimHOI: Compositional Human-Object Interaction via Reusable Primitives
ICCV 2025
Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing
ICCV 2025
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
ICCV 2025
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ICCV 2025
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
ICCV 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
ICLR 2025
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
ICLR 2025
RoboVerse: A Unified Platform, Benchmark and Dataset for Scalable and Generalizable Robot Learning
RSS 2025
VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification
WACV 2025
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
ECCV 2024
Unifying 3D Vision-Language Understanding via Promptable Queries
ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Mirror-Consistency: Harnessing Inconsistency in Majority Voting
EMNLP 2024
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
CVPR 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
ACL 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
3D Vision and Language Pretraining with Large-Scale Synthetic Data
IJCAI 2024
A3VLM: Actionable Articulation-Aware Vision Language Model
CORL 2024
Multi-modal Situated Reasoning in 3D Scenes
NIPS 2024
An Embodied Generalist Agent in 3D World
ICML 2024
Scaling Up Dynamic Human-Scene Interaction Modeling
CVPR 2024
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
CVPR 2024
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
CVPR 2024
Graph Parsing Networks
ICLR 2024
Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
NIPS 2024
PhyRecon: Physically Plausible Neural Scene Reconstruction
NIPS 2024
Neural-Symbolic Recursive Machine for Systematic Generalization
ICLR 2024
SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
ECCV 2024
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
ECCV 2024
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
CVPR 2023
Improving Object-centric Learning with Query Optimization
ICLR 2023
SQA3D: Situated Question Answering in 3D Scenes
ICLR 2023
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics
ICLR 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
NIPS 2023
Tailoring Self-Attention for Graph via Rooted Subtrees
NIPS 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
ICCV 2023
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes
ICCV 2023
Full-Body Articulated Human-Object Interaction
ICCV 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023
Diffusion-Based Generation, Optimization, and Planning in 3D Scenes
CVPR 2023
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
NIPS 2022
Adversarial Texture for Fooling Person Detectors in the Physical World
CVPR 2022
Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion
AAAI 2022
Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World
CVPR 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
NIPS 2022
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
ACL 2021
Spatio-Temporal Self-Supervised Representation Learning for 3D Point Clouds
ICCV 2021
VLGrammar: Grounded Grammar Induction of Vision and Language
ICCV 2021
Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis
CVPR 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
IJCNLP 2021
Learning by Fixing: Solving Math Word Problems with Weak Supervision
AAAI 2021
SMART: A Situation Model for Algebra Story Problems via Attributed Grammar
AAAI 2021
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
ICML 2020
Streaming Batch Gradient Tracking for Neural Network Training (Student Abstract)
AAAI 2020
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities
ECCV 2020
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
ECCV 2020
Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense
ICCV 2019
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
ICCV 2019
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
NIPS 2019
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
ECCV 2018
Human-Centric Indoor Scene Synthesis Using Stochastic Grammar
CVPR 2018
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
NIPS 2018
Predicting Human Activities Using Stochastic Grammar
ICCV 2017