Tong He

78 papers · 2017–2025 · across 11 top CS/AI conferences

Achievements


🌍 Conference Polyglot (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🏃 Academic Marathon (8)

🐝 Cross-Pollinator (12) 🗺️ Taxonomy Completionist (101) 🏠 Conference Loyalist (22) 🤝 Dynamic Duo (22) 👥 Mega-Team (20) 🏆 Grand Slam 🔬 Deep Specialist (13) 🚀 Conference Pioneer 🗃️ Keyword Collector (282) 💎 Century Club (78) 📈 Trend Setter 🔥 Unstoppable (9) ❓ The Questioner ⚡ Prolific Year (6)

Conferences

CVPR (22) NIPS (13) ICCV (12) ICLR (11) ECCV (9) AAAI (4) AISTATS (2) ICML (2) ACL (1) CORL (1) JMLR (1)

Top co-authors

Wanli Ouyang (22) Zheng Zhang (16) Tianjun Xiao (14) Di Huang (12) Chunhua Shen (12) Weicai Ye (8) Yanwei Fu (7) Stefano Soatto (6) Honghui Yang (6) Yu Qiao (6)

Research topics

Robotics (1)

Keywords

point cloud (10) self-supervised learning (9) semantic segmentation (7) 3d object detection (7) representation learning (6) autonomous driving (6) 3d vision (6) object detection (5) convolutional neural network (4) transfer learning (4) knowledge distillation (4) depth estimation (3) graph neural network (3) novel view synthesis (3) slot attention (3) generative model (3) attention mechanism (3) diffusion model (3) point cloud processing (3) amodal segmentation (3)

Papers

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction ICLR 2025
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ICLR 2025
Common Learning Constraints Alter Interpretations of Direct Preference Optimization AISTATS 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation ICLR 2025
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers ICLR 2025
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs ICLR 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers ICCV 2025
Aether: Geometric-Aware Unified World Modeling ICCV 2025
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds ICCV 2025
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation CVPR 2025
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning CVPR 2025
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving CVPR 2025
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach ICLR 2025
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ICLR 2025
Depth Any Video with Scalable Synthetic Data ICLR 2025
Sparse Autoencoders, Again? ICML 2025
Explicit Preference Optimization: No Need for an Implicit Reward Model ICML 2025
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction AAAI 2025
CaMML: Context-Aware Multimodal Learner for Large Models ACL 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment NIPS 2024
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion NIPS 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos NIPS 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation NIPS 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation NIPS 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning NIPS 2024
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction NIPS 2024
EMR-Merging: Tuning-Free High-Performance Model Merging NIPS 2024
Frozen CLIP Transformer Is an Efficient Point Cloud Encoder AAAI 2024
Boosting Residual Networks with Group Knowledge AAAI 2024
Graph Machine Learning through the Lens of Bilevel Optimization AISTATS 2024
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number CVPR 2024
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation CVPR 2024
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions CVPR 2024
Point Transformer V3: Simpler Faster Stronger CVPR 2024
Learning for Transductive Threshold Calibration in Open-World Recognition CVPR 2024
GVGEN: Text-to-3D Generation with Volumetric Representation ECCV 2024
Agent3D-Zero: An Agent for Zero-shot 3D Understanding ECCV 2024
Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting ECCV 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM ECCV 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines ECCV 2024
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model ICLR 2024
Consistent Video-to-Video Transfer Using Synthetic Dataset ICLR 2024
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer CVPR 2023
Bridging the Gap to Real-World Object-Centric Learning ICLR 2023
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds CVPR 2023
Ponder: Point Cloud Pre-training via Neural Rendering ICCV 2023
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation ICCV 2023
Object-Centric Multiple Object Tracking ICCV 2023
Unsupervised Open-Vocabulary Object Localization in Videos ICCV 2023
Coarse-to-Fine Amodal Segmentation with Shape Prior ICCV 2023
Crossing the Gap: Domain Generalization for Image Captioning CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency CVPR 2023
Learning Manifold Dimensions with Conditional Variational Autoencoders NIPS 2022
Self-supervised Amodal Video Object Segmentation NIPS 2022
PSS: Progressive Sample Selection for Open-World Visual Representation Learning ECCV 2022
PointInst3D: Segmenting 3D Instances by Points ECCV 2022
GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction NIPS 2021
Progressive Coordinate Transforms for Monocular 3D Object Detection NIPS 2021
HCRF-Flow: Scene Flow From Point Clouds With Continuous High-Order CRFs and Position-Aware Flow Embedding CVPR 2021
Learning Hierarchical Graph Neural Networks for Image Clustering ICCV 2021
ARCH++: Animation-Ready Clothed Human Reconstruction Revisited ICCV 2021
DyCo3D: Robust Instance Segmentation of 3D Point Clouds Through Dynamic Convolution CVPR 2021
ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network CVPR 2020
Instance-Aware Embedding for Point Cloud Instance Segmentation ECCV 2020
SAM: Squeeze-and-Mimic Networks for Conditional Visual Driving Policy Learning CORL 2020
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing JMLR 2020
Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction NIPS 2020
Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation ECCV 2020
FCOS: Fully Convolutional One-Stage Object Detection ICCV 2019
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation CVPR 2019
GeoNet: Deep Geodesic Networks for Point Cloud Analysis CVPR 2019
Knowledge Adaptation for Efficient Semantic Segmentation CVPR 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks CVPR 2019
Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors AAAI 2019
GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images CVPR 2019
An End-to-End TextSpotter With Explicit Alignment and Attention CVPR 2018
Single Shot Text Detector With Regional Attention ICCV 2017