Tong He
78 papers · 2017–2025 · 11 top CS/AI conferences
Achievements
Conference Polyglot (11)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Academic Marathon (8)
Cross-Pollinator (12)
Taxonomy Completionist (101)
Conference Loyalist (22)
Dynamic Duo (22)
Mega-Team (20)
Grand Slam
Deep Specialist (13)
Conference Pioneer
Keyword Collector (282)
Century Club (78)
Trend Setter
Unstoppable (9)
The Questioner
Prolific Year (6)
Conferences
CVPR (22)
NIPS (13)
ICCV (12)
ICLR (11)
ECCV (9)
AAAI (4)
AISTATS (2)
ICML (2)
ACL (1)
CORL (1)
JMLR (1)
Research topics
Keywords
point cloud (10)
self-supervised learning (9)
semantic segmentation (7)
3d object detection (7)
representation learning (6)
autonomous driving (6)
3d vision (6)
object detection (5)
convolutional neural network (4)
transfer learning (4)
knowledge distillation (4)
depth estimation (3)
graph neural network (3)
novel view synthesis (3)
slot attention (3)
generative model (3)
attention mechanism (3)
diffusion model (3)
point cloud processing (3)
amodal segmentation (3)
Papers
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
ICLR 2025
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
ICLR 2025
Common Learning Constraints Alter Interpretations of Direct Preference Optimization
AISTATS 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
ICLR 2025
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
ICLR 2025
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
ICCV 2025
Aether: Geometric-Aware Unified World Modeling
ICCV 2025
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
ICCV 2025
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
CVPR 2025
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
CVPR 2025
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
CVPR 2025
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
ICLR 2025
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
ICLR 2025
Depth Any Video with Scalable Synthetic Data
ICLR 2025
Sparse Autoencoders, Again?
ICML 2025
Explicit Preference Optimization: No Need for an Implicit Reward Model
ICML 2025
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction
AAAI 2025
CaMML: Context-Aware Multimodal Learner for Large Models
ACL 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
NIPS 2024
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
NIPS 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
NIPS 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
NIPS 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
NIPS 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
NIPS 2024
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
NIPS 2024
EMR-Merging: Tuning-Free High-Performance Model Merging
NIPS 2024
Frozen CLIP Transformer Is an Efficient Point Cloud Encoder
AAAI 2024
Boosting Residual Networks with Group Knowledge
AAAI 2024
Graph Machine Learning through the Lens of Bilevel Optimization
AISTATS 2024
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
CVPR 2024
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
CVPR 2024
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
Learning for Transductive Threshold Calibration in Open-World Recognition
CVPR 2024
GVGEN: Text-to-3D Generation with Volumetric Representation
ECCV 2024
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
ECCV 2024
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
ECCV 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
ECCV 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
ECCV 2024
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
ICLR 2024
Consistent Video-to-Video Transfer Using Synthetic Dataset
ICLR 2024
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer
CVPR 2023
Bridging the Gap to Real-World Object-Centric Learning
ICLR 2023
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
CVPR 2023
Ponder: Point Cloud Pre-training via Neural Rendering
ICCV 2023
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
ICCV 2023
Object-Centric Multiple Object Tracking
ICCV 2023
Unsupervised Open-Vocabulary Object Localization in Videos
ICCV 2023
Coarse-to-Fine Amodal Segmentation with Shape Prior
ICCV 2023
Crossing the Gap: Domain Generalization for Image Captioning
CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
CVPR 2023
Learning Manifold Dimensions with Conditional Variational Autoencoders
NIPS 2022
Self-supervised Amodal Video Object Segmentation
NIPS 2022
PSS: Progressive Sample Selection for Open-World Visual Representation Learning
ECCV 2022
PointInst3D: Segmenting 3D Instances by Points
ECCV 2022
GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction
NIPS 2021
Progressive Coordinate Transforms for Monocular 3D Object Detection
NIPS 2021
HCRF-Flow: Scene Flow From Point Clouds With Continuous High-Order CRFs and Position-Aware Flow Embedding
CVPR 2021
Learning Hierarchical Graph Neural Networks for Image Clustering
ICCV 2021
ARCH++: Animation-Ready Clothed Human Reconstruction Revisited
ICCV 2021
DyCo3D: Robust Instance Segmentation of 3D Point Clouds Through Dynamic Convolution
CVPR 2021
ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network
CVPR 2020
Instance-Aware Embedding for Point Cloud Instance Segmentation
ECCV 2020
SAM: Squeeze-and-Mimic Networks for Conditional Visual Driving Policy Learning
CORL 2020
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
JMLR 2020
Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
NIPS 2020
Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation
ECCV 2020
FCOS: Fully Convolutional One-Stage Object Detection
ICCV 2019
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation
CVPR 2019
GeoNet: Deep Geodesic Networks for Point Cloud Analysis
CVPR 2019
Knowledge Adaptation for Efficient Semantic Segmentation
CVPR 2019
Bag of Tricks for Image Classification with Convolutional Neural Networks
CVPR 2019
Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors
AAAI 2019
GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images
CVPR 2019
An End-to-End TextSpotter With Explicit Alignment and Attention
CVPR 2018
Single Shot Text Detector With Regional Attention
ICCV 2017