conftrace_

Hongsheng Li

240 papers · 2014–2026 · 12 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (14) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (28) 🌟 Keyword Trendsetter Combo (6) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (69) 👥 Mega-Team (22) 🌱 Topic Pioneer 🔬 Deep Specialist (42) 🧬 Topic Evolution 🏆 Keyword Champion (8) 🗃️ Keyword Collector (754) ❓ The Questioner (2) 💎 Century Club (233) 🚀 Conference Pioneer 🔥 Unstoppable (12) 📈 Trend Setter ⚡ Prolific Year (43)

Conferences

CVPR (78) ICCV (40) ECCV (35) NIPS (28) ICLR (23) AAAI (14) ACL (9) ICML (5) CORL (3) EMNLP (3) MICCAI (1) WACV (1)

Top co-authors

Xiaogang Wang (69) Peng Gao (38) Yu Qiao (31) Yu Liu (29) Renrui Zhang (27) Aojun Zhou (24) Jifeng Dai (22) Guanglu Song (18) Zhaoyang Huang (17) Shuai Yi (16)

Keywords

point cloud (18) autonomous driving (17) 3d object detection (14) convolutional neural network (13) depth estimation (11) semantic segmentation (10) large language model (10) multimodal learning (9) diffusion model (9) object detection (8) text-to-image generation (8) neural network (8) self-supervised learning (8) person re-identification (8) scene understanding (7) multimodal large language model (7) image generation (7) domain adaptation (6) video understanding (6) 3d vision (6)

Papers

From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench AAAI 2026
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning AAAI 2026
Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning AAAI 2026
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling AAAI 2026
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation AAAI 2026
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning ACL 2026
Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning ACL 2026
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking CVPR 2025
Let's Verify and Reinforce Image Generation Step by Step CVPR 2025
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering CVPR 2025
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes CVPR 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding CVPR 2025
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation CVPR 2025
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving CVPR 2025
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices CVPR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding CVPR 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines ICLR 2025
VBCD: A Voxel-Based Framework for Personalized Dental Crown Design MICCAI 2025
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM ICML 2025
One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation ICML 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency ICML 2025
CameraCtrl: Enabling Camera Control for Video Diffusion Models ICLR 2025
Mixture Compressor for Mixture-of-Experts LLMs Gains More ICLR 2025
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology ICLR 2025
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ICLR 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ICLR 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions ICLR 2025
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation ICLR 2025
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding AAAI 2025
GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance AAAI 2025
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models ICLR 2025
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow ICLR 2025
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code ICLR 2025
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation ACL 2025
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents ACL 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning ACL 2025
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning ACL 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine ICLR 2025
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception ICLR 2025
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models ICCV 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation ICCV 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ICCV 2025
HPSv3: Towards Wide-Spectrum Human Preference Score ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning ICCV 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices ICCV 2025
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis ICCV 2025
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding EMNLP 2025
Alignment with Fill-In-the-Middle for Enhancing Code Generation EMNLP 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? EMNLP 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding CVPR 2025
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation CVPR 2025
Personalize Segment Anything Model with One Shot ICLR 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning NIPS 2024
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks NIPS 2024
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding NIPS 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching NIPS 2024
Phased Consistency Models NIPS 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset NIPS 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context NIPS 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT NIPS 2024
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving NIPS 2024
A3VLM: Actionable Articulation-Aware Vision Language Model CORL 2024
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs ACL 2024
Empowering Character-level Text Infilling by Eliminating Sub-Tokens ACL 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ACL 2024
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation CVPR 2024
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction CVPR 2024
GLID: Pre-training a Generalist Encoder-Decoder Vision Model CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft CVPR 2024
LMDrive: Closed-Loop End-to-End Driving with Large Language Models CVPR 2024
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos ECCV 2024
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding ECCV 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ECCV 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis ECCV 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface ECCV 2024
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding ECCV 2024
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks ECCV 2024
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation ECCV 2024
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model ECCV 2024
Delving Deep into Engagement Prediction of Short Videos ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models ECCV 2024
Unmasking Bias in Diffusion Model Training ECCV 2024
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events ECCV 2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models ECCV 2024
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition ECCV 2024
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning ICLR 2024
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention ICLR 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process ICLR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models ICML 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ICML 2024
UE4-NeRF: Neural Radiance Field for Real-Time Rendering of Large-Scale Scene NIPS 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders CVPR 2023
CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching CVPR 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation CVPR 2023
PATS: Patch Area Transportation With Subdivision for Local Feature Matching CVPR 2023
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers CVPR 2023
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation CVPR 2023
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions CVPR 2023
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels CVPR 2023
ReasonNet: End-to-End Driving With Temporal and Global Reasoning CVPR 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners CVPR 2023
A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks CVPR 2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction ICCV 2023
SparseMAE: Sparse Training Meets Masked Autoencoders ICCV 2023
Simulating Fluids in Real-World Still Images ICCV 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding ICCV 2023
Urban Radiance Field Representation with Deformable Neural Mesh Primitives ICCV 2023
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation ICCV 2023
Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection ICCV 2023
Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation ICCV 2023
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space ICCV 2023
TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses ICCV 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection ICCV 2023
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds ICCV 2023
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference ICCV 2023
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding NIPS 2023
A Unified Conditional Framework for Diffusion-based Image Restoration NIPS 2023
Context-PIPs: Persistent Independent Particles Demands Spatial Context Features NIPS 2023
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning ICLR 2022
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection ECCV 2022
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers ECCV 2022
Towards Robust Face Recognition with Comprehensive Search ECCV 2022
FlowFormer: A Transformer Architecture for Optical Flow ECCV 2022
Learning Degradation Representations for Image Deblurring ECCV 2022
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP ECCV 2022
TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers ECCV 2022
Frozen CLIP Models Are Efficient Video Learners ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification ECCV 2022
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer CORL 2022
MCMAE: Masked Convolution Meets Masked Autoencoders NIPS 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training NIPS 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning NIPS 2022
Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields NIPS 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs NIPS 2022
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks CVPR 2022
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation CVPR 2022
IDR: Self-Supervised Image Denoising via Iterative Data Refinement CVPR 2022
RBGNet: Ray-Based Grouping for 3D Object Detection CVPR 2022
RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization CVPR 2022
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks CVPR 2022
Learning a Structured Latent Space for Unsupervised Point Cloud Completion CVPR 2022
PointCLIP: Point Cloud Understanding by CLIP CVPR 2022
Container: Context Aggregation Networks NIPS 2021
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network CVPR 2021
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization CVPR 2021
Inverting Generative Adversarial Renderer for Face Reconstruction CVPR 2021
ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection CVPR 2021
LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network CVPR 2021
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification CVPR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency ICCV 2021
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting ICCV 2021
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization ICCV 2021
Progressive Correspondence Pruning by Consensus Learning ICCV 2021
Rethinking Noise Synthesis and Modeling in Raw Denoising ICCV 2021
Fast Convergence of DETR With Spatially Modulated Co-Attention ICCV 2021
Encoder-Decoder With Multi-Level Attention for 3D Human Shape and Pose Estimation ICCV 2021
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-Based 3D Detector ICCV 2021
Semantic Scene Completion via Integrating Instances and Scene In-the-Loop CVPR 2021
VS-Net: Voting With Segmentation for Visual Localization CVPR 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers AAAI 2021
REFINE: Prediction Fusion Network for Panoptic Segmentation AAAI 2021
Efficient Attention: Attention With Linear Complexities WACV 2021
A Unified Multi-Scenario Attacking Network for Visual Object Tracking AAAI 2021
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch ICLR 2021
DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks NIPS 2021
Learning to Predict Context-adaptive Convolution for Semantic Segmentation ECCV 2020
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation ECCV 2020
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions ECCV 2020
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification ICLR 2020
Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation AAAI 2020
EfficientFCN: Holistically-guided Decoding for Semantic Segmentation ECCV 2020
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID NIPS 2020
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection CVPR 2020
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization ECCV 2020
Balanced Meta-Softmax for Long-Tailed Visual Recognition NIPS 2020
3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior CVPR 2020
StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching CVPR 2020
Robust Superpixel-Guided Attentional Adversarial Attack CVPR 2020
RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax ECCV 2020
SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks CORL 2020
Group-Wise Correlation Stereo Network CVPR 2019
A2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes AAAI 2019
Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize AAAI 2019
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations CVPR 2019
Conditional Adversarial Generative Flow for Controllable Image Synthesis CVPR 2019
P2SGrad: Refined Gradients for Optimizing Deep Face Models CVPR 2019
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis NIPS 2019
PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud CVPR 2019
Interpolated Convolutional Networks for 3D Point Cloud Understanding ICCV 2019
Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints ICCV 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval ICCV 2019
Multi-Modality Latent Interaction Network for Visual Question Answering ICCV 2019
Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer ICCV 2019
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing CVPR 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering CVPR 2019
3D Human Pose Estimation in the Wild by Adversarial Learning CVPR 2018
Single View Stereo Matching CVPR 2018
Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding CVPR 2018
Deep Group-Shuffling Random Walk for Person Re-Identification CVPR 2018
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification NIPS 2018
Eliminating Background-Bias for Robust Person Re-Identification CVPR 2018
End-to-End Deep Kronecker-Product Matching for Person Re-Identification CVPR 2018
Group Consistent Similarity Learning via Deep CRF for Person Re-Identification CVPR 2018
Person Re-identification with Deep Similarity-Guided Graph Neural Network ECCV 2018
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association ECCV 2018
Learning Monocular Depth by Distilling Cross-domain Stereo Networks ECCV 2018
Question-Guided Hybrid Convolution for Visual Question Answering ECCV 2018
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data ECCV 2018
Person Search With Natural Language Description CVPR 2017
Identity-Aware Textual-Visual Matching With Latent Co-Attention ICCV 2017
Learning Feature Pyramids for Human Pose Estimation ICCV 2017
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification ICCV 2017
Object Detection in Videos With Tubelet Proposal Networks CVPR 2017
Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification CVPR 2017
StackGAN: Text to Photo-Realistic Image Synthesis With Stacked Generative Adversarial Networks ICCV 2017
Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism ICCV 2017
Learning Deep Neural Networks for Vehicle Re-ID With Visual-Spatio-Temporal Path Proposals ICCV 2017
CRF-CNN: Modeling Structured Information in Human Pose Estimation NIPS 2016
Learning Deep Feature Representations With Domain Guided Dropout for Person Re-Identification CVPR 2016
End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation CVPR 2016
Structured Feature Learning for Pose Estimation CVPR 2016
Object Detection From Video Tubelets With Convolutional Neural Networks CVPR 2016
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection CVPR 2015
Pedestrian Travel Time Estimation in Crowded Scenes ICCV 2015
Cross-Scene Crowd Counting via Deep Convolutional Neural Networks CVPR 2015
Understanding Pedestrian Behaviors From Stationary Crowd Groups CVPR 2015
Saliency Detection by Multi-Context Deep Learning CVPR 2015
Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction CVPR 2014