Hongsheng Li
240 papers · 2014–2026 · 12 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (14)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Hot Topic Early Bird
Conference Loyalist (28)
Keyword Trendsetter Combo (6)
Grand Slam
Triple Crown
Dynamic Duo (69)
Mega-Team (22)
Topic Pioneer
Deep Specialist (42)
Topic Evolution
Keyword Champion (8)
Keyword Collector (754)
The Questioner (2)
Century Club (233)
Conference Pioneer
Unstoppable (12)
Trend Setter
Prolific Year (43)
Conferences
CVPR (78)
ICCV (40)
ECCV (35)
NIPS (28)
ICLR (23)
AAAI (14)
ACL (9)
ICML (5)
CORL (3)
EMNLP (3)
MICCAI (1)
WACV (1)
Keywords
point cloud (18)
autonomous driving (17)
3d object detection (14)
convolutional neural network (13)
depth estimation (11)
semantic segmentation (10)
large language model (10)
multimodal learning (9)
diffusion model (9)
object detection (8)
text-to-image generation (8)
neural network (8)
self-supervised learning (8)
person re-identification (8)
scene understanding (7)
multimodal large language model (7)
image generation (7)
domain adaptation (6)
video understanding (6)
3d vision (6)
Papers
From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
AAAI 2026
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
AAAI 2026
Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning
AAAI 2026
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
AAAI 2026
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
AAAI 2026
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
ACL 2026
Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning
ACL 2026
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
CVPR 2025
Let's Verify and Reinforce Image Generation Step by Step
CVPR 2025
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering
CVPR 2025
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
CVPR 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
CVPR 2025
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
CVPR 2025
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
CVPR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025
VBCD: A Voxel-Based Framework for Personalized Dental Crown Design
MICCAI 2025
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
ICML 2025
One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation
ICML 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
CameraCtrl: Enabling Camera Control for Video Diffusion Models
ICLR 2025
Mixture Compressor for Mixture-of-Experts LLMs Gains More
ICLR 2025
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
ICLR 2025
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
ICLR 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
AAAI 2025
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
ICLR 2025
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
ICLR 2025
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
ICLR 2025
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation
ACL 2025
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
ACL 2025
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
ACL 2025
Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
ACL 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
ICLR 2025
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
ICCV 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
HPSv3: Towards Wide-Spectrum Human Preference Score
ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
ICCV 2025
ConsistentCity: Semantic Flow-guided Occupancy DiT for Temporally Consistent Driving Scene Synthesis
ICCV 2025
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
EMNLP 2025
Alignment with Fill-In-the-Middle for Enhancing Code Generation
EMNLP 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
EMNLP 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
CVPR 2025
Personalize Segment Anything Model with One Shot
ICLR 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
NIPS 2024
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
NIPS 2024
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
NIPS 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
NIPS 2024
Phased Consistency Models
NIPS 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
NIPS 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
NIPS 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NIPS 2024
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
NIPS 2024
A3VLM: Actionable Articulation-Aware Vision Language Model
CORL 2024
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
ACL 2024
Empowering Character-level Text Infilling by Eliminating Sub-Tokens
ACL 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
ACL 2024
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
CVPR 2024
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
CVPR 2024
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
CVPR 2024
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
ECCV 2024
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
ECCV 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
ECCV 2024
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
ECCV 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
ECCV 2024
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
ECCV 2024
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
ECCV 2024
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
ECCV 2024
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
ECCV 2024
Delving Deep into Engagement Prediction of Short Videos
ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Unmasking Bias in Diffusion Model Training
ECCV 2024
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
ECCV 2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
ECCV 2024
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
ECCV 2024
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
ICLR 2024
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
ICLR 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
ICML 2024
UE4-NeRF: Neural Radiance Field for Real-Time Rendering of Large-Scale Scene
NIPS 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
CVPR 2023
CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching
CVPR 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
CVPR 2023
PATS: Patch Area Transportation With Subdivision for Local Feature Matching
CVPR 2023
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
CVPR 2023
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
CVPR 2023
ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
CVPR 2023
ReasonNet: End-to-End Driving With Temporal and Global Reasoning
CVPR 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis
CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023
A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift
CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
ICCV 2023
SparseMAE: Sparse Training Meets Masked Autoencoders
ICCV 2023
Simulating Fluids in Real-World Still Images
ICCV 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
ICCV 2023
Urban Radiance Field Representation with Deformable Neural Mesh Primitives
ICCV 2023
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
ICCV 2023
Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection
ICCV 2023
Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation
ICCV 2023
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space
ICCV 2023
TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses
ICCV 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
ICCV 2023
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
ICCV 2023
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
A Unified Conditional Framework for Diffusion-based Image Restoration
NIPS 2023
Context-PIPs: Persistent Independent Particles Demands Spatial Context Features
NIPS 2023
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning
ICLR 2022
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
ECCV 2022
EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers
ECCV 2022
Towards Robust Face Recognition with Comprehensive Search
ECCV 2022
FlowFormer: A Transformer Architecture for Optical Flow
ECCV 2022
Learning Degradation Representations for Image Deblurring
ECCV 2022
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
ECCV 2022
TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
ECCV 2022
Frozen CLIP Models Are Efficient Video Learners
ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
CORL 2022
MCMAE: Masked Convolution Meets Masked Autoencoders
NIPS 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
NIPS 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
NIPS 2022
Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
NIPS 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
NIPS 2022
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks
CVPR 2022
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
CVPR 2022
IDR: Self-Supervised Image Denoising via Iterative Data Refinement
CVPR 2022
RBGNet: Ray-Based Grouping for 3D Object Detection
CVPR 2022
RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization
CVPR 2022
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
CVPR 2022
Learning a Structured Latent Space for Unsupervised Point Cloud Completion
CVPR 2022
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
Container: Context Aggregation Networks
NIPS 2021
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
CVPR 2021
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
CVPR 2021
Inverting Generative Adversarial Renderer for Face Reconstruction
CVPR 2021
ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection
CVPR 2021
LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network
CVPR 2021
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification
CVPR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
ICCV 2021
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
ICCV 2021
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
ICCV 2021
Progressive Correspondence Pruning by Consensus Learning
ICCV 2021
Rethinking Noise Synthesis and Modeling in Raw Denoising
ICCV 2021
Fast Convergence of DETR With Spatially Modulated Co-Attention
ICCV 2021
Encoder-Decoder With Multi-Level Attention for 3D Human Shape and Pose Estimation
ICCV 2021
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-Based 3D Detector
ICCV 2021
Semantic Scene Completion via Integrating Instances and Scene In-the-Loop
CVPR 2021
VS-Net: Voting With Segmentation for Visual Localization
CVPR 2021
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
AAAI 2021
REFINE: Prediction Fusion Network for Panoptic Segmentation
AAAI 2021
Efficient Attention: Attention With Linear Complexities
WACV 2021
A Unified Multi-Scenario Attacking Network for Visual Object Tracking
AAAI 2021
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
ICLR 2021
DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks
NIPS 2021
Learning to Predict Context-adaptive Convolution for Semantic Segmentation
ECCV 2020
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
ECCV 2020
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions
ECCV 2020
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
ICLR 2020
Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation
AAAI 2020
EfficientFCN: Holistically-guided Decoding for Semantic Segmentation
ECCV 2020
Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
NIPS 2020
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
CVPR 2020
Self-supervising Fine-grained Region Similarities for Large-scale Image Localization
ECCV 2020
Balanced Meta-Softmax for Long-Tailed Visual Recognition
NIPS 2020
3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior
CVPR 2020
StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching
CVPR 2020
Robust Superpixel-Guided Attentional Adversarial Attack
CVPR 2020
RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax
ECCV 2020
SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks
CORL 2020
Group-Wise Correlation Stereo Network
CVPR 2019
A2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes
AAAI 2019
Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize
AAAI 2019
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations
CVPR 2019
Conditional Adversarial Generative Flow for Controllable Image Synthesis
CVPR 2019
P2SGrad: Refined Gradients for Optimizing Deep Face Models
CVPR 2019
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
NIPS 2019
PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud
CVPR 2019
Interpolated Convolutional Networks for 3D Point Cloud Understanding
ICCV 2019
Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints
ICCV 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
ICCV 2019
Multi-Modality Latent Interaction Network for Visual Question Answering
ICCV 2019
Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer
ICCV 2019
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing
CVPR 2019
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
CVPR 2019
3D Human Pose Estimation in the Wild by Adversarial Learning
CVPR 2018
Single View Stereo Matching
CVPR 2018
Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding
CVPR 2018
Deep Group-Shuffling Random Walk for Person Re-Identification
CVPR 2018
FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
NIPS 2018
Eliminating Background-Bias for Robust Person Re-Identification
CVPR 2018
End-to-End Deep Kronecker-Product Matching for Person Re-Identification
CVPR 2018
Group Consistent Similarity Learning via Deep CRF for Person Re-Identification
CVPR 2018
Person Re-identification with Deep Similarity-Guided Graph Neural Network
ECCV 2018
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
ECCV 2018
Learning Monocular Depth by Distilling Cross-domain Stereo Networks
ECCV 2018
Question-Guided Hybrid Convolution for Visual Question Answering
ECCV 2018
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
ECCV 2018
Person Search With Natural Language Description
CVPR 2017
Identity-Aware Textual-Visual Matching With Latent Co-Attention
ICCV 2017
Learning Feature Pyramids for Human Pose Estimation
ICCV 2017
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification
ICCV 2017
Object Detection in Videos With Tubelet Proposal Networks
CVPR 2017
Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification
CVPR 2017
StackGAN: Text to Photo-Realistic Image Synthesis With Stacked Generative Adversarial Networks
ICCV 2017
Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism
ICCV 2017
Learning Deep Neural Networks for Vehicle Re-ID With Visual-Spatio-Temporal Path Proposals
ICCV 2017
CRF-CNN: Modeling Structured Information in Human Pose Estimation
NIPS 2016
Learning Deep Feature Representations With Domain Guided Dropout for Person Re-Identification
CVPR 2016
End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation
CVPR 2016
Structured Feature Learning for Pose Estimation
CVPR 2016
Object Detection From Video Tubelets With Convolutional Neural Networks
CVPR 2016
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
CVPR 2015
Pedestrian Travel Time Estimation in Crowded Scenes
ICCV 2015
Cross-Scene Crowd Counting via Deep Convolutional Neural Networks
CVPR 2015
Understanding Pedestrian Behaviors From Stationary Crowd Groups
CVPR 2015
Saliency Detection by Multi-Context Deep Learning
CVPR 2015
Preconditioning for Accelerated Iteratively Reweighted Least Squares in Structured Sparsity Reconstruction
CVPR 2014