Yin Li
49 papers · 2013–2026 · 10 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (8)
Academic Marathon (12)
Conference Polyglot (10)
Taxonomy Completionist (80)
Keyword Pioneer
Hot Topic Early Bird
Grand Slam
Keyword Champion
Century Club (47)
Prolific Year (5)
Keyword Collector (212)
Unstoppable (6)
Trend Setter
Conference Pioneer
Conferences
CVPR (18)
ICCV (10)
ECCV (9)
AAAI (3)
ICLR (3)
NIPS (2)
ACL (1)
ICML (1)
IJCAI (1)
WACV (1)
Keywords
object detection (7)
contrastive learning (5)
action recognition (4)
adaptive inference (3)
neural network (3)
zero-shot learning (3)
multi-modal learning (3)
instance segmentation (3)
weakly supervised learning (3)
depth estimation (3)
diffusion model (3)
image generation (2)
convolutional network (2)
neural rendering (2)
3d reconstruction (2)
efficient computing (2)
semantic segmentation (2)
self-supervised learning (2)
scene graph generation (2)
visual recognition (2)
Papers
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
AAAI 2026
Visual Bridge: Universal Visual Perception Representations Generating
AAAI 2026
Learning to Inference Adaptively for Multimodal Large Language Models
ICCV 2025
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs
ICCV 2025
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
ICCV 2025
Recovering Parametric Scenes from Very Few Time-of-Flight Pixels
ICCV 2025
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
ICCV 2025
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
CVPR 2025
PAVE: Patching and Adapting Video Large Language Models
CVPR 2025
LETS Forecast: Learning Embedology for Time Series Forecasting
ICML 2025
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
CVPR 2024
Towards 3D Vision with Low-Cost Single-Photon Cameras
CVPR 2024
SnAG: Scalable and Accurate Video Grounding
CVPR 2024
RICA^2: Rubric-Informed, Calibrated Assessment of Actions
ECCV 2024
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
ICLR 2024
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
ICCV 2023
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations
CVPR 2023
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning
ICLR 2023
Learned Compressive Representations for Single-Photon 3D Imaging
ICCV 2023
Spike-Based Anytime Perception
WACV 2023
3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image
CVPR 2022
ActionFormer: Localizing Moments of Actions with Transformers
ECCV 2022
3D Scene Inference from Transient Histograms
ECCV 2022
Event Neural Networks
ECCV 2022
Egocentric Activity Recognition and Localization on a 3D Map
ECCV 2022
mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors
NIPS 2022
SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles
CVPR 2022
RegionCLIP: Region-Based Language-Image Pretraining
CVPR 2022
Learning To Generate Scene Graph From Natural Language Supervision
ICCV 2021
Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention
AAAI 2021
Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification With Self-Supervised Contrastive Learning
CVPR 2021
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
CVPR 2021
A Simple Baseline for Weakly-Supervised Scene Graph Generation
ICCV 2021
Comprehensive Image Captioning via Scene Graph Decomposition
ECCV 2020
Gradients as Features for Deep Representation Learning
ICLR 2020
Interpretable and Accurate Fine-grained Recognition via Region Grouping
CVPR 2020
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video
ECCV 2020
Sense-Aware Neural Models for Pun Location in Texts
ACL 2018
Beyond Grids: Learning Graph Representations for Visual Recognition
NIPS 2018
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
CVPR 2018
In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video
ECCV 2018
Compositional Learning for Human Object Interaction
ECCV 2018
Densely Cascaded Shadow Detection Network via Deeply Supervised Parallel Fusion
IJCAI 2018
Learning Deep Structure-Preserving Image-Text Embeddings
CVPR 2016
Unsupervised Learning of Edges
CVPR 2016
Delving Into Egocentric Actions
CVPR 2015
Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization
CVPR 2015
The Secrets of Salient Object Segmentation
CVPR 2014
Learning to Predict Gaze in Egocentric Video
ICCV 2013