Yin Li

49 papers · 2013–2026 · 10 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (8) 🏃 Academic Marathon (12) 🌍 Conference Polyglot (10) 🗺️ Taxonomy Completionist (80)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏆 Grand Slam 🏆 Keyword Champion 💎 Century Club (47) ⚡ Prolific Year (5) 🗃️ Keyword Collector (212) 🔥 Unstoppable (6) 📈 Trend Setter 🚀 Conference Pioneer

Conferences

CVPR (18) ICCV (10) ECCV (9) AAAI (3) ICLR (3) NIPS (2) ACL (1) ICML (1) IJCAI (1) WACV (1)

Top co-authors

Fangzhou Mu (9) James M. Rehg (9) Mohit Gupta (8) Yiwu Zhong (7) Dongsheng Jiang (4) Liwei Wang (4) Miao Liu (3) Yingyu Liang (3) Matthew Dutson (3) Yiquan Li (3)

Keywords

object detection (7) contrastive learning (5) action recognition (4) adaptive inference (3) neural network (3) zero-shot learning (3) multi-modal learning (3) instance segmentation (3) weakly supervised learning (3) depth estimation (3) diffusion model (3) image generation (2) convolutional network (2) neural rendering (2) 3d reconstruction (2) efficient computing (2) semantic segmentation (2) self-supervised learning (2) scene graph generation (2) visual recognition (2)

Papers

SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation · AAAI 2026
Visual Bridge: Universal Visual Perception Representations Generating · AAAI 2026
Learning to Inference Adaptively for Multimodal Large Language Models · ICCV 2025
Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs · ICCV 2025
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text · ICCV 2025
Recovering Parametric Scenes from Very Few Time-of-Flight Pixels · ICCV 2025
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning · ICCV 2025
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation · CVPR 2025
PAVE: Patching and Adapting Video Large Language Models · CVPR 2025
LETS Forecast: Learning Embedology for Time Series Forecasting · ICML 2025
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition · CVPR 2024
Towards 3D Vision with Low-Cost Single-Photon Cameras · CVPR 2024
SnAG: Scalable and Accurate Video Grounding · CVPR 2024
RICA^2: Rubric-Informed, Calibrated Assessment of Actions · ECCV 2024
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning · ICLR 2024
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers · ICCV 2023
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations · CVPR 2023
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning · ICLR 2023
Learned Compressive Representations for Single-Photon 3D Imaging · ICCV 2023
Spike-Based Anytime Perception · WACV 2023
3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image · CVPR 2022
ActionFormer: Localizing Moments of Actions with Transformers · ECCV 2022
3D Scene Inference from Transient Histograms · ECCV 2022
Event Neural Networks · ECCV 2022
Egocentric Activity Recognition and Localization on a 3D Map · ECCV 2022
mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors · NIPS 2022
SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles · CVPR 2022
RegionCLIP: Region-Based Language-Image Pretraining · CVPR 2022
Learning To Generate Scene Graph From Natural Language Supervision · ICCV 2021
Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention · AAAI 2021
Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification With Self-Supervised Contrastive Learning · CVPR 2021
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation · CVPR 2021
A Simple Baseline for Weakly-Supervised Scene Graph Generation · ICCV 2021
Comprehensive Image Captioning via Scene Graph Decomposition · ECCV 2020
Gradients as Features for Deep Representation Learning · ICLR 2020
Interpretable and Accurate Fine-grained Recognition via Region Grouping · CVPR 2020
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video · ECCV 2020
Sense-Aware Neural Models for Pun Location in Texts · ACL 2018
Beyond Grids: Learning Graph Representations for Visual Recognition · NIPS 2018
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare · CVPR 2018
In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video · ECCV 2018
Compositional Learning for Human Object Interaction · ECCV 2018
Densely Cascaded Shadow Detection Network via Deeply Supervised Parallel Fusion · IJCAI 2018
Learning Deep Structure-Preserving Image-Text Embeddings · CVPR 2016
Unsupervised Learning of Edges · CVPR 2016
Delving Into Egocentric Actions · CVPR 2015
Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization · CVPR 2015
The Secrets of Salient Object Segmentation · CVPR 2014
Learning to Predict Gaze in Egocentric Video · ICCV 2013