Li Fei-Fei

135 papers · 2009–2025 · 15 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (22) 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (15) 🏠 Conference Loyalist (22) 🌟 Keyword Trendsetter Combo (22) 🤝 Dynamic Duo (27) 🌱 Topic Pioneer 🏆 Keyword Champion (2) 👥 Mega-Team (29) 🔬 Deep Specialist (16) 🧬 Topic Evolution 🚀 Conference Pioneer 🔥 Unstoppable (14) ❓ The Questioner (2) ⚡ Prolific Year (10) 💎 Century Club (135) 🗃️ Keyword Collector (54) 📈 Trend Setter

Conferences

CVPR (53) CORL (22) ICCV (19) ECCV (10) NIPS (8) ICML (7) RSS (5) ICLR (3) WACV (2) ACL (1) EMNLP (1) IJCAI (1) IJCNLP (1) MLHC (1) NAACL (1)

Top co-authors

Jiajun Wu (27) Silvio Savarese (19) Yuke Zhu (18) Juan Carlos Niebles (17) Chen Wang (14) Ruohan Zhang (13) Li-jia Li (12) De-An Huang (12) Justin Johnson (11) Ehsan Adeli (11)

Research topics

Models (1) Science (1) Differential Privacy (1) Privacy (1)

Keywords

video understanding (12) imitation learning (9) reinforcement learning (8) multimodal learning (7) action recognition (7) weakly supervised learning (6) object recognition (6) transfer learning (6) scene graph (6) recurrent neural network (6) activity recognition (5) object detection (5) visual question answering (5) robot manipulation (5) convolutional neural network (5) visual reasoning (5) active learning (4) trajectory prediction (4) robot learning (4) unsupervised learning (4)

Papers

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation ICCV 2025
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion CVPR 2025
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces CVPR 2025
Re-thinking Temporal Search for Long-Form Video Understanding CVPR 2025
s1: Simple test-time scaling EMNLP 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities CORL 2025
WorldScore: A Unified Evaluation Benchmark for World Generation ICCV 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization ICCV 2025
D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement CORL 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction CORL 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making NIPS 2024
OccFusion: Rendering Occluded Humans with Generative Diffusion Priors NIPS 2024
HourVideo: 1-Hour Video-Language Understanding NIPS 2024
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation RSS 2024
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation CORL 2024
Photorealistic Video Generation with Diffusion Models ECCV 2024
Automated Creation of Digital Cousins for Robust Policy Learning CORL 2024
Differentially Private Video Activity Recognition WACV 2024
MindAgent: Emergent Gaming Interaction NAACL 2024
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator ICML 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation CVPR 2024
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image CVPR 2024
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models CORL 2023
Dynamic-Resolution Model Learning for Object Pile Manipulation RSS 2023
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities CORL 2023
Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation CORL 2023
Modeling Dynamic Environments with Scene Graph Memory ICML 2023
MaskViT: Masked Visual Pre-Training for Video Prediction ICLR 2023
Rendering Humans from Object-Occluded Monocular Videos ICCV 2023
VIMA: Robot Manipulation with Multimodal Prompts ICML 2023
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects CVPR 2023
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play CORL 2023
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation CORL 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer CVPR 2022
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning CVPR 2022
Revisiting the "Video" in Video-Language Understanding CVPR 2022
A Dual Representation Framework for Robot Learning with Human Guidance CORL 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation CORL 2022
A Study of Face Obfuscation in ImageNet ICML 2022
MetaMorph: Learning Universal Controllers with Transformers ICLR 2022
PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens ECCV 2022
SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies ICML 2021
Discovering Generalizable Skills via Automated Generation of Diverse Tasks RSS 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering ACL 2021
Metadata Normalization CVPR 2021
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation CORL 2021
Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation CORL 2021
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction CVPR 2021
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration CORL 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments CORL 2021
ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations CORL 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks CORL 2021
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks CORL 2021
Scalable Differential Privacy With Sparse Network Finetuning CVPR 2021
Representation Learning With Statistical Independence to Mitigate Bias WACV 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering IJCNLP 2021
Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs CVPR 2020
GTI: Learning to Generalize across Long-Horizon Tasks from Human Demonstrations RSS 2020
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition ECCV 2020
Procedure Planning in Instructional Videos ECCV 2020
DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs IJCAI 2020
Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation CORL 2019
Situational Fusion of Visual Representation for Visual Navigation ICCV 2019
Scene Graph Prediction With Limited Labels ICCV 2019
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration CVPR 2019
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation CVPR 2019
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks CVPR 2019
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion CVPR 2019
Information Maximizing Visual Question Generation CVPR 2019
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation CVPR 2019
Peeking Into the Future: Predicting Future Person Activities and Locations in Videos CVPR 2019
Eidetic 3D LSTM: A Model for Video Prediction and Beyond ICLR 2019
Composing Text and Image for Image Retrieval - an Empirical Odyssey CVPR 2019
Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go? ICML 2018
SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark CORL 2018
ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation CORL 2018
Image Generation From Scene Graphs CVPR 2018
Social GAN: Socially Acceptable Trajectories With Generative Adversarial Networks CVPR 2018
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos CVPR 2018
Referring Relationships CVPR 2018
Iterative Visual Reasoning Beyond Convolutions CVPR 2018
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets CVPR 2018
Thoracic Disease Identification and Localization With Limited Supervision CVPR 2018
Graph Distillation for Action Detection with Privileged Modalities ECCV 2018
Dynamic Task Prioritization for Multitask Learning ECCV 2018
Progressive Neural Architecture Search ECCV 2018
Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos ECCV 2018
HiDDeN: Hiding Data with Deep Networks ECCV 2018
Neural Graph Matching Networks for Fewshot 3D Action Recognition ECCV 2018
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels ICML 2018
3D Point Cloud-Based Visual Prediction of ICU Mobility Care Activities MLHC 2018
Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision RSS 2018
Fine-Grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach ICCV 2017
Inferring and Executing Programs for Visual Reasoning ICCV 2017
Characterizing and Improving Stability in Neural Style Transfer ICCV 2017
Knowledge Acquisition for Visual Question Answering via Iterative Querying CVPR 2017
A Hierarchical Approach for Generating Descriptive Image Paragraphs CVPR 2017
Scene Graph Generation by Iterative Message Passing CVPR 2017
Learning to Learn From Noisy Web Videos CVPR 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos CVPR 2017
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals CVPR 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning CVPR 2017
Unsupervised Learning of Long-Term Motion Dynamics for Videos CVPR 2017
Visual Semantic Planning Using Deep Successor Representations ICCV 2017
Dense-Captioning Events in Videos ICCV 2017
Visual7W: Grounded Question Answering in Images CVPR 2016
Detecting Events and Key Actors in Multi-Person Videos CVPR 2016
End-To-End Learning of Action Detection From Frame Glimpses in Videos CVPR 2016
Recurrent Attention Models for Depth-Based Person Identification CVPR 2016
Social LSTM: Human Trajectory Prediction in Crowded Spaces CVPR 2016
DenseCap: Fully Convolutional Localization Networks for Dense Captioning CVPR 2016
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata ICCV 2015
Improving Image Classification With Location Context ICCV 2015
RGB-W: When Vision Meets Wireless ICCV 2015
Learning Temporal Embeddings for Complex Video Analysis ICCV 2015
Fine-Grained Recognition Without Part Annotations CVPR 2015
Image Retrieval Using Scene Graphs CVPR 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions CVPR 2015
Best of Both Worlds: Human-Machine Collaboration for Object Annotation CVPR 2015
Learning Semantic Relationships for Better Action Retrieval in Images CVPR 2015
Large-scale Video Classification with Convolutional Neural Networks CVPR 2014
Co-localization in Real-World Images CVPR 2014
Socially-aware Large-scale Crowd Forecasting CVPR 2014
Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going? ICCV 2013
Fine-Grained Crowdsourcing for Fine-Grained Recognition CVPR 2013
Social Role Discovery in Human Events CVPR 2013
Video Event Understanding Using Natural Language Descriptions ICCV 2013
Discriminative Segment Annotation in Weakly Labeled Video CVPR 2013
Discovering Object Functionality ICCV 2013
Combining the Right Features for Complex Event Recognition ICCV 2013
Shifting Weights: Adapting Object Detectors from Image to Video NIPS 2012
Large Margin Learning of Upstream Scene Understanding Models NIPS 2010
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification NIPS 2010
Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions NIPS 2009
Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis NIPS 2009