Silvio Savarese

127 papers · 2012–2025 · 21 top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (21) 🌟 Keyword Trendsetter Combo (8) 🏠 Conference Loyalist (38) 🌱 Topic Pioneer 🤝 Dynamic Duo (37) 🏆 Grand Slam 👥 Mega-Team (29) 👑 Triple Crown 🔬 Deep Specialist (17) 🧬 Topic Evolution 🏆 Keyword Champion (3) 💎 Century Club (127) 🗃️ Keyword Collector (532) ❓ The Questioner (2) 🚀 Conference Pioneer ⚡ Prolific Year (15) 🔥 Unstoppable (14) 📈 Trend Setter

Conferences

CVPR (38) CORL (15) ICCV (11) NIPS (11) EMNLP (8) ICLR (8) ICML (8) RSS (7) ACL (4) ECCV (3) NAACL (3) EACL (2) WACV (1) UAI (1) PGM (1) JMLR (1) IJCAI (1) CONLL (1) CLEAR (1) AISTATS (1) AAAI (1)

Top co-authors

Caiming Xiong (37) Huan Wang (23) Juan Carlos Niebles (20) Li Fei-Fei (19) Shelby Heinecke (15) Roberto Martín-Martín (15) Yuke Zhu (13) Jianguo Zhang (12) Zhiwei Liu (12) Weiran Yao (12)

Research topics

Models (1) Robotics (1)

Keywords

reinforcement learning (10) convolutional neural network (8) large language model (8) object detection (8) imitation learning (7) scene understanding (7) multimodal learning (6) semantic segmentation (6) robot manipulation (6) video understanding (6) agent system (5) unsupervised learning (5) recurrent neural network (5) transfer learning (5) trajectory prediction (5) action recognition (5) generative adversarial network (4) visual question answering (4) representation learning (4) attention mechanism (4)

Papers

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments · NAACL 2025
xLAM: A Family of Large Action Models to Empower AI Agent Systems · NAACL 2025
LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback · ACL 2025
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data · ACL 2025
Text2Data: Low-Resource Data Generation with Textual Control · AAAI 2025
LATTE: Learning to Think with Vision Specialists · EMNLP 2025
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models · EMNLP 2025
ViUniT: Visual Unit Tests for More Robust Visual Programming · CVPR 2025
SlackAgents: Scalable Collaboration of AI Agents in Workspaces · EMNLP 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D · EMNLP 2025
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents · ICLR 2025
Reward-Guided Speculative Decoding for Efficient LLM Reasoning · ICML 2025
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models · EMNLP 2025
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts · ICML 2025
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models · NAACL 2025
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent · EMNLP 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding · CVPR 2024
HIVE: Harnessing Human Feedback for Instructional Visual Editing · CVPR 2024
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI · EACL 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning · ECCV 2024
On the Unlikelihood of D-Separation · PGM 2024
Causal Layering via Conditional Entropy · CLEAR 2024
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent · CONLL 2024
Unified Training of Universal Time Series Forecasting Transformers · ICML 2024
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization · ICLR 2024
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations · ICLR 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens · NIPS 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · NIPS 2024
APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets · NIPS 2024
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness · NIPS 2024
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis · ICLR 2023
Masked Unsupervised Self-training for Label-free Image Classification · ICLR 2023
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding · CVPR 2023
Procedure-Aware Pretraining for Instructional Video Understanding · CVPR 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · ICML 2023
Modeling Dynamic Environments with Scene Graph Memory · ICML 2023
Long Document Summarization with Top-down and Bottom-up Inference · EACL 2023
An Extensible Multi-modal Multi-task Object Dataset with Materials · ICLR 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild · NIPS 2023
LAVIS: A One-stop Library for Language-Vision Intelligence · ACL 2023
Merlion: End-to-End Machine Learning for Time Series · JMLR 2023
Best-k Search Algorithm for Neural Text Generation · ACL 2023
JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection · CVPR 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning · NIPS 2022
Local calibration: metrics and recalibration · UAI 2022
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation · CORL 2022
ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation · RSS 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training · EMNLP 2022
Discovering Generalizable Skills via Automated Generation of Diverse Tasks · RSS 2021
Topological Planning With Transformers for Vision-and-Language Navigation · CVPR 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments · CORL 2021
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration · CORL 2021
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation · CORL 2021
Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation · CORL 2021
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation · CORL 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks · CORL 2021
Adaptive Procedural Task Generation for Hard-Exploration Problems · ICLR 2021
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild · ICCV 2021
Goal-Aware Prediction: Learning to Model What Matters · ICML 2020
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation · CORL 2020
GTI: Learning to Generalize across Long-Horizon Tasks from Human Demonstrations · RSS 2020
Which Tasks Should Be Learned Together in Multi-task Learning? · ICML 2020
Leveraging Pretrained Image Classifiers for Language-Based Segmentation · WACV 2020
Generative Sparse Detection Networks for 3D Single-shot Object Detection · ECCV 2020
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion · CVPR 2019
Regression Planning Networks · NIPS 2019
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks · NIPS 2019
Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation · CORL 2019
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators · CORL 2019
AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers · CORL 2019
Learning to Navigate Using Mid-Level Visual Priors · CORL 2019
TopNet: Structural Point Cloud Decoder · CVPR 2019
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks · CVPR 2019
Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression · CVPR 2019
SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints · CVPR 2019
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks · CVPR 2019
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration · CVPR 2019
Situational Fusion of Visual Representation for Visual Navigation · ICCV 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera · ICCV 2019
Taskonomy: Disentangling Task Transfer Learning · IJCAI 2019
A Behavioral Approach to Visual Navigation with Graph Localization Networks · RSS 2019
Social GAN: Socially Acceptable Trajectories With Generative Adversarial Networks · CVPR 2018
Active Learning for Convolutional Neural Networks: A Core-Set Approach · ICLR 2018
Demo2Vec: Reasoning Object Affordances From Online Videos · CVPR 2018
Gibson Env: Real-World Perception for Embodied Agents · CVPR 2018
CAR-Net: Clairvoyant Attentive Recurrent Network · ECCV 2018
Adversarial Feature Augmentation for Unsupervised Domain Adaptation · CVPR 2018
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation · EMNLP 2018
ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation · CORL 2018
Deep Learning Under Privileged Information Using Heteroscedastic Dropout · CVPR 2018
Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision · RSS 2018
Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View · CVPR 2018
Taskonomy: Disentangling Task Transfer Learning · CVPR 2018
SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark · CORL 2018
Generalizing to Unseen Domains via Adversarial Data Augmentation · NIPS 2018
Feedback Networks · CVPR 2017
Tracking the Untrackable: Learning to Track Multiple Cues With Long-Term Dependencies · ICCV 2017
Lattice Long Short-Term Memory for Human Action Recognition · ICCV 2017
image2mass: Estimating the Mass of an Object from Its Image · CORL 2017
Social Scene Understanding: End-To-End Multi-Person Action Localization and Collective Activity Recognition · CVPR 2017
Deep View Morphing · CVPR 2017
Learning Transferrable Representations for Unsupervised Domain Adaptation · NIPS 2016
Structural-RNN: Deep Learning on Spatio-Temporal Graphs · CVPR 2016
Deep Metric Learning via Lifted Structured Feature Embedding · CVPR 2016
3D Semantic Parsing of Large-Scale Indoor Spaces · CVPR 2016
Social LSTM: Human Trajectory Prediction in Crowded Spaces · CVPR 2016
DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes · CVPR 2016
A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues · RSS 2016
Universal Correspondence Network · NIPS 2016
Data-Driven 3D Voxel Patterns for Object Category Recognition · CVPR 2015
A Coarse-to-Fine Model for 3D Pose Estimation and Sub-Category Recognition · CVPR 2015
Learning to Track: Online Multi-Object Tracking by Decision Making · ICCV 2015
Action Recognition by Hierarchical Mid-Level Action Elements · ICCV 2015
Unsupervised Semantic Parsing of Video Collections · ICCV 2015
Watch-n-Patch: Unsupervised Understanding of Actions and Relations · CVPR 2015
Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation · CVPR 2015
Combining 3D Shape, Color, and Motion for Robust Anytime Tracking · RSS 2014
Structured Recurrent Temporal Restricted Boltzmann Machines · ICML 2014
Learning an Image-based Motion Context for Multiple People Tracking · CVPR 2014
Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies · ICCV 2013
Understanding Indoor Scenes Using 3D Geometric Phrases · CVPR 2013
3D Scene Understanding by Voxel-CRF · ICCV 2013
Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses · ICCV 2013
Dense Object Reconstruction with Semantic Priors · CVPR 2013
Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses · CVPR 2013
Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines · CVPR 2013
Efficient and Exact MAP-MRF Inference using Branch and Bound · AISTATS 2012