Silvio Savarese
127 papers · 2012–2025 · 21 top CS/AI conferences
Achievements
Taxonomy Completionist (21)
Keyword Pioneer
Renaissance Researcher (8)
Interdisciplinary Bridge
Conference Polyglot (21)
Keyword Trendsetter Combo (8)
Conference Loyalist (38)
Topic Pioneer
Dynamic Duo (37)
Grand Slam
Mega-Team (29)
Triple Crown
Deep Specialist (17)
Topic Evolution
Keyword Champion (3)
Century Club (127)
Keyword Collector (532)
The Questioner (2)
Conference Pioneer
Prolific Year (15)
Unstoppable (14)
Trend Setter
Conferences
CVPR (38)
CORL (15)
ICCV (11)
NIPS (11)
EMNLP (8)
ICLR (8)
ICML (8)
RSS (7)
ACL (4)
ECCV (3)
NAACL (3)
EACL (2)
WACV (1)
UAI (1)
PGM (1)
JMLR (1)
IJCAI (1)
CONLL (1)
CLEAR (1)
AISTATS (1)
AAAI (1)
Research topics
Keywords
reinforcement learning (10)
convolutional neural network (8)
large language model (8)
object detection (8)
imitation learning (7)
scene understanding (7)
multimodal learning (6)
semantic segmentation (6)
robot manipulation (6)
video understanding (6)
agent system (5)
unsupervised learning (5)
recurrent neural network (5)
transfer learning (5)
trajectory prediction (5)
action recognition (5)
generative adversarial network (4)
visual question answering (4)
representation learning (4)
attention mechanism (4)
Papers
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
NAACL 2025
xLAM: A Family of Large Action Models to Empower AI Agent Systems
NAACL 2025
LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback
ACL 2025
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
ACL 2025
Text2Data: Low-Resource Data Generation with Textual Control
AAAI 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
EMNLP 2025
ViUniT: Visual Unit Tests for More Robust Visual Programming
CVPR 2025
SlackAgents: Scalable Collaboration of AI Agents in Workspaces
EMNLP 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
EMNLP 2025
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
ICLR 2025
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
ICML 2025
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
EMNLP 2025
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
ICML 2025
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
NAACL 2025
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
EMNLP 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
HIVE: Harnessing Human Feedback for Instructional Visual Editing
CVPR 2024
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
EACL 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
ECCV 2024
On the Unlikelihood of D-Separation
PGM 2024
Causal Layering via Conditional Entropy
CLEAR 2024
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent
CONLL 2024
Unified Training of Universal Time Series Forecasting Transformers
ICML 2024
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
ICLR 2024
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
ICLR 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
NIPS 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
NIPS 2024
APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets
NIPS 2024
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
NIPS 2024
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
ICLR 2023
Masked Unsupervised Self-training for Label-free Image Classification
ICLR 2023
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
CVPR 2023
Procedure-Aware Pretraining for Instructional Video Understanding
CVPR 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
ICML 2023
Modeling Dynamic Environments with Scene Graph Memory
ICML 2023
Long Document Summarization with Top-down and Bottom-up Inference
EACL 2023
An Extensible Multi-modal Multi-task Object Dataset with Materials
ICLR 2023
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
NIPS 2023
LAVIS: A One-stop Library for Language-Vision Intelligence
ACL 2023
Merlion: End-to-End Machine Learning for Time Series
JMLR 2023
Best-k Search Algorithm for Neural Text Generation
ACL 2023
JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection
CVPR 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
NIPS 2022
Local calibration: metrics and recalibration
UAI 2022
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
CORL 2022
ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
RSS 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
EMNLP 2022
Discovering Generalizable Skills via Automated Generation of Diverse Tasks
RSS 2021
Topological Planning With Transformers for Vision-and-Language Navigation
CVPR 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
CORL 2021
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration
CORL 2021
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
CORL 2021
Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation
CORL 2021
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
CORL 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
CORL 2021
Adaptive Procedural Task Generation for Hard-Exploration Problems
ICLR 2021
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
ICCV 2021
Goal-Aware Prediction: Learning to Model What Matters
ICML 2020
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
CORL 2020
GTI: Learning to Generalize across Long-Horizon Tasks from Human Demonstrations
RSS 2020
Which Tasks Should Be Learned Together in Multi-task Learning?
ICML 2020
Leveraging Pretrained Image Classifiers for Language-Based Segmentation
WACV 2020
Generative Sparse Detection Networks for 3D Single-shot Object Detection
ECCV 2020
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
CVPR 2019
Regression Planning Networks
NIPS 2019
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
NIPS 2019
Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
CORL 2019
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators
CORL 2019
AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers
CORL 2019
Learning to Navigate Using Mid-Level Visual Priors
CORL 2019
TopNet: Structural Point Cloud Decoder
CVPR 2019
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
CVPR 2019
Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression
CVPR 2019
SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints
CVPR 2019
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
CVPR 2019
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration
CVPR 2019
Situational Fusion of Visual Representation for Visual Navigation
ICCV 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
ICCV 2019
Taskonomy: Disentangling Task Transfer Learning
IJCAI 2019
A Behavioral Approach to Visual Navigation with Graph Localization Networks
RSS 2019
Social GAN: Socially Acceptable Trajectories With Generative Adversarial Networks
CVPR 2018
Active Learning for Convolutional Neural Networks: A Core-Set Approach
ICLR 2018
Demo2Vec: Reasoning Object Affordances From Online Videos
CVPR 2018
Gibson Env: Real-World Perception for Embodied Agents
CVPR 2018
CAR-Net: Clairvoyant Attentive Recurrent Network
ECCV 2018
Adversarial Feature Augmentation for Unsupervised Domain Adaptation
CVPR 2018
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
EMNLP 2018
ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation
CORL 2018
Deep Learning Under Privileged Information Using Heteroscedastic Dropout
CVPR 2018
Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision
RSS 2018
Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View
CVPR 2018
Taskonomy: Disentangling Task Transfer Learning
CVPR 2018
SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark
CORL 2018
Generalizing to Unseen Domains via Adversarial Data Augmentation
NIPS 2018
Feedback Networks
CVPR 2017
Tracking the Untrackable: Learning to Track Multiple Cues With Long-Term Dependencies
ICCV 2017
Lattice Long Short-Term Memory for Human Action Recognition
ICCV 2017
image2mass: Estimating the Mass of an Object from Its Image
CORL 2017
Social Scene Understanding: End-To-End Multi-Person Action Localization and Collective Activity Recognition
CVPR 2017
Deep View Morphing
CVPR 2017
Learning Transferrable Representations for Unsupervised Domain Adaptation
NIPS 2016
Structural-RNN: Deep Learning on Spatio-Temporal Graphs
CVPR 2016
Deep Metric Learning via Lifted Structured Feature Embedding
CVPR 2016
3D Semantic Parsing of Large-Scale Indoor Spaces
CVPR 2016
Social LSTM: Human Trajectory Prediction in Crowded Spaces
CVPR 2016
DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes
CVPR 2016
A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues
RSS 2016
Universal Correspondence Network
NIPS 2016
Data-Driven 3D Voxel Patterns for Object Category Recognition
CVPR 2015
A Coarse-to-Fine Model for 3D Pose Estimation and Sub-Category Recognition
CVPR 2015
Learning to Track: Online Multi-Object Tracking by Decision Making
ICCV 2015
Action Recognition by Hierarchical Mid-Level Action Elements
ICCV 2015
Unsupervised Semantic Parsing of Video Collections
ICCV 2015
Watch-n-Patch: Unsupervised Understanding of Actions and Relations
CVPR 2015
Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation
CVPR 2015
Combining 3D Shape, Color, and Motion for Robust Anytime Tracking
RSS 2014
Structured Recurrent Temporal Restricted Boltzmann Machines
ICML 2014
Learning an Image-based Motion Context for Multiple People Tracking
CVPR 2014
Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
ICCV 2013
Understanding Indoor Scenes Using 3D Geometric Phrases
CVPR 2013
3D Scene Understanding by Voxel-CRF
ICCV 2013
Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
ICCV 2013
Dense Object Reconstruction with Semantic Priors
CVPR 2013
Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
CVPR 2013
Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
CVPR 2013
Efficient and Exact MAP-MRF Inference using Branch and Bound
AISTATS 2012