Antonio Torralba

156 papers · 2007–2026 · 11 conferences · across top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (26) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🐣 Hot Topic Early Bird

🏃 Academic Marathon (18) 🏠 Conference Loyalist (36) 🌟 Keyword Trendsetter Combo (29) 🤝 Dynamic Duo (24) 👑 Triple Crown 🧬 Topic Evolution 🏆 Keyword Champion (2) 👥 Mega-Team (85) 🏆 Grand Slam 🌱 Topic Pioneer 🔬 Deep Specialist (22) 🔥 Unstoppable (15) ❓ The Questioner ⚡ Prolific Year (19) 💎 Century Club (155) 🗃️ Keyword Collector (72) 📈 Trend Setter 🚀 Conference Pioneer

Conferences

CVPR (47) NIPS (36) ICCV (28) ECCV (17) ICLR (14) ICML (7) ACL (2) RSS (2) AAAI (1) CORL (1) INTERSPEECH (1)

Top co-authors

Sanja Fidler (24) Joshua B. Tenenbaum (17) Shuang Li (14) Chuang Gan (14) David Bau (13) Carl Vondrick (11) Jun-Yan Zhu (11) Yunzhu Li (11) Aude Oliva (10) Aditya Khosla (10)

Research topics

Synthesis (1)

Keywords

generative adversarial network (15) self-supervised learning (14) representation learning (14) multimodal learning (11) semantic segmentation (9) video understanding (8) scene understanding (8) image generation (8) 3d reconstruction (7) generative model (7) convolutional neural network (7) neural network (6) 3d vision (6) unsupervised learning (6) object detection (6) object localization (5) transfer learning (5) domain adaptation (5) scene representation (4) future prediction (4)

Papers

VirtualEnv: A Platform for Embodied AI Research AAAI 2026
SketchAgent: Language-Driven Sequential Sketch Generation CVPR 2025
MultiModal Action Conditioned Video Simulation ICCV 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation CVPR 2025
Adaptive Length Image Tokenization via Recurrent Allocation ICLR 2025
Separating Knowledge and Perception with Procedural Data ICML 2025
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains ICLR 2025
A Vision Check-up for Language Models CVPR 2024
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models CVPR 2024
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models ECCV 2024
Characterizing Model Robustness via Natural Input Gradients ECCV 2024
L4GM: Large 4D Gaussian Reconstruction Model NIPS 2024
MMToM-QA: Multimodal Theory of Mind Question Answering ACL 2024
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis ECCV 2024
Learning to Jointly Understand Visual and Tactile Signals ICLR 2024
A Multimodal Automated Interpretability Agent ICML 2024
Improving Factuality and Reasoning in Language Models through Multiagent Debate ICML 2024
Generalizing Dataset Distillation via Deep Generative Prior CVPR 2023
Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects NIPS 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods NIPS 2023
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning ICML 2023
BT^2: Backward-compatible Training with Basis Transformation ICCV 2023
DreamTeacher: Pretraining Image Backbones with Deep Generative Models ICCV 2023
Open-vocabulary Panoptic Segmentation with Embedding Modulation ICCV 2023
3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes NIPS 2023
Detecting Everything in the Open World: Towards Universal Object Detection CVPR 2023
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models CVPR 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos CVPR 2023
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models ICCV 2023
ConceptFusion: Open-set multimodal 3D mapping RSS 2023
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation ICLR 2023
Composing Ensembles of Pre-trained Models via Iterative Consensus ICLR 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video CVPR 2022
BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration CVPR 2022
Robust Contrastive Learning Against Noisy Views CVPR 2022
Learning Program Representations for Food Images and Cooking Recipes CVPR 2022
GAN-Supervised Dense Visual Alignment CVPR 2022
Dataset Distillation by Matching Training Trajectories CVPR 2022
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction CVPR 2022
Denoised MDPs: Learning World Models Better Than the World Itself ICML 2022
Natural Language Descriptions of Deep Visual Features ICLR 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos ICLR 2022
Correcting Robot Plans with Natural Language Feedback RSS 2022
Learning Neural Acoustic Fields NIPS 2022
Procedural Image Programs for Representation Learning NIPS 2022
ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment NIPS 2022
Pre-Trained Language Models for Interactive Decision-Making NIPS 2022
Skill Induction and Planning with Latent Language ACL 2022
MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning ECCV 2022
Compositional Visual Generation with Composable Diffusion Models ECCV 2022
Totems: Physical Objects for Verifying Visual Integrity ECCV 2022
Disentangling Visual and Written Concepts in CLIP CVPR 2022
Virtual Correspondence: Humans as a Cue for Extreme-View Geometry CVPR 2022
Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps CVPR 2022
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering ICLR 2021
Learning to See by Looking at Noise NIPS 2021
Measuring Generalization with Optimal Transport NIPS 2021
EditGAN: High-Precision Semantic Image Editing NIPS 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning NIPS 2021
Learning to Compose Visual Relations NIPS 2021
Editing a classifier by rewriting its prediction rules NIPS 2021
3D Neural Scene Representations for Visuomotor Control CORL 2021
DriveGAN: Towards a Controllable High-Quality Neural Simulation CVPR 2021
Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals CVPR 2021
DatasetGAN: Efficient Labeled Data Factory With Minimal Human Effort CVPR 2021
Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization CVPR 2021
BARF: Bundle-Adjusting Neural Radiance Fields ICCV 2021
Scaling Up Instance Annotation via Label Propagation ICCV 2021
Toward a Visual Concept Vocabulary for GAN Latent Space ICCV 2021
What You Can Learn by Staring at a Blank Wall ICCV 2021
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions ICCV 2021
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration ICLR 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos INTERSPEECH 2021
Rewriting a Deep Generative Model ECCV 2020
Learning to Simulate Dynamic Environments With GameGAN CVPR 2020
Height and Uprightness Invariance for 3D Prediction From a Single View CVPR 2020
Diverse Image Generation via Self-Conditioned GANs CVPR 2020
Music Gesture for Visual Sound Separation CVPR 2020
Visual Grounding of Learned Physical Models ICML 2020
Estimating Generalization under Distribution Shifts via Domain-Invariant Representations ICML 2020
Learning Compositional Koopman Operators for Model-Based Control ICLR 2020
CLEVRER: Collision Events for Video Representation and Reasoning ICLR 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks ICLR 2020
Debiased Contrastive Learning NIPS 2020
Causal Discovery in Physical Systems from Videos NIPS 2020
Detecting Natural Disasters, Damage, and Incidents in the Wild ECCV 2020
Foley Music: Learning to Generate Music from Videos ECCV 2020
The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement ECCV 2020
Deep Feedback Inverse Problem Solver ECCV 2020
Learning Words by Drawing Images CVPR 2019
The Sound of Motions ICCV 2019
Seeing What a GAN Cannot Generate ICCV 2019
Neural Turtle Graphics for Modeling City Road Layouts ICCV 2019
Meta-Sim: Learning to Generate Synthetic Datasets ICCV 2019
Gaze360: Physically Unconstrained Gaze Estimation in the Wild ICCV 2019
Self-Supervised Moving Vehicle Tracking With Stereo Sound ICCV 2019
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization ICCV 2019
Through-Wall Human Mesh Recovery Using Radio Signals ICCV 2019
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids ICLR 2019
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks ICLR 2019
Connecting Touch and Vision via Cross-Modal Prediction CVPR 2019
How to Make a Pizza: Learning a Compositional Layer-Based GAN Model CVPR 2019
Synthesizing Environment-Aware Activities via Activity Sketches CVPR 2019
Visual Object Networks: Image Generation with Disentangled 3D Representations NIPS 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input ECCV 2018
The Sound of Pixels ECCV 2018
Single Image Intrinsic Decomposition without a Single Intrinsic Image ECCV 2018
Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks ECCV 2018
Temporal Relational Reasoning in Videos ECCV 2018
Interpretable Basis Decomposition for Visual Explanation ECCV 2018
VirtualHome: Simulating Household Activities via Programs CVPR 2018
Learning to Act Properly: Predicting and Explaining Affordances From Images CVPR 2018
Inferring Light Fields From Shadows CVPR 2018
Through-Wall Human Pose Estimation Using Radio Signals CVPR 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding NIPS 2018
3D-Aware Scene Manipulation via Inverse Graphics NIPS 2018
Following Gaze in Video ICCV 2017
Open Vocabulary Scene Parsing ICCV 2017
Turning Corners Into Cameras: Principles and Methods ICCV 2017
Scene Parsing Through ADE20K Dataset CVPR 2017
Generating the Future With Adversarial Transformers CVPR 2017
Learning Cross-Modal Embeddings for Cooking Recipes and Food Images CVPR 2017
Network Dissection: Quantifying Interpretability of Deep Visual Representations CVPR 2017
Learning Deep Features for Discriminative Localization CVPR 2016
Predicting Motivations of Actions by Leveraging Text CVPR 2016
Unsupervised Learning of Spoken Language with Visual Context NIPS 2016
MovieQA: Understanding Stories in Movies Through Question-Answering CVPR 2016
SoundNet: Learning Sound Representations from Unlabeled Video NIPS 2016
Generating Videos with Scene Dynamics NIPS 2016
Visually Indicated Sounds CVPR 2016
Eye Tracking for Everyone CVPR 2016
Anticipating Visual Representations From Unlabeled Video CVPR 2016
Learning Aligned Cross-Modal Representations From Weakly Aligned Data CVPR 2016
Learning visual biases from human imagination NIPS 2015
Skip-Thought Vectors NIPS 2015
Understanding and Predicting Image Memorability at a Large Scale ICCV 2015
Where are they looking? NIPS 2015
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books ICCV 2015
Learning Deep Features for Scene Recognition using Places Database NIPS 2014
Looking Beyond the Visible Scene CVPR 2014
HOGgles: Visualizing Object Detection Features ICCV 2013
Parsing IKEA Objects: Fine Pose Estimation ICCV 2013
SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels ICCV 2013
Shape Anchors for Data-Driven Multi-view Reconstruction ICCV 2013
Modifying the Memorability of Face Photographs ICCV 2013
Memorability of Image Regions NIPS 2012
Localizing 3D cuboids in single-view images NIPS 2012
Understanding the Intrinsic Memorability of Images NIPS 2011
Transfer Learning by Borrowing Examples for Multiclass Object Detection NIPS 2011
Learning to Learn with Compound HD Models NIPS 2011
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis NIPS 2009
Semi-Supervised Learning in Gigantic Image Collections NIPS 2009
Nonparametric Bayesian Texture Learning and Synthesis NIPS 2009
Spectral Hashing NIPS 2008
Object Recognition by Scene Alignment NIPS 2007