Antonio Torralba
156 papers · 2007–2026 · 11 top CS/AI conferences
Achievements
Taxonomy Completionist (26)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (7)
Hot Topic Early Bird
Academic Marathon (18)
Conference Loyalist (36)
Keyword Trendsetter Combo (29)
Dynamic Duo (24)
Triple Crown
Topic Evolution
Keyword Champion (2)
Mega-Team (85)
Grand Slam
Topic Pioneer
Deep Specialist (22)
Unstoppable (15)
The Questioner
Prolific Year (19)
Century Club (155)
Keyword Collector (72)
Trend Setter
Conference Pioneer
Conferences
CVPR (47)
NIPS (36)
ICCV (28)
ECCV (17)
ICLR (14)
ICML (7)
ACL (2)
RSS (2)
AAAI (1)
CORL (1)
INTERSPEECH (1)
Research topics
Keywords
generative adversarial network (15)
self-supervised learning (14)
representation learning (14)
multimodal learning (11)
semantic segmentation (9)
video understanding (8)
scene understanding (8)
image generation (8)
3d reconstruction (7)
generative model (7)
convolutional neural network (7)
neural network (6)
3d vision (6)
unsupervised learning (6)
object detection (6)
object localization (5)
transfer learning (5)
domain adaptation (5)
scene representation (4)
future prediction (4)
Papers
VirtualEnv: A Platform for Embodied AI Research
AAAI 2026
SketchAgent: Language-Driven Sequential Sketch Generation
CVPR 2025
MultiModal Action Conditioned Video Simulation
ICCV 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
CVPR 2025
Adaptive Length Image Tokenization via Recurrent Allocation
ICLR 2025
Separating Knowledge and Perception with Procedural Data
ICML 2025
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
ICLR 2025
A Vision Check-up for Language Models
CVPR 2024
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
CVPR 2024
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
ECCV 2024
Characterizing Model Robustness via Natural Input Gradients
ECCV 2024
L4GM: Large 4D Gaussian Reconstruction Model
NIPS 2024
MMToM-QA: Multimodal Theory of Mind Question Answering
ACL 2024
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
ECCV 2024
Learning to Jointly Understand Visual and Tactile Signals
ICLR 2024
A Multimodal Automated Interpretability Agent
ICML 2024
Improving Factuality and Reasoning in Language Models through Multiagent Debate
ICML 2024
Generalizing Dataset Distillation via Deep Generative Prior
CVPR 2023
Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects
NIPS 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
NIPS 2023
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
ICML 2023
BT^2: Backward-compatible Training with Basis Transformation
ICCV 2023
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
ICCV 2023
Open-vocabulary Panoptic Segmentation with Embedding Modulation
ICCV 2023
3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes
NIPS 2023
Detecting Everything in the Open World: Towards Universal Object Detection
CVPR 2023
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models
CVPR 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos
CVPR 2023
Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
ICCV 2023
ConceptFusion: Open-set multimodal 3D mapping
RSS 2023
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
ICLR 2023
Composing Ensembles of Pre-trained Models via Iterative Consensus
ICLR 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations
CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration
CVPR 2022
Robust Contrastive Learning Against Noisy Views
CVPR 2022
Learning Program Representations for Food Images and Cooking Recipes
CVPR 2022
GAN-Supervised Dense Visual Alignment
CVPR 2022
Dataset Distillation by Matching Training Trajectories
CVPR 2022
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
CVPR 2022
Denoised MDPs: Learning World Models Better Than the World Itself
ICML 2022
Natural Language Descriptions of Deep Visual Features
ICLR 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022
Correcting Robot Plans with Natural Language Feedback
RSS 2022
Learning Neural Acoustic Fields
NIPS 2022
Procedural Image Programs for Representation Learning
NIPS 2022
ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment
NIPS 2022
Pre-Trained Language Models for Interactive Decision-Making
NIPS 2022
Skill Induction and Planning with Latent Language
ACL 2022
MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
ECCV 2022
Compositional Visual Generation with Composable Diffusion Models
ECCV 2022
Totems: Physical Objects for Verifying Visual Integrity
ECCV 2022
Disentangling Visual and Written Concepts in CLIP
CVPR 2022
Virtual Correspondence: Humans as a Cue for Extreme-View Geometry
CVPR 2022
Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps
CVPR 2022
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
ICLR 2021
Learning to See by Looking at Noise
NIPS 2021
Measuring Generalization with Optimal Transport
NIPS 2021
EditGAN: High-Precision Semantic Image Editing
NIPS 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
NIPS 2021
Learning to Compose Visual Relations
NIPS 2021
Editing a classifier by rewriting its prediction rules
NIPS 2021
3D Neural Scene Representations for Visuomotor Control
CORL 2021
DriveGAN: Towards a Controllable High-Quality Neural Simulation
CVPR 2021
Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals
CVPR 2021
DatasetGAN: Efficient Labeled Data Factory With Minimal Human Effort
CVPR 2021
Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
CVPR 2021
BARF: Bundle-Adjusting Neural Radiance Fields
ICCV 2021
Scaling Up Instance Annotation via Label Propagation
ICCV 2021
Toward a Visual Concept Vocabulary for GAN Latent Space
ICCV 2021
What You Can Learn by Staring at a Blank Wall
ICCV 2021
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
ICCV 2021
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
ICLR 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
INTERSPEECH 2021
Rewriting a Deep Generative Model
ECCV 2020
Learning to Simulate Dynamic Environments With GameGAN
CVPR 2020
Height and Uprightness Invariance for 3D Prediction From a Single View
CVPR 2020
Diverse Image Generation via Self-Conditioned GANs
CVPR 2020
Music Gesture for Visual Sound Separation
CVPR 2020
Visual Grounding of Learned Physical Models
ICML 2020
Estimating Generalization under Distribution Shifts via Domain-Invariant Representations
ICML 2020
Learning Compositional Koopman Operators for Model-Based Control
ICLR 2020
CLEVRER: Collision Events for Video Representation and Reasoning
ICLR 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks
ICLR 2020
Debiased Contrastive Learning
NIPS 2020
Causal Discovery in Physical Systems from Videos
NIPS 2020
Detecting Natural Disasters, Damage, and Incidents in the Wild
ECCV 2020
Foley Music: Learning to Generate Music from Videos
ECCV 2020
The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement
ECCV 2020
Deep Feedback Inverse Problem Solver
ECCV 2020
Learning Words by Drawing Images
CVPR 2019
The Sound of Motions
ICCV 2019
Seeing What a GAN Cannot Generate
ICCV 2019
Neural Turtle Graphics for Modeling City Road Layouts
ICCV 2019
Meta-Sim: Learning to Generate Synthetic Datasets
ICCV 2019
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
ICCV 2019
Self-Supervised Moving Vehicle Tracking With Stereo Sound
ICCV 2019
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
ICCV 2019
Through-Wall Human Mesh Recovery Using Radio Signals
ICCV 2019
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids
ICLR 2019
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
ICLR 2019
Connecting Touch and Vision via Cross-Modal Prediction
CVPR 2019
How to Make a Pizza: Learning a Compositional Layer-Based GAN Model
CVPR 2019
Synthesizing Environment-Aware Activities via Activity Sketches
CVPR 2019
Visual Object Networks: Image Generation with Disentangled 3D Representations
NIPS 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
ECCV 2018
The Sound of Pixels
ECCV 2018
Single Image Intrinsic Decomposition without a Single Intrinsic Image
ECCV 2018
Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks
ECCV 2018
Temporal Relational Reasoning in Videos
ECCV 2018
Interpretable Basis Decomposition for Visual Explanation
ECCV 2018
VirtualHome: Simulating Household Activities via Programs
CVPR 2018
Learning to Act Properly: Predicting and Explaining Affordances From Images
CVPR 2018
Inferring Light Fields From Shadows
CVPR 2018
Through-Wall Human Pose Estimation Using Radio Signals
CVPR 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NIPS 2018
3D-Aware Scene Manipulation via Inverse Graphics
NIPS 2018
Following Gaze in Video
ICCV 2017
Open Vocabulary Scene Parsing
ICCV 2017
Turning Corners Into Cameras: Principles and Methods
ICCV 2017
Scene Parsing Through ADE20K Dataset
CVPR 2017
Generating the Future With Adversarial Transformers
CVPR 2017
Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
CVPR 2017
Network Dissection: Quantifying Interpretability of Deep Visual Representations
CVPR 2017
Learning Deep Features for Discriminative Localization
CVPR 2016
Predicting Motivations of Actions by Leveraging Text
CVPR 2016
Unsupervised Learning of Spoken Language with Visual Context
NIPS 2016
MovieQA: Understanding Stories in Movies Through Question-Answering
CVPR 2016
SoundNet: Learning Sound Representations from Unlabeled Video
NIPS 2016
Generating Videos with Scene Dynamics
NIPS 2016
Visually Indicated Sounds
CVPR 2016
Eye Tracking for Everyone
CVPR 2016
Anticipating Visual Representations From Unlabeled Video
CVPR 2016
Learning Aligned Cross-Modal Representations From Weakly Aligned Data
CVPR 2016
Learning visual biases from human imagination
NIPS 2015
Skip-Thought Vectors
NIPS 2015
Understanding and Predicting Image Memorability at a Large Scale
ICCV 2015
Where are they looking?
NIPS 2015
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
ICCV 2015
Learning Deep Features for Scene Recognition using Places Database
NIPS 2014
Looking Beyond the Visible Scene
CVPR 2014
HOGgles: Visualizing Object Detection Features
ICCV 2013
Parsing IKEA Objects: Fine Pose Estimation
ICCV 2013
SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
ICCV 2013
Shape Anchors for Data-Driven Multi-view Reconstruction
ICCV 2013
Modifying the Memorability of Face Photographs
ICCV 2013
Memorability of Image Regions
NIPS 2012
Localizing 3D cuboids in single-view images
NIPS 2012
Understanding the Intrinsic Memorability of Images
NIPS 2011
Transfer Learning by Borrowing Examples for Multiclass Object Detection
NIPS 2011
Learning to Learn with Compound HD Models
NIPS 2011
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
NIPS 2009
Semi-Supervised Learning in Gigantic Image Collections
NIPS 2009
Nonparametric Bayesian Texture Learning and Synthesis
NIPS 2009
Spectral Hashing
NIPS 2008
Object Recognition by Scene Alignment
NIPS 2007