Trevor Darrell

246 papers · 2006–2025 · 17 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (38) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🏃 Academic Marathon (19) 🏠 Conference Loyalist (35) 🌟 Keyword Trendsetter Combo (37) 🤝 Dynamic Duo (33) 👑 Triple Crown 🌱 Topic Pioneer 🏆 Keyword Champion 🏆 Grand Slam 👥 Mega-Team (24) 🔬 Deep Specialist (50) 🧬 Topic Evolution 🚀 Conference Pioneer 🔥 Unstoppable (20) ❓ The Questioner (9) 💎 Century Club (246) 🗃️ Keyword Collector (130) ⚡ Prolific Year (27) 📈 Trend Setter

Conferences

CVPR (66) NIPS (35) ICCV (32) ICLR (26) ICML (19) ECCV (19) EMNLP (12) WACV (8) NAACL (8) CORL (6) ACL (6) JMLR (3) AAAI (2) EACL (1) AISTATS (1) RSS (1) UAI (1)

Top co-authors

Anna Rohrbach (33) Kate Saenko (32) Marcus Rohrbach (19) Roei Herzig (18) Joseph E. Gonzalez (17) Fisher Yu (14) Judy Hoffman (13) Kurt Keutzer (13) Dan Klein (12) Xin Wang (12)

Research topics

Core AI (1)

Keywords

multimodal learning (26) object detection (21) zero-shot learning (18) transfer learning (17) vision-language model (17) convolutional neural network (17) few-shot learning (15) semantic segmentation (15) domain adaptation (15) representation learning (13) image classification (11) self-supervised learning (11) diffusion model (10) visual reasoning (9) image generation (9) visual question answering (9) semi-supervised learning (9) video understanding (8) object recognition (8) unsupervised learning (8)

Papers

SegLLM: Multi-round Reasoning Segmentation with Large Language Models ICLR 2025
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models ICLR 2025
A Coefficient Makes SVRG Effective ICLR 2025
Enough Coin Flips Can Make LLMs Act Bayesian ACL 2025
Visual Imitation Enables Contextual Humanoid Control CORL 2025
Scaling Vision Pre-Training to 4K Resolution CVPR 2025
Visual Lexicon: Rich Image Features in Language Space CVPR 2025
Pose Priors from Language Models CVPR 2025
VisionArena: 230k Real World User-VLM Conversations with Preference Labels CVPR 2025
AutoPresent: Designing Structured Visuals from Scratch CVPR 2025
Discovering Divergent Representations between Text-to-Image Models ICCV 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio CORL 2025
Navigation World Models CVPR 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features ICCV 2025
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World ICCV 2025
Describe Anything: Detailed Localized Image and Video Captioning ICCV 2025
Pre-training Auto-regressive Robotic Models with 4D Representations ICML 2025
Vision-Language Models Create Cross-Modal Task Representations ICML 2025
Do What? Teaching Vision-Language-Action Models to Reject the Impossible EMNLP 2025
Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint EMNLP 2025
Dual-Process Image Generation ICCV 2025
Video Action Differencing ICLR 2025
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark ICLR 2025
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) ICLR 2025
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion ICLR 2025
Self-correcting LLM-controlled Diffusion Models CVPR 2024
Recursive Visual Programming ECCV 2024
Finding Visual Task Vectors ECCV 2024
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness WACV 2024
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data WACV 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs NIPS 2024
When does perceptual alignment benefit vision representations? NIPS 2024
Humanoid Locomotion as Next Token Prediction NIPS 2024
Segment Anything without Supervision NIPS 2024
xT: Nested Tokenization for Larger Context in Large Images ICML 2024
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning CORL 2024
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift ICML 2024
Aligning Large Multimodal Models with Factually Augmented RLHF ACL 2024
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI ICML 2024
Stochastic positional embeddings improve masked image modeling ICML 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game ICLR 2024
Initializing Models with Larger Ones ICLR 2024
LLM-grounded Video Diffusion Models ICLR 2024
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models ICLR 2024
Shape-Guided Diffusion With Inside-Outside Attention WACV 2024
Multitask Vision-Language Prompt Tuning WACV 2024
Readout Guidance: Learning Control from Diffusion Features CVPR 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective ECCV 2024
See Say and Segment: Teaching LMMs to Overcome False Premises CVPR 2024
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor CVPR 2024
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations CVPR 2024
Unsupervised Universal Image Segmentation CVPR 2024
Describing Differences in Image Sets with Natural Language CVPR 2024
InstanceDiffusion: Instance-level Control for Image Generation CVPR 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation CVPR 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models CVPR 2024
Compositional Chain-of-Thought Prompting for Large Multimodal Models CVPR 2024
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction NAACL 2024
ALOHa: A New Measure for Hallucination in Captioning Models NAACL 2024
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding NAACL 2024
When Do We Not Need Larger Vision Models? ECCV 2024
Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation NIPS 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance WACV 2023
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion WACV 2023
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence NIPS 2023
Robot Learning with Sensorimotor Pre-training CORL 2023
Top-Down Visual Attention From Analysis by Synthesis CVPR 2023
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption CVPR 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation EMNLP 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models ICML 2023
Dropout Reduces Underfitting ICML 2023
CLAIR: Evaluating Image Captions with Large Language Models EMNLP 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs EMNLP 2023
Large Language Models are Visual Reasoning Coordinators NIPS 2023
Modular Visual Question Answering via Code Generation ACL 2023
Using Language to Extend to Unseen Domains ICLR 2023
Hierarchical Open-vocabulary Universal Image Segmentation NIPS 2023
Scaling Vision-Language Models with Sparse Mixture of Experts EMNLP 2023
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning ICCV 2023
Can Language Models Learn to Listen? ICCV 2023
Zero-Shot Reward Specification via Grounded Natural Language ICML 2022
G3: Geolocation via Guidebook Grounding EMNLP 2022
Anytime Dense Prediction with Confidence Adaptivity ICLR 2022
Differentiable Gradient Sampling for Learning Implicit 3D Scene Reconstructions from a Single Image ICLR 2022
Disentangled Action Recognition with Knowledge Bases NAACL 2022
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation NAACL 2022
Strumming to the Beat: Audio-Conditioned Contrastive Video Textures WACV 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection CVPR 2022
Contrastive Test-Time Adaptation CVPR 2022
A ConvNet for the 2020s CVPR 2022
Object-Region Video Transformers CVPR 2022
Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion CVPR 2022
On Guiding Visual Attention With Language Specification CVPR 2022
Self-Supervised Pretraining Improves Self-Supervised Pretraining WACV 2022
Learning to Detect Every Thing in an Open World ECCV 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency ECCV 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly ECCV 2022
Real-World Robot Learning with Masked Visual Pre-training CORL 2022
Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning AAAI 2022
Exposing the Limits of Video-Text Models through Contrast Sets NAACL 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension ACL 2022
Voxel-informed Language Grounding ACL 2022
Studying Bias in GANs through the Lens of Race ECCV 2022
Visual Attention Emerges from Recurrent Sparse Reconstruction ICML 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens NIPS 2022
Visual Prompting via Image Inpainting NIPS 2022
K-LITE: Learning Transferable Visual Models with External Knowledge NIPS 2022
Early Convolutions Help Transformers See Better NIPS 2021
Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses ICCV 2021
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation AAAI 2021
Tent: Fully Test-Time Adaptation by Entropy Minimization ICLR 2021
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation CVPR 2021
SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning CVPR 2021
Compositional Video Synthesis with Action Graphs ICML 2021
Predicting With Confidence on Unseen Distributions ICCV 2021
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media EMNLP 2021
Modular Networks for Compositional Instruction Following NAACL 2021
CLIP-It! Language-Guided Video Summarization NIPS 2021
Teachable Reinforcement Learning via Advice Distillation NIPS 2021
Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics CVPR 2021
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation CVPR 2021
Tune It the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density ICCV 2021
Region Similarity Representation Learning ICCV 2021
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning ICCV 2021
Robust Object Detection via Instance-Level Temporal Cycle Confusion ICCV 2021
Discovering Non-monotonic Autoregressive Orderings with Variational Inference ICLR 2021
Temporal Action Detection With Multi-Level Supervision ICCV 2021
What Should Not Be Contrastive in Contrastive Learning ICLR 2021
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting ICLR 2021
Quasi-Dense Similarity Learning for Multiple Object Tracking CVPR 2021
Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control ICLR 2021
Adversarial Continual Learning ECCV 2020
Fighting Copycat Agents in Behavioral Cloning from Observation Histories NIPS 2020
Auxiliary Task Reweighting for Minimum-data Learning NIPS 2020
BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning CVPR 2020
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks CVPR 2020
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules CVPR 2020
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA CVPR 2020
Learning Saliency Propagation for Semi-Supervised Instance Segmentation CVPR 2020
Hierarchical Style-based Networks for Motion Synthesis ECCV 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation ECCV 2020
Identity-Aware Multi-Sentence Video Description ECCV 2020
Learning Canonical Representations for Scene Graph to Image Generation ECCV 2020
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning ECCV 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks ICLR 2020
Frustratingly Simple Few-Shot Object Detection ICML 2020
Video Prediction via Example Guidance ICML 2020
Discriminator Rejection Sampling ICLR 2019
Large-Scale Study of Curiosity-Driven Learning ICLR 2019
Compositional Plan Vectors NIPS 2019
Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity NIPS 2019
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation ACL 2019
Robust Change Captioning ICCV 2019
Joint Monocular 3D Vehicle Detection and Tracking ICCV 2019
Variational Adversarial Active Learning ICCV 2019
Semi-Supervised Domain Adaptation via Minimax Entropy ICCV 2019
Few-Shot Object Detection via Feature Reweighting ICCV 2019
Disentangling Propagation and Generation for Video Prediction ICCV 2019
Language-Conditioned Graph Networks for Relational Reasoning ICCV 2019
Deep Mixture of Experts via Shallow Embedding UAI 2019
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders CVPR 2019
Adversarial Inference for Multi-Sentence Video Description CVPR 2019
Hierarchical Discrete Distribution Decomposition for Match Density Estimation CVPR 2019
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning CVPR 2019
Rethinking the Value of Network Pruning ICLR 2019
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees ICLR 2019
Fooling Vision and Language Models Despite Localization and Attention Mechanism CVPR 2018
Learning to Segment Every Thing CVPR 2018
Explainable Neural Computation via Stack Neural Module Networks ECCV 2018
SkipNet: Learning Dynamic Routing in Convolutional Networks ECCV 2018
Localizing Moments in Video with Temporal Language EMNLP 2018
Speaker-Follower Models for Vision-and-Language Navigation NIPS 2018
CyCADA: Cycle-Consistent Adversarial Domain Adaptation ICML 2018
Textual Explanations for Self-Driving Vehicles ECCV 2018
Women also Snowboard: Overcoming Bias in Captioning Models ECCV 2018
Grounding Visual Explanations ECCV 2018
Recasting Gradient-Based Meta-Learning as Hierarchical Bayes ICLR 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence CVPR 2018
Multi-Content GAN for Few-Shot Font Style Transfer CVPR 2018
Zero-Shot Visual Imitation ICLR 2018
Deep Layer Aggregation CVPR 2018
Object Hallucination in Image Captioning EMNLP 2018
Generalized Orderless Pooling Performs Implicit Salient Matching ICCV 2017
Learning to Reason: End-To-End Module Networks for Visual Question Answering ICCV 2017
Gradient-free Policy Architecture Search and Adaptation CORL 2017
Localizing Moments in Video With Natural Language ICCV 2017
Toward Multimodal Image-to-Image Translation NIPS 2017
Adversarial Discriminative Domain Adaptation CVPR 2017
Learning Detection With Diverse Proposals CVPR 2017
Captioning Images With Diverse Objects CVPR 2017
Learning Features by Watching Objects Move CVPR 2017
End-To-End Learning of Driving Models From Large-Scale Video Datasets CVPR 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks CVPR 2017
Curiosity-driven Exploration by Self-supervised Prediction ICML 2017
Natural Language Object Retrieval CVPR 2016
Context Encoders: Feature Learning by Inpainting CVPR 2016
Learning With Side Information Through Modality Hallucination CVPR 2016
Compact Bilinear Pooling CVPR 2016
Neural Module Networks CVPR 2016
Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data CVPR 2016
Learning to Compose Neural Networks for Question Answering NAACL 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding EMNLP 2016
Large Scale Visual Recognition through Adaptation using Joint Representation and Multiple Instance Learning JMLR 2016
End-to-End Training of Deep Visuomotor Policies JMLR 2016
Learning The Structure of Deep Convolutional Networks ICCV 2015
Spatial Semantic Regularisation for Large Scale Object Detection ICCV 2015
Simultaneous Deep Transfer Across Domains and Tasks ICCV 2015
Sequence to Sequence - Video to Text ICCV 2015
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation ICCV 2015
Fully Convolutional Networks for Semantic Segmentation CVPR 2015
Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning CVPR 2015
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description CVPR 2015
Deformable Part Models are Convolutional Neural Networks CVPR 2015
LSDA: Large Scale Detection through Adaptation NIPS 2014
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition ICML 2014
On learning to localize objects with minimal supervision ICML 2014
Learning Scalable Discriminative Dictionary with Sample Relatedness CVPR 2014
PANDA: Pose Aligned Networks for Deep Attribute Modeling CVPR 2014
Continuous Manifold Based Adaptation for Evolving Visual Domains CVPR 2014
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation CVPR 2014
Anytime Recognition of Objects and Scenes CVPR 2014
Weakly-supervised Discovery of Visual Pattern Configurations NIPS 2014
Do Convnets Learn Correspondence? NIPS 2014
Open-vocabulary Object Retrieval RSS 2014
Latent Task Adaptation with Large-Scale Hierarchies ICCV 2013
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition ICCV 2013
Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction ICCV 2013
Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies NIPS 2013
On Compact Codes for Spatially Pooled Features ICML 2013
Discriminatively Activated Sparselets ICML 2013
Semi-supervised Domain Adaptation with Instance Constraints CVPR 2013
Timely Object Recognition NIPS 2012
Learning with Recursive Perceptual Representations NIPS 2012
Heavy-tailed Distances for Gradient Based Image Descriptors NIPS 2011
Factorized Orthogonal Latent Spaces AISTATS 2010
Factorized Latent Spaces with Structured Sparsity NIPS 2010
Size Matters: Metric Visual Search Constraints from Monocular Metadata NIPS 2010
Filtering Abstract Senses From Image Search Results NIPS 2009
An Additive Latent Feature Model for Transparent Object Recognition NIPS 2009
Who is “You”? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue EACL 2009
Learning to Hash with Binary Reconstructive Embeddings NIPS 2009
Unsupervised Learning of Visual Sense Models for Polysemous Words NIPS 2008
The Pyramid Match Kernel: Efficient Learning with Sets of Features JMLR 2007
Approximate Correspondences in High Dimensions NIPS 2006