Trevor Darrell
246 papers · 2006–2025 · 17 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (38)
Keyword Pioneer
Renaissance Researcher (6)
Interdisciplinary Bridge
Hot Topic Early Bird
Academic Marathon (19)
Conference Loyalist (35)
Keyword Trendsetter Combo (37)
Dynamic Duo (33)
Triple Crown
Topic Pioneer
Keyword Champion
Grand Slam
Mega-Team (24)
Deep Specialist (50)
Topic Evolution
Conference Pioneer
Unstoppable (20)
The Questioner (9)
Century Club (246)
Keyword Collector (130)
Prolific Year (27)
Trend Setter
Conferences
CVPR (66)
NIPS (35)
ICCV (32)
ICLR (26)
ICML (19)
ECCV (19)
EMNLP (12)
WACV (8)
NAACL (8)
CORL (6)
ACL (6)
JMLR (3)
AAAI (2)
EACL (1)
AISTATS (1)
RSS (1)
UAI (1)
Keywords
multimodal learning (26)
object detection (21)
zero-shot learning (18)
transfer learning (17)
vision-language model (17)
convolutional neural network (17)
few-shot learning (15)
semantic segmentation (15)
domain adaptation (15)
representation learning (13)
image classification (11)
self-supervised learning (11)
diffusion model (10)
visual reasoning (9)
image generation (9)
visual question answering (9)
semi-supervised learning (9)
video understanding (8)
object recognition (8)
unsupervised learning (8)
Papers
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
ICLR 2025
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
ICLR 2025
A Coefficient Makes SVRG Effective
ICLR 2025
Enough Coin Flips Can Make LLMs Act Bayesian
ACL 2025
Visual Imitation Enables Contextual Humanoid Control
CORL 2025
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
Visual Lexicon: Rich Image Features in Language Space
CVPR 2025
Pose Priors from Language Models
CVPR 2025
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
CVPR 2025
AutoPresent: Designing Structured Visuals from Scratch
CVPR 2025
Discovering Divergent Representations between Text-to-Image Models
ICCV 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
CORL 2025
Navigation World Models
CVPR 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
ICCV 2025
Describe Anything: Detailed Localized Image and Video Captioning
ICCV 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
ICML 2025
Vision-Language Models Create Cross-Modal Task Representations
ICML 2025
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
EMNLP 2025
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint
EMNLP 2025
Dual-Process Image Generation
ICCV 2025
Video Action Differencing
ICLR 2025
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
ICLR 2025
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
ICLR 2025
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
ICLR 2025
Self-correcting LLM-controlled Diffusion Models
CVPR 2024
Recursive Visual Programming
ECCV 2024
Finding Visual Task Vectors
ECCV 2024
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness
WACV 2024
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data
WACV 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
NIPS 2024
When does perceptual alignment benefit vision representations?
NIPS 2024
Humanoid Locomotion as Next Token Prediction
NIPS 2024
Segment Anything without Supervision
NIPS 2024
xT: Nested Tokenization for Larger Context in Large Images
ICML 2024
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
CORL 2024
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
ICML 2024
Aligning Large Multimodal Models with Factually Augmented RLHF
ACL 2024
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
ICML 2024
Stochastic positional embeddings improve masked image modeling
ICML 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
ICLR 2024
Initializing Models with Larger Ones
ICLR 2024
LLM-grounded Video Diffusion Models
ICLR 2024
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
ICLR 2024
Shape-Guided Diffusion With Inside-Outside Attention
WACV 2024
Multitask Vision-Language Prompt Tuning
WACV 2024
Readout Guidance: Learning Control from Diffusion Features
CVPR 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
ECCV 2024
See Say and Segment: Teaching LMMs to Overcome False Premises
CVPR 2024
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
CVPR 2024
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
CVPR 2024
Unsupervised Universal Image Segmentation
CVPR 2024
Describing Differences in Image Sets with Natural Language
CVPR 2024
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024
Compositional Chain-of-Thought Prompting for Large Multimodal Models
CVPR 2024
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction
NAACL 2024
ALOHa: A New Measure for Hallucination in Captioning Models
NAACL 2024
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
NAACL 2024
When Do We Not Need Larger Vision Models?
ECCV 2024
Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation
NIPS 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance
WACV 2023
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
WACV 2023
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
NIPS 2023
Robot Learning with Sensorimotor Pre-training
CORL 2023
Top-Down Visual Attention From Analysis by Synthesis
CVPR 2023
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
CVPR 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
EMNLP 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models
ICML 2023
Dropout Reduces Underfitting
ICML 2023
CLAIR: Evaluating Image Captions with Large Language Models
EMNLP 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
EMNLP 2023
Large Language Models are Visual Reasoning Coordinators
NIPS 2023
Modular Visual Question Answering via Code Generation
ACL 2023
Using Language to Extend to Unseen Domains
ICLR 2023
Hierarchical Open-vocabulary Universal Image Segmentation
NIPS 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
EMNLP 2023
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
ICCV 2023
Can Language Models Learn to Listen?
ICCV 2023
Zero-Shot Reward Specification via Grounded Natural Language
ICML 2022
G3: Geolocation via Guidebook Grounding
EMNLP 2022
Anytime Dense Prediction with Confidence Adaptivity
ICLR 2022
Differentiable Gradient Sampling for Learning Implicit 3D Scene Reconstructions from a Single Image
ICLR 2022
Disentangled Action Recognition with Knowledge Bases
NAACL 2022
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation
NAACL 2022
Strumming to the Beat: Audio-Conditioned Contrastive Video Textures
WACV 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
CVPR 2022
Contrastive Test-Time Adaptation
CVPR 2022
A ConvNet for the 2020s
CVPR 2022
Object-Region Video Transformers
CVPR 2022
Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
CVPR 2022
On Guiding Visual Attention With Language Specification
CVPR 2022
Self-Supervised Pretraining Improves Self-Supervised Pretraining
WACV 2022
Learning to Detect Every Thing in an Open World
ECCV 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
ECCV 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
ECCV 2022
Real-World Robot Learning with Masked Visual Pre-training
CORL 2022
Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning
AAAI 2022
Exposing the Limits of Video-Text Models through Contrast Sets
NAACL 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
ACL 2022
Voxel-informed Language Grounding
ACL 2022
Studying Bias in GANs through the Lens of Race
ECCV 2022
Visual Attention Emerges from Recurrent Sparse Reconstruction
ICML 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
NIPS 2022
Visual Prompting via Image Inpainting
NIPS 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
NIPS 2022
Early Convolutions Help Transformers See Better
NIPS 2021
Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses
ICCV 2021
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation
AAAI 2021
Tent: Fully Test-Time Adaptation by Entropy Minimization
ICLR 2021
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation
CVPR 2021
SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning
CVPR 2021
Compositional Video Synthesis with Action Graphs
ICML 2021
Predicting With Confidence on Unseen Distributions
ICCV 2021
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media
EMNLP 2021
Modular Networks for Compositional Instruction Following
NAACL 2021
CLIP-It! Language-Guided Video Summarization
NIPS 2021
Teachable Reinforcement Learning via Advice Distillation
NIPS 2021
Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics
CVPR 2021
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation
CVPR 2021
Tune It the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
ICCV 2021
Region Similarity Representation Learning
ICCV 2021
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
ICCV 2021
Robust Object Detection via Instance-Level Temporal Cycle Confusion
ICCV 2021
Discovering Non-monotonic Autoregressive Orderings with Variational Inference
ICLR 2021
Temporal Action Detection With Multi-Level Supervision
ICCV 2021
What Should Not Be Contrastive in Contrastive Learning
ICLR 2021
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting
ICLR 2021
Quasi-Dense Similarity Learning for Multiple Object Tracking
CVPR 2021
Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control
ICLR 2021
Adversarial Continual Learning
ECCV 2020
Fighting Copycat Agents in Behavioral Cloning from Observation Histories
NIPS 2020
Auxiliary Task Reweighting for Minimum-data Learning
NIPS 2020
BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
CVPR 2020
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
CVPR 2020
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules
CVPR 2020
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
CVPR 2020
Learning Saliency Propagation for Semi-Supervised Instance Segmentation
CVPR 2020
Hierarchical Style-based Networks for Motion Synthesis
ECCV 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
ECCV 2020
Identity-Aware Multi-Sentence Video Description
ECCV 2020
Learning Canonical Representations for Scene Graph to Image Generation
ECCV 2020
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
ECCV 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks
ICLR 2020
Frustratingly Simple Few-Shot Object Detection
ICML 2020
Video Prediction via Example Guidance
ICML 2020
Discriminator Rejection Sampling
ICLR 2019
Large-Scale Study of Curiosity-Driven Learning
ICLR 2019
Compositional Plan Vectors
NIPS 2019
Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity
NIPS 2019
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
ACL 2019
Robust Change Captioning
ICCV 2019
Joint Monocular 3D Vehicle Detection and Tracking
ICCV 2019
Variational Adversarial Active Learning
ICCV 2019
Semi-Supervised Domain Adaptation via Minimax Entropy
ICCV 2019
Few-Shot Object Detection via Feature Reweighting
ICCV 2019
Disentangling Propagation and Generation for Video Prediction
ICCV 2019
Language-Conditioned Graph Networks for Relational Reasoning
ICCV 2019
Deep Mixture of Experts via Shallow Embedding
UAI 2019
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
CVPR 2019
Adversarial Inference for Multi-Sentence Video Description
CVPR 2019
Hierarchical Discrete Distribution Decomposition for Match Density Estimation
CVPR 2019
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
CVPR 2019
Rethinking the Value of Network Pruning
ICLR 2019
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
ICLR 2019
Fooling Vision and Language Models Despite Localization and Attention Mechanism
CVPR 2018
Learning to Segment Every Thing
CVPR 2018
Explainable Neural Computation via Stack Neural Module Networks
ECCV 2018
SkipNet: Learning Dynamic Routing in Convolutional Networks
ECCV 2018
Localizing Moments in Video with Temporal Language
EMNLP 2018
Speaker-Follower Models for Vision-and-Language Navigation
NIPS 2018
CyCADA: Cycle-Consistent Adversarial Domain Adaptation
ICML 2018
Textual Explanations for Self-Driving Vehicles
ECCV 2018
Women also Snowboard: Overcoming Bias in Captioning Models
ECCV 2018
Grounding Visual Explanations
ECCV 2018
Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
ICLR 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
CVPR 2018
Multi-Content GAN for Few-Shot Font Style Transfer
CVPR 2018
Zero-Shot Visual Imitation
ICLR 2018
Deep Layer Aggregation
CVPR 2018
Object Hallucination in Image Captioning
EMNLP 2018
Generalized Orderless Pooling Performs Implicit Salient Matching
ICCV 2017
Learning to Reason: End-To-End Module Networks for Visual Question Answering
ICCV 2017
Gradient-free Policy Architecture Search and Adaptation
CORL 2017
Localizing Moments in Video With Natural Language
ICCV 2017
Toward Multimodal Image-to-Image Translation
NIPS 2017
Adversarial Discriminative Domain Adaptation
CVPR 2017
Learning Detection With Diverse Proposals
CVPR 2017
Captioning Images With Diverse Objects
CVPR 2017
Learning Features by Watching Objects Move
CVPR 2017
End-To-End Learning of Driving Models From Large-Scale Video Datasets
CVPR 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks
CVPR 2017
Curiosity-driven Exploration by Self-supervised Prediction
ICML 2017
Natural Language Object Retrieval
CVPR 2016
Context Encoders: Feature Learning by Inpainting
CVPR 2016
Learning With Side Information Through Modality Hallucination
CVPR 2016
Compact Bilinear Pooling
CVPR 2016
Neural Module Networks
CVPR 2016
Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data
CVPR 2016
Learning to Compose Neural Networks for Question Answering
NAACL 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
EMNLP 2016
Large Scale Visual Recognition through Adaptation using Joint Representation and Multiple Instance Learning
JMLR 2016
End-to-End Training of Deep Visuomotor Policies
JMLR 2016
Learning The Structure of Deep Convolutional Networks
ICCV 2015
Spatial Semantic Regularisation for Large Scale Object Detection
ICCV 2015
Simultaneous Deep Transfer Across Domains and Tasks
ICCV 2015
Sequence to Sequence - Video to Text
ICCV 2015
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
ICCV 2015
Fully Convolutional Networks for Semantic Segmentation
CVPR 2015
Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning
CVPR 2015
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
CVPR 2015
Deformable Part Models are Convolutional Neural Networks
CVPR 2015
LSDA: Large Scale Detection through Adaptation
NIPS 2014
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
ICML 2014
On learning to localize objects with minimal supervision
ICML 2014
Learning Scalable Discriminative Dictionary with Sample Relatedness
CVPR 2014
PANDA: Pose Aligned Networks for Deep Attribute Modeling
CVPR 2014
Continuous Manifold Based Adaptation for Evolving Visual Domains
CVPR 2014
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
CVPR 2014
Anytime Recognition of Objects and Scenes
CVPR 2014
Weakly-supervised Discovery of Visual Pattern Configurations
NIPS 2014
Do Convnets Learn Correspondence?
NIPS 2014
Open-vocabulary Object Retrieval
RSS 2014
Latent Task Adaptation with Large-Scale Hierarchies
ICCV 2013
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
ICCV 2013
Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
ICCV 2013
Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
NIPS 2013
On Compact Codes for Spatially Pooled Features
ICML 2013
Discriminatively Activated Sparselets
ICML 2013
Semi-supervised Domain Adaptation with Instance Constraints
CVPR 2013
Timely Object Recognition
NIPS 2012
Learning with Recursive Perceptual Representations
NIPS 2012
Heavy-tailed Distances for Gradient Based Image Descriptors
NIPS 2011
Factorized Orthogonal Latent Spaces
AISTATS 2010
Factorized Latent Spaces with Structured Sparsity
NIPS 2010
Size Matters: Metric Visual Search Constraints from Monocular Metadata
NIPS 2010
Filtering Abstract Senses From Image Search Results
NIPS 2009
An Additive Latent Feature Model for Transparent Object Recognition
NIPS 2009
Who is "You"? Combining Linguistic and Gaze Features to Resolve Second-Person References in Dialogue
EACL 2009
Learning to Hash with Binary Reconstructive Embeddings
NIPS 2009
Unsupervised Learning of Visual Sense Models for Polysemous Words
NIPS 2008
The Pyramid Match Kernel: Efficient Learning with Sets of Features
JMLR 2007
Approximate Correspondences in High Dimensions
NIPS 2006