Stefan Lee
58 papers · 2015–2025 · 13 top CS/AI conferences
Achievements
Hot Topic Early Bird
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (13)
Academic Marathon (10)
Keyword Trendsetter Combo (5)
Grand Slam
Dynamic Duo (33)
Deep Specialist (16)
Keyword Champion (2)
Unstoppable (11)
Conference Pioneer
Keyword Collector (212)
Prolific Year (10)
The Questioner (4)
Century Club (58)
Trend Setter
Conferences
CVPR (10)
ICCV (10)
NIPS (7)
CORL (6)
EMNLP (6)
ECCV (5)
ICML (4)
ICLR (3)
WACV (3)
AAAI (1)
ACL (1)
IJCNLP (1)
NAACL (1)
Research topics
Keywords
visual question answering (11)
multimodal learning (7)
reinforcement learning (5)
representation learning (5)
vision-language navigation (5)
visual navigation (5)
dialogue system (4)
question generation (4)
image captioning (4)
vision-language model (4)
embodied question answering (4)
transfer learning (3)
agent system (3)
robot navigation (3)
imitation learning (3)
data augmentation (3)
visual grounding (3)
embodied ai (3)
object detection (3)
instruction following (3)
Papers
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
ICCV 2025
Harnessing Input-Adaptive Inference for Efficient VLN
ICCV 2025
Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks
WACV 2025
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
CVPR 2025
Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control
CORL 2025
Viewpoint-Aware Visual Grounding in 3D Scenes
CVPR 2024
Simple Masked Training Strategies Yield Control Policies That Are Robust to Sensor Failure
CORL 2024
Language-Informed Beam Search Decoding for Multilingual Machine Translation
ACL 2024
Benchmarking Out-of-Distribution Detection in Visual Question Answering
WACV 2024
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
CVPR 2024
Navigating to Objects Specified by Images
ICCV 2023
Emergence of Maps in the Memories of Blind Navigation Agents
ICLR 2023
VLSlice: Interactive Vision-and-Language Slice Discovery
ICCV 2023
Iterative Vision-and-Language Navigation
CVPR 2023
Behavioral Analysis of Vision-and-Language Navigation Agents
CVPR 2023
Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
ECCV 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
NIPS 2021
THDA: Treasure Hunt Data Augmentation for Semantic Navigation
ICCV 2021
Improving Multilingual Translation by Representation and Gradient Regularization
EMNLP 2021
Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views
AAAI 2021
Waypoint Models for Instruction-Guided Navigation in Continuous Environments
ICCV 2021
DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs
ICLR 2021
Auxiliary Tasks for Efficient Learning of Point-Goal Navigation
WACV 2021
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
NIPS 2020
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
ICLR 2020
Sim-to-Real Transfer for Vision-and-Language Navigation
CORL 2020
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
CORL 2020
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
ECCV 2020
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
ECCV 2020
Where Are You? Localization from Embodied Dialog
EMNLP 2020
On the Sub-layer Functionalities of Transformer Decoder
EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
NIPS 2020
Chasing Ghosts: Instruction Following as Bayesian State Tracking
NIPS 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
NIPS 2019
Audio Visual Scene-Aware Dialog
CVPR 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering
ICML 2019
Trainable Decoding of Sets of Sequences for Neural Sequence Models
ICML 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
ICCV 2019
nocaps: novel object captioning at scale
ICCV 2019
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
CVPR 2019
Counterfactual Visual Explanations
ICML 2019
Proceedings of the Second Workshop on Shortcomings in Vision and Language
NAACL 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
IJCNLP 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
EMNLP 2019
Neural Modular Control for Embodied Question Answering
CORL 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
CORL 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
ECCV 2018
Graph R-CNN for Scene Graph Generation
ECCV 2018
Embodied Question Answering
CVPR 2018
Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations
ICML 2018
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
NIPS 2018
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog
EMNLP 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
EMNLP 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning
CVPR 2017
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning
ICCV 2017
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
NIPS 2016
Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions
ICCV 2015