Devi Parikh
103 papers · 2011–2024 · 12 top CS/AI conferences
Achievements
Taxonomy Completionist
(15)
Keyword Pioneer
Hot Topic Early Bird
Renaissance Researcher
(8)
Interdisciplinary Bridge
Conference Polyglot
(12)
Conference Loyalist
(33)
Keyword Trendsetter Combo
(16)
Dynamic Duo
(58)
Triple Crown
Topic Pioneer
Deep Specialist
(24)
Topic Evolution
Keyword Champion
(23)
Prolific Year
(23)
Keyword Collector
(349)
The Questioner
(6)
Century Club
(103)
Trend Setter
Conference Pioneer
Unstoppable
(12)
Conferences
CVPR (33)
ICCV (16)
ECCV (12)
EMNLP (10)
NIPS (9)
ICLR (5)
NAACL (5)
CORL (4)
ICML (4)
ACL (2)
IJCAI (2)
IJCNLP (1)
Top co-authors
Keywords
visual question answering
(23)
multimodal learning
(11)
scene understanding
(10)
visual dialog
(7)
reinforcement learning
(7)
image captioning
(6)
dialogue system
(6)
visual grounding
(6)
multi-modal learning
(5)
neural network
(5)
image retrieval
(4)
question generation
(4)
transfer learning
(4)
image classification
(4)
convolutional neural network
(4)
object detection
(3)
representation learning
(3)
semantic segmentation
(3)
diffusion model
(3)
zero-shot learning
(3)
Papers
Video Editing via Factorized Diffusion Distillation
ECCV 2024
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
CVPR 2024
Factorizing Text-to-Video Generation by Explicit Image Conditioning
ECCV 2024
Make-A-Video: Text-to-Video Generation without Text-Video Data
ICLR 2023
Text-To-4D Dynamic Scene Generation
ICML 2023
AudioGen: Textually Guided Audio Generation
ICLR 2023
SpaText: Spatio-Textual Representation for Controllable Image Generation
CVPR 2023
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation
ICCV 2023
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
ECCV 2022
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator
ACL 2022
Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors
ECCV 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
ECCV 2022
Episodic Memory Question Answering
CVPR 2022
Human-Adversarial Visual Question Answering
NIPS 2021
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
NAACL 2021
Creative Sketch Generation
ICLR 2021
Contrast and Classify: Training Robust VQA Models
ICCV 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
CVPR 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
CVPR 2021
Sim-to-Real Transfer for Vision-and-Language Navigation
CORL 2020
Embodied Multimodal Multitask Learning
IJCAI 2020
Where Are You? Localization from Embodied Dialog
EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
NIPS 2020
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions
CVPR 2020
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
CORL 2020
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
ECCV 2020
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL
IJCAI 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
ECCV 2020
Spatially Aware Multimodal Transformers for TextVQA
ECCV 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
ECCV 2020
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
ICLR 2020
Audio Visual Scene-Aware Dialog
CVPR 2019
RUBi: Reducing Unimodal Biases for Visual Question Answering
NIPS 2019
Chasing Ghosts: Instruction Following as Bayesian State Tracking
NIPS 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
NIPS 2019
Cross-channel Communication Networks
NIPS 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
ACL 2019
Cycle-Consistency for Robust Visual Question Answering
CVPR 2019
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
CVPR 2019
Towards VQA Models That Can Read
CVPR 2019
Improving Generative Visual Dialog by Answering Diverse Questions
EMNLP 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
ICCV 2019
Embodied Amodal Recognition: Learning to Move to Perceive Objects
ICCV 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
ICCV 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
ICCV 2019
Fashion++: Minimal Edits for Outfit Improvement
ICCV 2019
nocaps: novel object captioning at scale
ICCV 2019
Habitat: A Platform for Embodied AI Research
ICCV 2019
Modeling the Long Term Future in Model-Based Reinforcement Learning
ICLR 2019
TarMAC: Targeted Multi-Agent Communication
ICML 2019
Counterfactual Visual Explanations
ICML 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering
ICML 2019
Improving Generative Visual Dialog by Answering Diverse Questions
IJCNLP 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
NAACL 2019
Neural Baby Talk
CVPR 2018
Graph R-CNN for Scene Graph Generation
ECCV 2018
Punny Captions: Witty Wordplay in Image Descriptions
NAACL 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
CORL 2018
Neural Modular Control for Embodied Question Answering
CORL 2018
Do explanations make VQA models more predictable to a human?
EMNLP 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
ECCV 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
ECCV 2018
Embodied Question Answering
CVPR 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
CVPR 2018
Deal or No Deal? End-to-End Learning of Negotiation Dialogues
EMNLP 2017
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization
ICCV 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
NIPS 2017
Visual Dialog
CVPR 2017
ParlAI: A Dialog Research Software Platform
EMNLP 2017
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
CVPR 2017
Counting Everyday Objects in Everyday Scenes
CVPR 2017
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
CVPR 2017
Context-Aware Captions From Context-Agnostic Supervision
CVPR 2017
Sound-Word2Vec: Learning Word Representations Grounded in Sounds
EMNLP 2017
Sort Story: Sorting Jumbled Images and Captions into Stories
EMNLP 2016
We Are Humor Beings: Understanding and Predicting Visual Humor
CVPR 2016
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
EMNLP 2016
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
CVPR 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?
EMNLP 2016
Analyzing the Behavior of Visual Question Answering Models
EMNLP 2016
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
NAACL 2016
Visual Storytelling
NAACL 2016
Hierarchical Question-Image Co-Attention for Visual Question Answering
NIPS 2016
Joint Unsupervised Learning of Deep Representations and Image Clusters
CVPR 2016
Yin and Yang: Balancing and Answering Binary Visual Questions
CVPR 2016
Understanding Image Virality
CVPR 2015
CIDEr: Consensus-Based Image Description Evaluation
CVPR 2015
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks
CVPR 2015
Image Specificity
CVPR 2015
Learning Common Sense Through Visual Abstraction
ICCV 2015
VQA: Visual Question Answering
ICCV 2015
Predicting Failures of Vision Systems
CVPR 2014
Predicting User Annoyance Using Visual Attributes
CVPR 2014
Attribute Dominance: What Pops Out?
ICCV 2013
Implied Feedback: Learning Nuances of User Behavior in Image Search
ICCV 2013
Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
ICCV 2013
Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
CVPR 2013
Learning the Visual Interpretation of Sentences
ICCV 2013
Multi-attribute Queries: To Merge or Not to Merge?
CVPR 2013
Bringing Semantics into Focus Using Visual Abstraction
CVPR 2013
Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
CVPR 2013
Understanding the Intrinsic Memorability of Images
NIPS 2011