conftrace_

Devi Parikh

103 papers · 2011–2024 · 12 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (15) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (8) 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (12) 🏠 Conference Loyalist (33) 🌟 Keyword Trendsetter Combo (16) 🤝 Dynamic Duo (58) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (24) 🧬 Topic Evolution 🏆 Keyword Champion (23) ⚡ Prolific Year (23) 🗃️ Keyword Collector (349) ❓ The Questioner (6) 💎 Century Club (103) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (12)

Conferences

CVPR (33) ICCV (16) ECCV (12) EMNLP (10) NIPS (9) ICLR (5) NAACL (5) CORL (4) ICML (4) ACL (2) IJCAI (2) IJCNLP (1)

Top co-authors

Dhruv Batra (58) Stefan Lee (20) Abhishek Das (12) Jiasen Lu (12) Ramakrishna Vedantam (9) Jianwei Yang (8) Marcus Rohrbach (8) Yaniv Taigman (7) Peter Anderson (7) Xinlei Chen (7)

Keywords

visual question answering (23) multimodal learning (11) scene understanding (10) visual dialog (7) reinforcement learning (7) image captioning (6) dialogue system (6) visual grounding (6) multi-modal learning (5) neural network (5) image retrieval (4) question generation (4) transfer learning (4) image classification (4) convolutional neural network (4) object detection (3) representation learning (3) semantic segmentation (3) diffusion model (3) zero-shot learning (3)

Papers

Video Editing via Factorized Diffusion Distillation ECCV 2024
Emu Edit: Precise Image Editing via Recognition and Generation Tasks CVPR 2024
Factorizing Text-to-Video Generation by Explicit Image Conditioning ECCV 2024
Make-A-Video: Text-to-Video Generation without Text-Video Data ICLR 2023
Text-To-4D Dynamic Scene Generation ICML 2023
AudioGen: Textually Guided Audio Generation ICLR 2023
SpaText: Spatio-Textual Representation for Controllable Image Generation CVPR 2023
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation ICCV 2023
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration ECCV 2022
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator ACL 2022
Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors ECCV 2022
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer ECCV 2022
Episodic Memory Question Answering CVPR 2022
Human-Adversarial Visual Question Answering NIPS 2021
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency NAACL 2021
Creative Sketch Generation ICLR 2021
Contrast and Classify: Training Robust VQA Models ICCV 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs CVPR 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA CVPR 2021
Sim-to-Real Transfer for Vision-and-Language Navigation CORL 2020
Embodied Multimodal Multitask Learning IJCAI 2020
Where Are You? Localization from Embodied Dialog EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data NIPS 2020
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions CVPR 2020
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents CORL 2020
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline ECCV 2020
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL IJCAI 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation ECCV 2020
Spatially Aware Multimodal Transformers for TextVQA ECCV 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web ECCV 2020
12-in-1: Multi-Task Vision and Language Representation Learning CVPR 2020
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames ICLR 2020
Audio Visual Scene-Aware Dialog CVPR 2019
RUBi: Reducing Unimodal Biases for Visual Question Answering NIPS 2019
Chasing Ghosts: Instruction Following as Bayesian State Tracking NIPS 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks NIPS 2019
Cross-channel Communication Networks NIPS 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication ACL 2019
Cycle-Consistency for Robust Visual Question Answering CVPR 2019
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception CVPR 2019
Towards VQA Models That Can Read CVPR 2019
Improving Generative Visual Dialog by Answering Diverse Questions EMNLP 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation ICCV 2019
Embodied Amodal Recognition: Learning to Move to Perceive Objects ICCV 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded ICCV 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment ICCV 2019
Fashion++: Minimal Edits for Outfit Improvement ICCV 2019
nocaps: novel object captioning at scale ICCV 2019
Habitat: A Platform for Embodied AI Research ICCV 2019
Modeling the Long Term Future in Model-Based Reinforcement Learning ICLR 2019
TarMAC: Targeted Multi-Agent Communication ICML 2019
Counterfactual Visual Explanations ICML 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering ICML 2019
Improving Generative Visual Dialog by Answering Diverse Questions IJCNLP 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog NAACL 2019
Neural Baby Talk CVPR 2018
Graph R-CNN for Scene Graph Generation ECCV 2018
Punny Captions: Witty Wordplay in Image Descriptions NAACL 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition CORL 2018
Neural Modular Control for Embodied Question Answering CORL 2018
Do explanations make VQA models more predictable to a human? EMNLP 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance ECCV 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks ECCV 2018
Embodied Question Answering CVPR 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering CVPR 2018
Deal or No Deal? End-to-End Learning of Negotiation Dialogues EMNLP 2017
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization ICCV 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model NIPS 2017
Visual Dialog CVPR 2017
ParlAI: A Dialog Research Software Platform EMNLP 2017
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning CVPR 2017
Counting Everyday Objects in Everyday Scenes CVPR 2017
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering CVPR 2017
Context-Aware Captions From Context-Agnostic Supervision CVPR 2017
Sound-Word2Vec: Learning Word Representations Grounded in Sounds EMNLP 2017
Sort Story: Sorting Jumbled Images and Captions into Stories EMNLP 2016
We Are Humor Beings: Understanding and Predicting Visual Humor CVPR 2016
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions EMNLP 2016
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes CVPR 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions? EMNLP 2016
Analyzing the Behavior of Visual Question Answering Models EMNLP 2016
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories NAACL 2016
Visual Storytelling NAACL 2016
Hierarchical Question-Image Co-Attention for Visual Question Answering NIPS 2016
Joint Unsupervised Learning of Deep Representations and Image Clusters CVPR 2016
Yin and Yang: Balancing and Answering Binary Visual Questions CVPR 2016
Understanding Image Virality CVPR 2015
CIDEr: Consensus-Based Image Description Evaluation CVPR 2015
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks CVPR 2015
Image Specificity CVPR 2015
Learning Common Sense Through Visual Abstraction ICCV 2015
VQA: Visual Question Answering ICCV 2015
Predicting Failures of Vision Systems CVPR 2014
Predicting User Annoyance Using Visual Attributes CVPR 2014
Attribute Dominance: What Pops Out? ICCV 2013
Implied Feedback: Learning Nuances of User Behavior in Image Search ICCV 2013
Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing ICCV 2013
Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback CVPR 2013
Learning the Visual Interpretation of Sentences ICCV 2013
Multi-attribute Queries: To Merge or Not to Merge? CVPR 2013
Bringing Semantics into Focus Using Visual Abstraction CVPR 2013
Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs CVPR 2013
Understanding the Intrinsic Memorability of Images NIPS 2011