Anna Rohrbach

44 papers · 2015–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (10) 🌟 Keyword Trendsetter Combo (6) 🤝 Dynamic Duo (33) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🚀 Conference Pioneer ⚡ Prolific Year (14) 🗃️ Keyword Collector (182) 💎 Century Club (43) ❓ The Questioner (3) 📈 Trend Setter 🔥 Unstoppable (11)

Conferences

CVPR (12) ECCV (6) EMNLP (6) NIPS (4) WACV (4) ACL (3) ICCV (2) ICLR (2) ICML (2) NAACL (2) CORL (1)

Top co-authors

Trevor Darrell (33) Marcus Rohrbach (11) Kate Saenko (6) Grace Luo (5) Dong Huk Park (5) Roei Herzig (4) Amir Bar (4) Amir Globerson (4) Jae Sung Park (4) Sheng Shen (3)

Keywords

multimodal learning (10) visual grounding (5) image captioning (4) visual question answering (4) action recognition (4) video understanding (4) few-shot learning (3) video description (3) vision-language model (3) zero-shot learning (3) misinformation detection (3) attention mechanism (3) vision-language navigation (2) visual reasoning (2) image-text mismatch (2) contrastive learning (2) multi-modal learning (2) image classification (2) natural language generation (2) object detection (2)

Papers

VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking · ACL 2026
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts · ICML 2025
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts · CVPR 2025
Shape-Guided Diffusion With Inside-Outside Attention · WACV 2024
InFact: A Strong Baseline for Automated Fact-Checking · EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness · WACV 2024
Using Language to Extend to Unseen Domains · ICLR 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance · WACV 2023
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion · WACV 2023
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding · CVPR 2023
Exposing the Limits of Video-Text Models through Contrast Sets · NAACL 2022
K-LITE: Learning Transferable Visual Models with External Knowledge · NIPS 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens · NIPS 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension · ACL 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection · CVPR 2022
Object-Region Video Transformers · CVPR 2022
On Guiding Visual Attention With Language Specification · CVPR 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency · ECCV 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly · ECCV 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning · ECCV 2022
G3: Geolocation via Guidebook Grounding · EMNLP 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning · EMNLP 2022
How Much Can CLIP Benefit Vision-and-Language Tasks? · ICLR 2022
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation · NAACL 2022
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media · EMNLP 2021
CLIP-It! Language-Guided Video Summarization · NIPS 2021
Compositional Video Synthesis with Action Graphs · ICML 2021
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules · CVPR 2020
Identity-Aware Multi-Sentence Video Description · ECCV 2020
Language-Conditioned Graph Networks for Relational Reasoning · ICCV 2019
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation · ACL 2019
Robust Change Captioning · ICCV 2019
Adversarial Inference for Multi-Sentence Video Description · CVPR 2019
Speaker-Follower Models for Vision-and-Language Navigation · NIPS 2018
Object Hallucination in Image Captioning · EMNLP 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence · CVPR 2018
Fooling Vision and Language Models Despite Localization and Attention Mechanism · CVPR 2018
Women also Snowboard: Overcoming Bias in Captioning Models · ECCV 2018
Textual Explanations for Self-Driving Vehicles · ECCV 2018
Generating Descriptions With Grounded and Co-Referenced People · CVPR 2017
A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering · CVPR 2017
Gradient-free Policy Architecture Search and Adaptation · CORL 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding · EMNLP 2016
A Dataset for Movie Description · CVPR 2015