Anna Rohrbach
44 papers · 2015–2026 · 11 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (11)
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (10)
🌟 Keyword Trendsetter Combo (6)
🤝 Dynamic Duo (33)
🔬 Deep Specialist (12)
🧬 Topic Evolution
🚀 Conference Pioneer
⚡ Prolific Year (14)
🗃️ Keyword Collector (182)
💎 Century Club (43)
❓ The Questioner (3)
📈 Trend Setter
🔥 Unstoppable (11)
Conferences
CVPR (12)
ECCV (6)
EMNLP (6)
NIPS (4)
WACV (4)
ACL (3)
ICCV (2)
ICLR (2)
ICML (2)
NAACL (2)
CORL (1)
Keywords
multimodal learning (10)
visual grounding (5)
image captioning (4)
visual question answering (4)
action recognition (4)
video understanding (4)
few-shot learning (3)
video description (3)
vision-language model (3)
zero-shot learning (3)
misinformation detection (3)
attention mechanism (3)
vision-language navigation (2)
visual reasoning (2)
image-text mismatch (2)
contrastive learning (2)
multi-modal learning (2)
image classification (2)
natural language generation (2)
object detection (2)
Papers
VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking
ACL 2026
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
ICML 2025
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
CVPR 2025
Shape-Guided Diffusion With Inside-Outside Attention
WACV 2024
InFact: A Strong Baseline for Automated Fact-Checking
EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness
WACV 2024
Using Language to Extend to Unseen Domains
ICLR 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance
WACV 2023
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
WACV 2023
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
CVPR 2023
Exposing the Limits of Video-Text Models through Contrast Sets
NAACL 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
NIPS 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
NIPS 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
ACL 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
CVPR 2022
Object-Region Video Transformers
CVPR 2022
On Guiding Visual Attention With Language Specification
CVPR 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
ECCV 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
ECCV 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
ECCV 2022
G3: Geolocation via Guidebook Grounding
EMNLP 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
EMNLP 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
ICLR 2022
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation
NAACL 2022
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media
EMNLP 2021
CLIP-It! Language-Guided Video Summarization
NIPS 2021
Compositional Video Synthesis with Action Graphs
ICML 2021
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules
CVPR 2020
Identity-Aware Multi-Sentence Video Description
ECCV 2020
Language-Conditioned Graph Networks for Relational Reasoning
ICCV 2019
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation
ACL 2019
Robust Change Captioning
ICCV 2019
Adversarial Inference for Multi-Sentence Video Description
CVPR 2019
Speaker-Follower Models for Vision-and-Language Navigation
NIPS 2018
Object Hallucination in Image Captioning
EMNLP 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
CVPR 2018
Fooling Vision and Language Models Despite Localization and Attention Mechanism
CVPR 2018
Women also Snowboard: Overcoming Bias in Captioning Models
ECCV 2018
Textual Explanations for Self-Driving Vehicles
ECCV 2018
Generating Descriptions With Grounded and Co-Referenced People
CVPR 2017
A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering
CVPR 2017
Gradient-free Policy Architecture Search and Adaptation
CORL 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
EMNLP 2016
A Dataset for Movie Description
CVPR 2015