Marcus Rohrbach

59 papers · 2013–2026 · 11 top CS/AI conferences

Achievements


🌍 Conference Polyglot (11) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (12)

🌟 Keyword Trendsetter Combo (9) 🏠 Conference Loyalist (23) 🤝 Dynamic Duo (19) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🏆 Keyword Champion (6) 🏆 Grand Slam ⚡ Prolific Year (13) 🗃️ Keyword Collector (203) 💎 Century Club (58) 🔥 Unstoppable (11) 📈 Trend Setter 🚀 Conference Pioneer

Conferences

CVPR (23) ECCV (9) ICCV (7) ICLR (5) ACL (3) EMNLP (3) NAACL (3) AAAI (2) ICML (2) NIPS (1) WACV (1)

Top co-authors

Trevor Darrell (19) Anna Rohrbach (11) Kate Saenko (9) Devi Parikh (8) Xinlei Chen (7) Yannis Kalantidis (7) Ronghang Hu (7) Bernt Schiele (6) Laura Sevilla-Lara (6) Lisa Anne Hendricks (5)

Keywords

visual question answering (11) multimodal learning (9) image captioning (7) video captioning (6) video description (6) video understanding (5) visual grounding (4) transfer learning (4) multi-modal learning (3) action recognition (3) zero-shot learning (3) convolutional neural network (3) coreference resolution (2) question generation (2) semantic representation (2) natural language generation (2) video classification (2) scene understanding (2) representation learning (2) object detection (2)

Papers

VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking · ACL 2026
Predicting Implicit Arguments in Procedural Video Instructions · ACL 2025
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts · ICML 2025
V²Dial: Unification of Video and Visual Dialog via Multimodal Experts · CVPR 2025
InFact: A Strong Baseline for Automated Fact-Checking · EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness · WACV 2024
Efficient Pre-training for Localized Instruction Generation of Procedural Videos · ECCV 2024
Improving Selective Visual Question Answering by Learning From Your Peers · CVPR 2023
Learning To Recognize Procedural Activities With Distant Supervision · CVPR 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly · ECCV 2022
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition · ECCV 2022
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition · ECCV 2022
FLAVA: A Foundational Language and Vision Alignment Model · CVPR 2022
SMART Frame Selection for Action Recognition · AAAI 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA · CVPR 2021
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting · ICLR 2021
12-in-1: Multi-Task Vision and Language Representation Learning · CVPR 2020
In Defense of Grid Features for Visual Question Answering · CVPR 2020
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA · CVPR 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension · ECCV 2020
Adversarial Continual Learning · ECCV 2020
Learning to Generate Grounded Visual Captions without Localization Supervision · ECCV 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks · ICLR 2020
Decoupling Representation and Classifier for Long-Tailed Recognition · ICLR 2020
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition · CVPR 2019
Graph-Based Global Reasoning Networks · CVPR 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog · NAACL 2019
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution · ICCV 2019
Selfless Sequential Learning · ICLR 2019
Efficient Lifelong Learning with A-GEM · ICLR 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication · ACL 2019
Large-Scale Visual Relationship Understanding · AAAI 2019
Adversarial Inference for Multi-Sentence Video Description · CVPR 2019
Cycle-Consistency for Robust Visual Question Answering · CVPR 2019
Towards VQA Models That Can Read · CVPR 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering · ICML 2019
Grounded Video Description · CVPR 2019
Memory Aware Synapses: Learning what (not) to forget · ECCV 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks · ECCV 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence · CVPR 2018
A Dataset for Telling the Stories of Social Media Videos · EMNLP 2018
Captioning Images With Diverse Objects · CVPR 2017
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training · ICCV 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks · CVPR 2017
Generating Descriptions With Grounded and Co-Referenced People · CVPR 2017
Learning to Reason: End-To-End Module Networks for Visual Question Answering · ICCV 2017
Neural Module Networks · CVPR 2016
Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data · CVPR 2016
Learning to Compose Neural Networks for Question Answering · NAACL 2016
Natural Language Object Retrieval · CVPR 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding · EMNLP 2016
Sequence to Sequence - Video to Text · ICCV 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks · NAACL 2015
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description · CVPR 2015
Spatial Semantic Regularisation for Large Scale Object Detection · ICCV 2015
A Dataset for Movie Description · CVPR 2015
Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images · ICCV 2015
Translating Video Content to Natural Language Descriptions · ICCV 2013
Transfer Learning in a Transductive Setting · NIPS 2013