Marcus Rohrbach
59 papers · 2013–2026 · 11 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (11)
🐣 Hot Topic Early Bird
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🏃 Academic Marathon (12)
🌟 Keyword Trendsetter Combo (9)
🏠 Conference Loyalist (23)
🤝 Dynamic Duo (19)
🔬 Deep Specialist (12)
🧬 Topic Evolution
🏆 Keyword Champion (6)
🏆 Grand Slam
⚡ Prolific Year (13)
🗃️ Keyword Collector (203)
💎 Century Club (58)
🔥 Unstoppable (11)
📈 Trend Setter
🚀 Conference Pioneer
Conferences
CVPR (23)
ECCV (9)
ICCV (7)
ICLR (5)
ACL (3)
EMNLP (3)
NAACL (3)
AAAI (2)
ICML (2)
NIPS (1)
WACV (1)
Keywords
visual question answering (11)
multimodal learning (9)
image captioning (7)
video captioning (6)
video description (6)
video understanding (5)
visual grounding (4)
transfer learning (4)
multi-modal learning (3)
action recognition (3)
zero-shot learning (3)
convolutional neural network (3)
coreference resolution (2)
question generation (2)
semantic representation (2)
natural language generation (2)
video classification (2)
scene understanding (2)
representation learning (2)
object detection (2)
Papers
VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking
ACL 2026
Predicting Implicit Arguments in Procedural Video Instructions
ACL 2025
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
ICML 2025
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
CVPR 2025
InFact: A Strong Baseline for Automated Fact-Checking
EMNLP 2024
Simple Token-Level Confidence Improves Caption Correctness
WACV 2024
Efficient Pre-training for Localized Instruction Generation of Procedural Videos
ECCV 2024
Improving Selective Visual Question Answering by Learning From Your Peers
CVPR 2023
Learning To Recognize Procedural Activities With Distant Supervision
CVPR 2022
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
ECCV 2022
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition
ECCV 2022
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
ECCV 2022
FLAVA: A Foundational Language and Vision Alignment Model
CVPR 2022
SMART Frame Selection for Action Recognition
AAAI 2021
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
CVPR 2021
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting
ICLR 2021
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
In Defense of Grid Features for Visual Question Answering
CVPR 2020
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
CVPR 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
ECCV 2020
Adversarial Continual Learning
ECCV 2020
Learning to Generate Grounded Visual Captions without Localization Supervision
ECCV 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks
ICLR 2020
Decoupling Representation and Classifier for Long-Tailed Recognition
ICLR 2020
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
CVPR 2019
Graph-Based Global Reasoning Networks
CVPR 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
NAACL 2019
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution
ICCV 2019
Selfless Sequential Learning
ICLR 2019
Efficient Lifelong Learning with A-GEM
ICLR 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
ACL 2019
Large-Scale Visual Relationship Understanding
AAAI 2019
Adversarial Inference for Multi-Sentence Video Description
CVPR 2019
Cycle-Consistency for Robust Visual Question Answering
CVPR 2019
Towards VQA Models That Can Read
CVPR 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering
ICML 2019
Grounded Video Description
CVPR 2019
Memory Aware Synapses: Learning what (not) to forget
ECCV 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
ECCV 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
CVPR 2018
A Dataset for Telling the Stories of Social Media Videos
EMNLP 2018
Captioning Images With Diverse Objects
CVPR 2017
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
ICCV 2017
Modeling Relationships in Referential Expressions With Compositional Modular Networks
CVPR 2017
Generating Descriptions With Grounded and Co-Referenced People
CVPR 2017
Learning to Reason: End-To-End Module Networks for Visual Question Answering
ICCV 2017
Neural Module Networks
CVPR 2016
Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data
CVPR 2016
Learning to Compose Neural Networks for Question Answering
NAACL 2016
Natural Language Object Retrieval
CVPR 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
EMNLP 2016
Sequence to Sequence - Video to Text
ICCV 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
NAACL 2015
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
CVPR 2015
Spatial Semantic Regularisation for Large Scale Object Detection
ICCV 2015
A Dataset for Movie Description
CVPR 2015
Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images
ICCV 2015
Translating Video Content to Natural Language Descriptions
ICCV 2013
Transfer Learning in a Transductive Setting
NIPS 2013