Roei Herzig

25 papers · 2018–2025 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌍 Conference Polyglot (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12) 🤝 Dynamic Duo (18) 🔬 Deep Specialist (10) 🧬 Topic Evolution 🚀 Conference Pioneer 🗃️ Keyword Collector (97) 📈 Trend Setter ❓ The Questioner ⚡ Prolific Year (5) 🔥 Unstoppable (8) 💎 Century Club (25)

Conferences

CVPR (8) NIPS (6) EMNLP (3) ECCV (2) ICML (2) WACV (2) CORL (1) ICCV (1)

Top co-authors

Trevor Darrell (18) Leonid Karlinsky (10) Assaf Arbelle (9) Amir Globerson (9) Gal Chechik (6) Rogerio Feris (6) Amir Bar (5) Anna Rohrbach (4) Sivan Harary (4) Dantong Niu (4)

Keywords

action recognition (5) multimodal learning (4) zero-shot learning (4) few-shot learning (4) vision-language model (4) large multimodal model (4) scene graph (4) video understanding (3) compositional reasoning (3) object detection (3) vision language model (3) self-supervised learning (3) visual reasoning (2) representation learning (2) video transformer (2) transfer learning (1) domain generalization (1) image segmentation (1) semantic segmentation (1) image classification (1)

Papers

Do What? Teaching Vision-Language-Action Models to Reject the Impossible · EMNLP 2025
Pre-training Auto-regressive Robotic Models with 4D Representations · ICML 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features · ICCV 2025
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning · NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs · NIPS 2024
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning · CORL 2024
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering · EMNLP 2024
Compositional Chain-of-Thought Prompting for Large Multimodal Models · CVPR 2024
Recursive Visual Programming · ECCV 2024
Unsupervised Universal Image Segmentation · CVPR 2024
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data · WACV 2024
Teaching Structured Vision & Language Concepts to Vision & Language Models · CVPR 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models · NIPS 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs · EMNLP 2023
FETA: Towards Specializing Foundational Models for Expert Task Applications · NIPS 2022
Unsupervised Domain Generalization by Learning a Bridge Across Domains · CVPR 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens · NIPS 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection · CVPR 2022
Object-Region Video Transformers · CVPR 2022
Compositional Video Synthesis with Action Graphs · ICML 2021
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks · CVPR 2020
Learning Canonical Representations for Scene Graph to Image Generation · ECCV 2020
Differentiable Scene Graphs · WACV 2020
Precise Detection in Densely Packed Scenes · CVPR 2019
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction · NIPS 2018