Roei Herzig
25 papers · 2018–2025 · 8 top CS/AI conferences
Achievements
🏃 Academic Marathon (7)
🌍 Conference Polyglot (8)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (12)
🤝 Dynamic Duo (18)
🔬 Deep Specialist (10)
🧬 Topic Evolution
🚀 Conference Pioneer
🗃️ Keyword Collector (97)
📈 Trend Setter
❓ The Questioner
⚡ Prolific Year (5)
🔥 Unstoppable (8)
💎 Century Club (25)
Conferences
CVPR (8)
NIPS (6)
EMNLP (3)
ECCV (2)
ICML (2)
WACV (2)
CORL (1)
ICCV (1)
Keywords
action recognition (5)
multimodal learning (4)
zero-shot learning (4)
few-shot learning (4)
vision-language model (4)
large multimodal model (4)
scene graph (4)
video understanding (3)
compositional reasoning (3)
object detection (3)
vision language model (3)
self-supervised learning (3)
visual reasoning (2)
representation learning (2)
video transformer (2)
transfer learning (1)
domain generalization (1)
image segmentation (1)
semantic segmentation (1)
image classification (1)
Papers
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
EMNLP 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
ICML 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
NIPS 2024
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
CORL 2024
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
EMNLP 2024
Compositional Chain-of-Thought Prompting for Large Multimodal Models
CVPR 2024
Recursive Visual Programming
ECCV 2024
Unsupervised Universal Image Segmentation
CVPR 2024
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data
WACV 2024
Teaching Structured Vision & Language Concepts to Vision & Language Models
CVPR 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
NIPS 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
EMNLP 2023
FETA: Towards Specializing Foundational Models for Expert Task Applications
NIPS 2022
Unsupervised Domain Generalization by Learning a Bridge Across Domains
CVPR 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
NIPS 2022
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
CVPR 2022
Object-Region Video Transformers
CVPR 2022
Compositional Video Synthesis with Action Graphs
ICML 2021
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
CVPR 2020
Learning Canonical Representations for Scene Graph to Image Generation
ECCV 2020
Differentiable Scene Graphs
WACV 2020
Precise Detection in Densely Packed Scenes
CVPR 2019
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
NIPS 2018