Yonatan Bitton
26 papers · 2021–2025 · across 10 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (13)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (10)
🌈 Renaissance Researcher (5)
👥 Mega-Team (60)
🔬 Deep Specialist (10)
🔥 Unstoppable (5)
⚡ Prolific Year (9)
💎 Century Club (26)
🗃️ Keyword Collector (103)
❓ The Questioner
Conferences
NIPS (6)
EMNLP (5)
ACL (3)
ICLR (3)
NAACL (3)
ECCV (2)
AAAI (1)
CVPR (1)
ICCV (1)
WACV (1)
Keywords
multimodal learning (7)
vision-language model (5)
visual question answering (5)
large language model (3)
text-to-image generation (3)
zero-shot learning (3)
benchmark evaluation (3)
contrastive learning (2)
data augmentation (2)
automatic evaluation (2)
vision language model (2)
clip model (2)
instruction following (2)
image captioning (2)
benchmark dataset (2)
data filtering (2)
commonsense knowledge (1)
question answering (1)
scene understanding (1)
knowledge distillation (1)
Papers
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
NAACL 2025
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
ACL 2025
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
ICLR 2025
NL-Eye: Abductive NLI For Images
ICLR 2025
VideoPhy: Evaluating Physical Commonsense for Video Generation
ICLR 2025
Contrastive Sequential-Diffusion Learning: Non-Linear and Multi-Scene Instructional Video Synthesis
WACV 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
EMNLP 2025
DOCCI: Descriptions of Connected and Contrasting Images
ECCV 2024
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
ECCV 2024
DataComp-LM: In search of the next generation of training sets for language models
NIPS 2024
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
NIPS 2024
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
ACL 2024
Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks
ACL 2024
VideoCon: Robust Video-Language Alignment via Contrast Captions
CVPR 2024
ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies
NAACL 2024
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
EMNLP 2024
What You See is What You Read? Improving Text-Image Alignment Evaluation
NIPS 2023
q2d: Turning Questions into Dialogs to Teach Models How to Search
EMNLP 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
ICCV 2023
VASR: Visual Analogies of Situation Recognition
AAAI 2023
DataComp: In search of the next generation of multimodal datasets
NIPS 2023
VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models
NIPS 2023
IRFL: Image Recognition of Figurative Language
EMNLP 2023
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
NIPS 2022
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
NAACL 2021
Data Efficient Masked Language Modeling for Vision and Language
EMNLP 2021