Yonatan Bitton

26 papers · 2021–2025 · across 10 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (10) 🌈 Renaissance Researcher (5) 👥 Mega-Team (60) 🔬 Deep Specialist (10) 🔥 Unstoppable (5) ⚡ Prolific Year (9) 💎 Century Club (26) 🗃️ Keyword Collector (103) ❓ The Questioner

Conferences

NIPS (6) EMNLP (5) ACL (3) ICLR (3) NAACL (3) ECCV (2) AAAI (1) CVPR (1) ICCV (1) WACV (1)

Top co-authors

Idan Szpektor (9) Gabriel Stanovsky (5) Ron Yosef (5) Roy Schwartz (5) Ludwig Schmidt (4) Hritik Bansal (4) Michal Yarom (4) Nitzan Bitton Guetta (3) Roopal Garg (3) Dani Lischinski (3)

Keywords

multimodal learning (7) vision-language model (5) visual question answering (5) large language model (3) text-to-image generation (3) zero-shot learning (3) benchmark evaluation (3) contrastive learning (2) data augmentation (2) automatic evaluation (2) vision language model (2) clip model (2) instruction following (2) image captioning (2) benchmark dataset (2) data filtering (2) commonsense knowledge (1) question answering (1) scene understanding (1) knowledge distillation (1)

Papers

Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions · NAACL 2025
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits · ACL 2025
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision · ICLR 2025
NL-Eye: Abductive NLI For Images · ICLR 2025
VideoPhy: Evaluating Physical Commonsense for Video Generation · ICLR 2025
Contrastive Sequential-Diffusion Learning: Non-Linear and Multi-Scene Instructional Video Synthesis · WACV 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation · EMNLP 2025
DOCCI: Descriptions of Connected and Contrasting Images · ECCV 2024
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment · ECCV 2024
DataComp-LM: In search of the next generation of training sets for language models · NIPS 2024
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models · NIPS 2024
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains · ACL 2024
Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks · ACL 2024
VideoCon: Robust Video-Language Alignment via Contrast Captions · CVPR 2024
ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies · NAACL 2024
ImageInWords: Unlocking Hyper-Detailed Image Descriptions · EMNLP 2024
What You See is What You Read? Improving Text-Image Alignment Evaluation · NIPS 2023
q2d: Turning Questions into Dialogs to Teach Models How to Search · EMNLP 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · ICCV 2023
VASR: Visual Analogies of Situation Recognition · AAAI 2023
DataComp: In search of the next generation of multimodal datasets · NIPS 2023
VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models · NIPS 2023
IRFL: Image Recognition of Figurative Language · EMNLP 2023
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models · NIPS 2022
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA · NAACL 2021
Data Efficient Masked Language Modeling for Vision and Language · EMNLP 2021