Jack Hessel

47 papers · 2018–2025 · 10 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (9) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (10) 🗺️ Taxonomy Completionist (71)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🔬 Deep Specialist (19) 🤝 Dynamic Duo (27) 👥 Mega-Team (43) 🗃️ Keyword Collector (170) 📈 Trend Setter 🔥 Unstoppable (8) 💎 Century Club (47) ❓ The Questioner (7) ⚡ Prolific Year (15)

Conferences

EMNLP (13) NAACL (7) NIPS (7) ACL (6) ICLR (5) CVPR (2) ECCV (2) ICCV (2) IJCNLP (2) CONLL (1)

Top co-authors

Yejin Choi (27) Youngjae Yu (12) Ximing Lu (12) Khyathi Chandu (9) Rowan Zellers (6) Lillian Lee (6) Liwei Jiang (5) Peter West (5) Jae Sung Park (5) Jena D. Hwang (4)

Keywords

multimodal learning (18) vision-language model (6) large language model (6) knowledge distillation (5) language model (5) visual question answering (4) self-supervised learning (4) visual reasoning (4) knowledge graph (3) zero-shot learning (3) video understanding (3) commonsense reasoning (3) image captioning (3) visual grounding (3) visual commonsense (3) instruction tuning (2) benchmark evaluation (2) data privacy (2) instruction following (2) language understanding (2)

Papers

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models ICLR 2025
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects NAACL 2025
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness ICLR 2025
FunQA: Towards Surprising Video Comprehension ECCV 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild ICLR 2024
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models EMNLP 2024
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild EMNLP 2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding EMNLP 2024
Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning ACL 2024
The Art of Saying No: Contextual Noncompliance in Language Models NIPS 2024
OLMo: Accelerating the Science of Language Models ACL 2024
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models ACL 2024
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations NAACL 2024
Tailoring Self-Rationalizers with Multi-Reward Distillation ICLR 2024
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms EMNLP 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text NIPS 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models NIPS 2023
VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models NIPS 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources NIPS 2023
Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest ACL 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step ACL 2023
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning CVPR 2023
Text encoders bottleneck compositionality in contrastive vision-language models EMNLP 2023
What’s “up” with vision-language models? Investigating their struggle with spatial reasoning EMNLP 2023
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization EMNLP 2023
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation EMNLP 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos ICCV 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images ICCV 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization ICLR 2023
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning ECCV 2022
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound CVPR 2022
Reframing Human-AI Collaboration for Generating Free-Text Explanations NAACL 2022
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models NAACL 2022
QUARK: Controllable Text Generation with Reinforced Unlearning NIPS 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer NAACL 2022
MERLOT: Multimodal Neural Script Knowledge Models NIPS 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning EMNLP 2021
How effective is BERT without word ordering? Implications for language understanding and data privacy ACL 2021
How effective is BERT without word ordering? Implications for language understanding and data privacy IJCNLP 2021
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents EMNLP 2020
Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think! EMNLP 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube EMNLP 2020
Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features NAACL 2019
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions CONLL 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents EMNLP 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents IJCNLP 2019
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets NAACL 2018