Jack Hessel
47 papers · 2018–2025 · 10 top CS/AI conferences
Achievements
🌈 Renaissance Researcher (9)
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (7)
🌍 Conference Polyglot (10)
🗺️ Taxonomy Completionist (71)
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🔬 Deep Specialist (19)
🤝 Dynamic Duo (27)
👥 Mega-Team (43)
🗃️ Keyword Collector (170)
📈 Trend Setter
🔥 Unstoppable (8)
💎 Century Club (47)
❓ The Questioner (7)
⚡ Prolific Year (15)
Conferences
EMNLP (13)
NAACL (7)
NIPS (7)
ACL (6)
ICLR (5)
CVPR (2)
ECCV (2)
ICCV (2)
IJCNLP (2)
CONLL (1)
Keywords
multimodal learning (18)
vision-language model (6)
large language model (6)
knowledge distillation (5)
language model (5)
visual question answering (4)
self-supervised learning (4)
visual reasoning (4)
knowledge graph (3)
zero-shot learning (3)
video understanding (3)
commonsense reasoning (3)
image captioning (3)
visual grounding (3)
visual commonsense (3)
instruction tuning (2)
benchmark evaluation (2)
data privacy (2)
instruction following (2)
language understanding (2)
Papers
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
ICLR 2025
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
NAACL 2025
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
ICLR 2025
FunQA: Towards Surprising Video Comprehension
ECCV 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
ICLR 2024
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models
EMNLP 2024
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild
EMNLP 2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
EMNLP 2024
Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning
ACL 2024
The Art of Saying No: Contextual Noncompliance in Language Models
NIPS 2024
OLMo: Accelerating the Science of Language Models
ACL 2024
Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
ACL 2024
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
NAACL 2024
Tailoring Self-Rationalizers with Multi-Reward Distillation
ICLR 2024
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
EMNLP 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
NIPS 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
NIPS 2023
VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models
NIPS 2023
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
NIPS 2023
Do Androids Laugh at Electric Sheep? Humor “Understanding” Benchmarks from The New Yorker Caption Contest
ACL 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step
ACL 2023
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
CVPR 2023
Text encoders bottleneck compositionality in contrastive vision-language models
EMNLP 2023
What’s “up” with vision-language models? Investigating their struggle with spatial reasoning
EMNLP 2023
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
EMNLP 2023
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
EMNLP 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
ICCV 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
ICCV 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
ICLR 2023
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
ECCV 2022
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
CVPR 2022
Reframing Human-AI Collaboration for Generating Free-Text Explanations
NAACL 2022
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
NAACL 2022
QUARK: Controllable Text Generation with Reinforced Unlearning
NIPS 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
NAACL 2022
MERLOT: Multimodal Neural Script Knowledge Models
NIPS 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
EMNLP 2021
How effective is BERT without word ordering? Implications for language understanding and data privacy
ACL 2021
How effective is BERT without word ordering? Implications for language understanding and data privacy
IJCNLP 2021
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents
EMNLP 2020
Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think!
EMNLP 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
EMNLP 2020
Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features
NAACL 2019
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
CONLL 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
EMNLP 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
IJCNLP 2019
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets
NAACL 2018