Jordan Lee Boyd-Graber
26 papers · 2023–2026 · 5 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (4)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (4)
🌈 Renaissance Researcher (8)
⚡ Prolific Year (13)
🗃️ Keyword Collector (137)
❓ The Questioner (6)
💎 Century Club (22)
Conferences
EMNLP (11)
ACL (9)
NAACL (4)
EACL (1)
ICLR (1)
Keywords
question answering (6)
large language model (6)
benchmark evaluation (3)
llm evaluation (2)
human-ai interaction (2)
multimodal learning (2)
strategic reasoning (2)
item response theory (2)
game theory (2)
dialogue system (2)
human-ai collaboration (2)
multi-agent system (2)
question generation (2)
data augmentation (1)
natural language inference (1)
text classification (1)
preference alignment (1)
machine translation (1)
natural language processing (1)
direct preference optimization (1)
Papers
Measuring User’s Mental Models of Speech Translation in Human-AI Collaboration
ACL 2026
Language Models Don’t Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
ACL 2026
BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks
ACL 2026
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
EACL 2026
GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
ACL 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
EMNLP 2025
No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions
ACL 2025
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
ACL 2025
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
ACL 2025
Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering
EMNLP 2025
Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed
EMNLP 2025
LLM-as-a-Judge Failures at Automating the Identification of Poor Quality Outputs in Free-Form Texts
EMNLP 2025
MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
NAACL 2025
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness
NAACL 2025
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
NAACL 2025
Personalized Help for Optimizing Low-Skilled Users’ Strategy
NAACL 2025
KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students
EMNLP 2024
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
EMNLP 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
EMNLP 2024
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
EMNLP 2024
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement
EMNLP 2024
More Victories, Less Cooperation: Assessing Cicero’s Diplomacy Play
ACL 2024
You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions
EMNLP 2024
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
EMNLP 2024
Prompting GPT-3 To Be Reliable
ICLR 2023