Jordan Lee Boyd-Graber

26 papers · 2023–2026 · 5 top CS/AI conferences

Achievements


🐝 Cross-Pollinator (4) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (4) 🌈 Renaissance Researcher (8)

⚡ Prolific Year (13) 🗃️ Keyword Collector (137) ❓ The Questioner (6) 💎 Century Club (22)

Conferences

EMNLP (11) ACL (9) NAACL (4) EACL (1) ICLR (1)

Top co-authors

Nishant Balepur (10) Rachel Rudinger (6) Ishani Mondal (6) Shi Feng (5) Yoo Yeon Sung (4) Feng Gu (4) Denis Peskoff (3) Wichayaporn Wongkamjan (3) Matthew Shu (3) Zongxia Li (3)

Research topics

Education (1)

Keywords

question answering (6) large language model (6) benchmark evaluation (3) llm evaluation (2) human-ai interaction (2) multimodal learning (2) strategic reasoning (2) item response theory (2) game theory (2) dialogue system (2) human-ai collaboration (2) multi-agent system (2) question generation (2) data augmentation (1) natural language inference (1) text classification (1) preference alignment (1) machine translation (1) natural language processing (1) direct preference optimization (1)

Papers

Measuring User’s Mental Models of Speech Translation in Human-AI Collaboration · ACL 2026
Language Models Don’t Know What You Want: Evaluating Personalization in Deep Research Needs Real Users · ACL 2026
BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks · ACL 2026
SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity · EACL 2026
GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration · ACL 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL · ACL 2025
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users · EMNLP 2025
No Questions are Stupid, but some are Poorly Posed: Understanding Poorly-Posed Information-Seeking Questions · ACL 2025
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas · ACL 2025
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above · ACL 2025
Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering · EMNLP 2025
Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed · EMNLP 2025
LLM-as-a-Judge Failures at Automating the Identification of Poor Quality Outputs in Free-Form Texts · EMNLP 2025
MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections · NAACL 2025
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness · NAACL 2025
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? · NAACL 2025
Personalized Help for Optimizing Low-Skilled Users’ Strategy · NAACL 2025
KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students · EMNLP 2024
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA · EMNLP 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models · EMNLP 2024
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence · EMNLP 2024
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement · EMNLP 2024
More Victories, Less Cooperation: Assessing Cicero’s Diplomacy Play · ACL 2024
You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions · EMNLP 2024
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick · EMNLP 2024
Prompting GPT-3 To Be Reliable · ICLR 2023