conftrace_

Papers

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks (ACL 2026)
Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers (ACL 2026)
Measuring User’s Mental Models of Speech Translation in Human-AI Collaboration (ACL 2026)
Language Models Don’t Know What You Want: Evaluating Personalization in Deep Research Needs Real Users (ACL 2026)
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? (NAACL 2025)
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas (ACL 2025)
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above (ACL 2025)
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users (EMNLP 2025)
MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections (NAACL 2025)
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick (EMNLP 2024)
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? (ACL 2024)
It’s Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning (ACL 2024)
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? (ACL 2024)
KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students (EMNLP 2024)
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning (EMNLP 2024)
DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance (ACL 2023)
Expository Text Generation: Imitate, Retrieve, Paraphrase (EMNLP 2023)
Text Fact Transfer (EMNLP 2023)