Papers

5,479 papers found
2026 EACL
Who Judges the Judge? Evaluating LLM-as-a-Judge for French Medical open-ended QA
Ikram Belmadani, Oumaima El Khettari, Pacôme Constant dit Beaufils et al.
2026 EACL
LLM-as-a-qualitative-judge: automating error analysis in natural language generation
Nadezhda Chirkova, Tunde Oluwaseyi Ajayi, Seth Aycock et al.
2026 EACL
FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification
Gwok-Waa Wan, SamZaak Wong, Shengchu Su et al.
2026 AAAI
SoMe: A Realistic Benchmark for LLM-based Social Media Agents
Dizhan Xue, Jing Cui, Shengsheng Qian et al.
2026 AAAI
A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents
Clayton Cohn, Surya Rayala, Namrata Srivastava et al.
2026 AAAI
Mind the Gap: The Divergence Between Human and LLM-Generated Tasks
Yi-Long Lu, Jiajun Song, Chunhui Zhang et al.
2026 AAAI