Papers

37 papers found
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
2025 ICLR
2025 IJCNLP
Becoming Experienced Judges: Selective Test-Time Learning for Evaluators
Seungyeon Jwa, Daechul Ahn, Reokyoung Kim et al.
2026 EACL
Who Judges the Judge? Evaluating LLM-as-a-Judge for French Medical open-ended QA
Ikram Belmadani, Oumaima El Khettari, Pacôme Constant dit Beaufils et al.
2026 EACL