Papers

5,479 papers found
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder, Alan Saji, Thanmay Jayakumar et al.
2026 EACL
ExpressivityBench: Can LLMs Communicate Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia et al.
2026 EACL
Program-of-Thought Reveals LLM Abstraction Ceilings
Mike Zhou, Fenil Bardoliya, Vivek Gupta et al.
2026 EACL
DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance
Seffi Cohen, Nurit Cohen Inger, Niv Goldshlager et al.
2026 EACL
Ranking Human and LLM Texts Using Locality Statistics
Yiyang Wang, Chen Ding, Hangfeng He
2026 EACL
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn, Jakub Binkowski, Denis Janiak et al.
2026 EACL
Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities
Manan Roy Choudhury, Adithya Chandramouli, Mannan Anand et al.
2026 EACL
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz et al.
2026 EACL
Argument-Based Consistency in Toxicity Explanations of LLMs
Ramaravind Kommiya Mothilal, Joanna Roy, Syed Ishtiaque Ahmed et al.
2026 EACL
Quantifying Data Contamination in Psychometric Evaluations of LLMs
Jongwook Han, Woojung Song, Jonggeun Lee et al.
2026 EACL