Papers
DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation
COLING 2025
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
COLING 2025
BasqBBQ: A QA Benchmark for Assessing Social Biases in LLMs for Basque, a Low-Resource Language
COLING 2025
Evaluating the Quality of Benchmark Datasets for Low-Resource Languages: A Case Study on Turkish
ACL 2025