Papers
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
EMNLP 2023
CompleQA: Benchmarking the Impacts of Knowledge Graph Completion Methods on Question Answering
EMNLP 2023
MTEB: Massive Text Embedding Benchmark
EACL 2023
The OPUS-MT Dashboard – A Toolkit for a Systematic Evaluation of Open Machine Translation Models
ACL 2023
Rogue Scores
ACL 2023