Evaluation

1654 directly classified papers

Papers

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues EMNLP 2021

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases ACL 2021

Language Model Evaluation Beyond Perplexity ACL 2021

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics ACL 2021

Targeting the Benchmark: On Methodology in Current Natural Language Processing Research ACL 2021

We Need to Consider Disagreement in Evaluation ACL 2021

How Might We Create Better Benchmarks for Speech Recognition? ACL 2021

Shades of BLEU, Flavours of Success: The Case of MultiWOZ ACL 2021

Evaluation Guidelines to Deal with Implicit Phenomena to Assess Factuality in Data-to-Text Generation ACL 2021

Human-Model Divergence in the Handling of Vagueness ACL 2021

Evaluation Scheme of Focal Translation for Japanese Partially Amended Statutes ACL 2021

Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation ACL 2021

A Comprehensive Assessment of Dialog Evaluation Metrics EMNLP 2021

GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models EMNLP 2021

Counterfactual Matters: Intrinsic Probing For Dialogue State Tracking EMNLP 2021

Anatomy of OntoGUM—Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms EMNLP 2021

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? IJCNLP 2021

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation EMNLP 2021

Detecting Post-Edited References and Their Effect on Human Evaluation EACL 2021

We Need To Talk About Random Splits EACL 2021

Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR EACL 2021

Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation EACL 2021

Does She Wink or Does She Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models EACL 2021

A Systematic Review of Reproducibility Research in Natural Language Processing EACL 2021

Evaluating the Evaluation of Diversity in Natural Language Generation EACL 2021