Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
EMNLP 2021
Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases
ACL 2021
Language Model Evaluation Beyond Perplexity
ACL 2021
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
ACL 2021
Targeting the Benchmark: On Methodology in Current Natural Language Processing Research
ACL 2021
We Need to Consider Disagreement in Evaluation
ACL 2021
How Might We Create Better Benchmarks for Speech Recognition?
ACL 2021
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
ACL 2021
Evaluation Guidelines to Deal with Implicit Phenomena to Assess Factuality in Data-to-Text Generation
ACL 2021
Human-Model Divergence in the Handling of Vagueness
ACL 2021
Evaluation Scheme of Focal Translation for Japanese Partially Amended Statutes
ACL 2021
Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation
ACL 2021
A Comprehensive Assessment of Dialog Evaluation Metrics
EMNLP 2021
GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models
EMNLP 2021
Counterfactual Matters: Intrinsic Probing For Dialogue State Tracking
EMNLP 2021
Anatomy of OntoGUM—Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms
EMNLP 2021
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
IJCNLP 2021
Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation
EMNLP 2021
Detecting Post-Edited References and Their Effect on Human Evaluation
EACL 2021
We Need To Talk About Random Splits
EACL 2021
Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR
EACL 2021
Memorization vs. Generalization : Quantifying Data Leakage in NLP Performance Evaluation
EACL 2021
Does She Wink or Does She Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models
EACL 2021
A Systematic Review of Reproducibility Research in Natural Language Processing
EACL 2021
Evaluating the Evaluation of Diversity in Natural Language Generation
EACL 2021