Research Explorer
Evaluation
515 directly classified papers
Papers per year
2003: 1
2004: 1
2005: 1
2006: 1
2008: 2
2009: 1
2010: 1
2013: 5
2016: 3
2017: 8
2018: 11
2019: 24
2020: 25
2021: 34
2022: 68
2023: 74
2024: 105
2025: 147
2026: 3
Papers
Do Language Models Make Human-like Predictions about the Coreferents of Italian Anaphoric Zero Pronouns?
COLING 2022
Measuring Robustness for NLP
COLING 2022
Establishing Annotation Quality in Multi-label Annotations
COLING 2022
Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
COLING 2022
IMPARA: Impact-Based Metric for GEC Using Parallel Data
COLING 2022
Layer or Representation Space: What Makes BERT-based Evaluation Metrics Robust?
COLING 2022
To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning.
COLING 2022
A global analysis of metrics used for measuring performance in natural language processing
ACL 2022
Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory
ACL 2022
Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective
ACL 2022
On the data requirements of probing
ACL 2022
Toward More Effective Human Evaluation for Machine Translation
ACL 2022
Human evaluation of web-crawled parallel corpora for machine translation
ACL 2022
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
ACL 2022
Nibbling at the Hard Core of Word Sense Disambiguation
ACL 2022
On reporting scores and agreement for error annotation tasks
EMNLP 2022
An Alignment-based Approach to Text Segmentation Similarity Scoring
EMNLP 2022
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
EMNLP 2022
On Measuring the Intrinsic Few-Shot Hardness of Datasets
EMNLP 2022
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
EMNLP 2022
Assessing Inter-metric Correlation for Multi-document Summarization Evaluation
EMNLP 2022
Perceptual Quality Dimensions of Machine-Generated Text with a Focus on Machine Translation
ACL 2022
Benchmarking: Past, Present and Future
ACL 2021
Probing Language Models for Understanding of Temporal Expressions
EMNLP 2021
Towards Benchmarking the Utility of Explanations for Model Debugging
NAACL 2021