Evaluation

515 directly classified papers

Papers

Do Language Models Make Human-like Predictions about the Coreferents of Italian Anaphoric Zero Pronouns? COLING 2022

Measuring Robustness for NLP COLING 2022

Establishing Annotation Quality in Multi-label Annotations COLING 2022

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios COLING 2022

IMPARA: Impact-Based Metric for GEC Using Parallel Data COLING 2022

Layer or Representation Space: What Makes BERT-based Evaluation Metrics Robust? COLING 2022

To What Extent Do Natural Language Understanding Datasets Correlate to Logical Reasoning? A Method for Diagnosing Logical Reasoning. COLING 2022

A global analysis of metrics used for measuring performance in natural language processing ACL 2022

Clustering Examples in Multi-Dataset Benchmarks with Item Response Theory ACL 2022

Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective ACL 2022

On the data requirements of probing ACL 2022

Toward More Effective Human Evaluation for Machine Translation ACL 2022

Human evaluation of web-crawled parallel corpora for machine translation ACL 2022

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data ACL 2022

Nibbling at the Hard Core of Word Sense Disambiguation ACL 2022

On reporting scores and agreement for error annotation tasks EMNLP 2022

An Alignment-based Approach to Text Segmentation Similarity Scoring EMNLP 2022

Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency EMNLP 2022

On Measuring the Intrinsic Few-Shot Hardness of Datasets EMNLP 2022

Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora EMNLP 2022

Assessing Inter-metric Correlation for Multi-document Summarization Evaluation EMNLP 2022

Perceptual Quality Dimensions of Machine-Generated Text with a Focus on Machine Translation ACL 2022

Benchmarking: Past, Present and Future ACL 2021

Probing Language Models for Understanding of Temporal Expressions EMNLP 2021

Towards Benchmarking the Utility of Explanations for Model Debugging NAACL 2021