Evaluation

1654 directly classified papers

Papers

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization NAACL 2022

Even the Simplest Baseline Needs Careful Re-investigation: A Case Study on XML-CNN NAACL 2022

BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation NAACL 2022

Quantifying Synthesis and Fusion and their Impact on Machine Translation NAACL 2022

On the Machine Learning of Ethical Judgments from Natural Language NAACL 2022

Measuring Robustness for NLP COLING 2022

Can Transformers Process Recursive Nested Constructions, Like Humans? COLING 2022

Measure and Improve Robustness in NLP Models: A Survey NAACL 2022

DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction NAACL 2022

How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns NAACL 2022

Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding NAACL 2022

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction NAACL 2022

On the Diversity and Limits of Human Explanations NAACL 2022

Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations NAACL 2022

On the Robustness of Reading Comprehension Models to Entity Renaming NAACL 2022

SHAP-Based Explanation Methods: A Review for NLP Interpretability COLING 2022

Does BERT Recognize an Agent? Modeling Dowty’s Proto-Roles with Contextual Embeddings COLING 2022

Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment COLING 2022

Contrast Sets for Stativity of English Verbs in Context COLING 2022

QSTS: A Question-Sensitive Text Similarity Measure for Question Generation COLING 2022

BECEL: Benchmark for Consistency Evaluation of Language Models COLING 2022

Establishing Annotation Quality in Multi-label Annotations COLING 2022

Towards Explainable Evaluation of Language Models on the Semantic Similarity of Visual Concepts COLING 2022

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios COLING 2022

TestAug: A Framework for Augmenting Capability-based NLP Tests COLING 2022