Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
NAACL 2022
Even the Simplest Baseline Needs Careful Re-investigation: A Case Study on XML-CNN
NAACL 2022
BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation
NAACL 2022
Quantifying Synthesis and Fusion and their Impact on Machine Translation
NAACL 2022
On the Machine Learning of Ethical Judgments from Natural Language
NAACL 2022
Measuring Robustness for NLP
COLING 2022
Can Transformers Process Recursive Nested Constructions, Like Humans?
COLING 2022
Measure and Improve Robustness in NLP Models: A Survey
NAACL 2022
DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction
NAACL 2022
How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns
NAACL 2022
Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding
NAACL 2022
MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction
NAACL 2022
On the Diversity and Limits of Human Explanations
NAACL 2022
Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations
NAACL 2022
On the Robustness of Reading Comprehension Models to Entity Renaming
NAACL 2022
SHAP-Based Explanation Methods: A Review for NLP Interpretability
COLING 2022
Does BERT Recognize an Agent? Modeling Dowty’s Proto-Roles with Contextual Embeddings
COLING 2022
Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment
COLING 2022
Contrast Sets for Stativity of English Verbs in Context
COLING 2022
QSTS: A Question-Sensitive Text Similarity Measure for Question Generation
COLING 2022
BECEL: Benchmark for Consistency Evaluation of Language Models
COLING 2022
Establishing Annotation Quality in Multi-label Annotations
COLING 2022
Towards Explainable Evaluation of Language Models on the Semantic Similarity of Visual Concepts
COLING 2022
Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios
COLING 2022
TestAug: A Framework for Augmenting Capability-based NLP Tests
COLING 2022