Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference
NAACL 2022
Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis
NAACL 2022
Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries
NAACL 2022
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
CVPR 2022
ExSum: From Local Explanations to Model Understanding
NAACL 2022
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
NAACL 2022
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
EMNLP 2022
Finding Dataset Shortcuts with Grammar Induction
EMNLP 2022
SLING: Sino Linguistic Evaluation of Large Language Models
EMNLP 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
EMNLP 2022
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models
EMNLP 2022
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models
EMNLP 2022
Revisiting Grammatical Error Correction Evaluation and Beyond
EMNLP 2022
X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization
EMNLP 2022
Revisiting DocRED - Addressing the False Negative Problem in Relation Extraction
EMNLP 2022
RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners
EMNLP 2022
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference
EMNLP 2022
FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation
EMNLP 2022
SEAL: Interactive Tool for Systematic Error Analysis and Labeling
EMNLP 2022
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
EMNLP 2022
Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts
EMNLP 2022
PaCo: Preconditions Attributed to Commonsense Knowledge
EMNLP 2022
Execution-based Evaluation for Data Science Code Generation Models
EMNLP 2022
A Comparative Analysis between Human-in-the-loop Systems and Large Language Models for Pattern Extraction Tasks
EMNLP 2022
Revisiting text decomposition methods for NLI-based factuality scoring of summaries
EMNLP 2022