Evaluation

1654 directly classified papers

Papers

Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference NAACL 2022

Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis NAACL 2022

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries NAACL 2022

Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack CVPR 2022

ExSum: From Local Explanations to Model Understanding NAACL 2022

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? NAACL 2022

Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets EMNLP 2022

Finding Dataset Shortcuts with Grammar Induction EMNLP 2022

SLING: Sino Linguistic Evaluation of Large Language Models EMNLP 2022

Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics EMNLP 2022

COPEN: Probing Conceptual Knowledge in Pre-trained Language Models EMNLP 2022

Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models EMNLP 2022

Revisiting Grammatical Error Correction Evaluation and Beyond EMNLP 2022

X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization EMNLP 2022

Revisiting DocRED - Addressing the False Negative Problem in Relation Extraction EMNLP 2022

RobustLR: A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners EMNLP 2022

Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference EMNLP 2022

FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation EMNLP 2022

SEAL: Interactive Tool for Systematic Error Analysis and Labeling EMNLP 2022

How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers EMNLP 2022

Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts EMNLP 2022

PaCo: Preconditions Attributed to Commonsense Knowledge EMNLP 2022

Execution-based Evaluation for Data Science Code Generation Models EMNLP 2022

A Comparative Analysis between Human-in-the-loop Systems and Large Language Models for Pattern Extraction Tasks EMNLP 2022

Revisiting text decomposition methods for NLI-based factuality scoring of summaries EMNLP 2022