Evaluation

1654 directly classified papers

Papers

Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour AACL 2022

Quantified Reproducibility Assessment of NLP Results ACL 2022

CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation ACL 2022

Life after BERT: What do Other Muppets Understand about Language? ACL 2022

Automatic Error Analysis for Document-level Information Extraction ACL 2022

On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency ACL 2022

QAConv: Question Answering on Informative Conversations ACL 2022

Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation ACL 2022

Just Rank: Rethinking Evaluation with Word and Sentence Similarities ACL 2022

Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED ACL 2022

Predicting Difficulty and Discrimination of Natural Language Questions ACL 2022

ACL Tutorial Proposal: Towards Reproducible Machine Learning Research in Natural Language Processing ACL 2022

Analyzing Dynamic Adversarial Training Data in the Limit ACL 2022

BBQ: A hand-built bias benchmark for question answering ACL 2022

Factual Consistency of Multilingual Pretrained Language Models ACL 2022

Probing Factually Grounded Content Transfer with Factual Ablation ACL 2022

E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning ACL 2022

On Length Divergence Bias in Textual Matching Models ACL 2022

Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations ACL 2022

Psycholinguistic Diagnosis of Language Models’ Commonsense Reasoning ACL 2022

On the Impact of Noises in Crowd-Sourced Data for Speech Translation ACL 2022

Efficient yet Competitive Speech Translation: FBK@IWSLT2022 ACL 2022

LSCDiscovery: A shared task on semantic change discovery and detection in Spanish ACL 2022

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions ACL 2022

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents ACL 2022