Evaluation

1654 directly classified papers

Papers

Human perceiving behavior modeling in evaluation of code generation models EMNLP 2022

On reporting scores and agreement for error annotation tasks EMNLP 2022

Answerability: A custom metric for evaluating chatbot performance EMNLP 2022

Error Analysis of ToTTo Table-to-Text Neural NLG Models EMNLP 2022

20Q: Overlap-Free World Knowledge Benchmark for Language Models EMNLP 2022

What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation EMNLP 2022

Numerical Correlation in Text EMNLP 2022

How Language-Dependent is Emotion Detection? Evidence from Multilingual BERT EMNLP 2022

Searching for a Higher Power in the Human Evaluation of MT EMNLP 2022

Test Set Sampling Affects System Rankings: Expanded Human Evaluation of WMT20 English-Inuktitut Systems EMNLP 2022

Linguistically Motivated Evaluation of the 2022 State-of-the-art Machine Translation Systems for Three Language Directions EMNLP 2022

Linguistically Motivated Evaluation of Machine Translation Metrics Based on a Challenge Set EMNLP 2022

Exploring Robustness of Machine Translation Metrics: A Study of Twenty-Two Automatic Metrics in the WMT22 Metric Task EMNLP 2022

This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish NIPS 2022

What are the best Systems? New Perspectives on NLP Benchmarking NIPS 2022

Average Sensitivity of Euclidean k-Clustering NIPS 2022

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models NIPS 2022

Estimating the Arc Length of the Optimal ROC Curve and Lower Bounding the Maximal AUC NIPS 2022

Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm NIPS 2022

Generic Overgeneralization in Pre-trained Language Models COLING 2022

Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering COLING 2022

The Role of Context and Uncertainty in Shallow Discourse Parsing COLING 2022

Open-Domain Dialog Evaluation Using Follow-Ups Likelihood COLING 2022

Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT COLING 2022

Using Natural Sentence Prompts for Understanding Biases in Language Models NAACL 2022