Evaluation

515 directly classified papers

Papers

Understanding Deep Generative Models With Generalized Empirical Likelihoods CVPR 2023

NICO++: Towards Better Benchmarking for Domain Generalization CVPR 2023

LMentry: A Language Model Benchmark of Elementary Language Tasks ACL 2023

EVALIGN: Visual Evaluation of Translation Alignment Models EACL 2023

SubER - A Metric for Automatic Evaluation of Subtitle Quality ACL 2022

SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale EMNLP 2022

Stop Measuring Calibration When Humans Disagree EMNLP 2022

Exploring the Secrets Behind the Learning Difficulty of Meaning Representations for Semantic Parsing EMNLP 2022

Iterative Stratified Testing and Measurement for Automated Model Updates EMNLP 2022

On the Effectiveness of Automated Metrics for Text Generation Systems EMNLP 2022

Structurally Diverse Sampling for Sample-Efficient Training and Comprehensive Evaluation EMNLP 2022

Data Cartography for Low-Resource Neural Machine Translation EMNLP 2022

Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons ACL 2022

Data Contamination: From Memorization to Exploitation ACL 2022

Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa ACL 2022

Replicability under Near-Perfect Conditions – A Case-Study from Automatic Summarization ACL 2022

Vacillating Human Correlation of SacreBLEU in Unprotected Languages ACL 2022

Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations ACL 2022

Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics ACL 2022

Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE? ACL 2022

ACL Tutorial Proposal: Towards Reproducible Machine Learning Research in Natural Language Processing ACL 2022

A Simulation-Based Evaluation Framework for Interactive AI Systems and Its Application AAAI 2022

InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation AAAI 2022

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications INTERSPEECH 2022

Are reported accuracies in the clinical speech machine learning literature overoptimistic? INTERSPEECH 2022