Research Explorer
Evaluation
515 directly classified papers
Papers per year
2003: 1
2004: 1
2005: 1
2006: 1
2008: 2
2009: 1
2010: 1
2013: 5
2016: 3
2017: 8
2018: 11
2019: 24
2020: 25
2021: 34
2022: 68
2023: 74
2024: 105
2025: 147
2026: 3
Papers
Understanding Deep Generative Models With Generalized Empirical Likelihoods
CVPR 2023
NICO++: Towards Better Benchmarking for Domain Generalization
CVPR 2023
LMentry: A Language Model Benchmark of Elementary Language Tasks
ACL 2023
EVALIGN: Visual Evaluation of Translation Alignment Models
EACL 2023
SubER - A Metric for Automatic Evaluation of Subtitle Quality
ACL 2022
SEM-F1: an Automatic Way for Semantic Evaluation of Multi-Narrative Overlap Summaries at Scale
EMNLP 2022
Stop Measuring Calibration When Humans Disagree
EMNLP 2022
Exploring the Secrets Behind the Learning Difficulty of Meaning Representations for Semantic Parsing
EMNLP 2022
Iterative Stratified Testing and Measurement for Automated Model Updates
EMNLP 2022
On the Effectiveness of Automated Metrics for Text Generation Systems
EMNLP 2022
Structurally Diverse Sampling for Sample-Efficient Training and Comprehensive Evaluation
EMNLP 2022
Data Cartography for Low-Resource Neural Machine Translation
EMNLP 2022
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons
ACL 2022
Data Contamination: From Memorization to Exploitation
ACL 2022
Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa
ACL 2022
Replicability under Near-Perfect Conditions – A Case-Study from Automatic Summarization
ACL 2022
Vacillating Human Correlation of SacreBLEU in Unprotected Languages
ACL 2022
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations
ACL 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL 2022
Revisiting Automatic Evaluation of Extractive Summarization Task: Can We Do Better than ROUGE?
ACL 2022
ACL Tutorial Proposal: Towards Reproducible Machine Learning Research in Natural Language Processing
ACL 2022
A Simulation-Based Evaluation Framework for Interactive AI Systems and Its Application
AAAI 2022
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
AAAI 2022
ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications
INTERSPEECH 2022
Are reported accuracies in the clinical speech machine learning literature overoptimistic?
INTERSPEECH 2022