Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Human perceiving behavior modeling in evaluation of code generation models
EMNLP 2022
On reporting scores and agreement for error annotation tasks
EMNLP 2022
Answerability: A custom metric for evaluating chatbot performance
EMNLP 2022
Error Analysis of ToTTo Table-to-Text Neural NLG Models
EMNLP 2022
20Q: Overlap-Free World Knowledge Benchmark for Language Models
EMNLP 2022
What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation
EMNLP 2022
Numerical Correlation in Text
EMNLP 2022
How Language-Dependent is Emotion Detection? Evidence from Multilingual BERT
EMNLP 2022
Searching for a Higher Power in the Human Evaluation of MT
EMNLP 2022
Test Set Sampling Affects System Rankings: Expanded Human Evaluation of WMT20 English-Inuktitut Systems
EMNLP 2022
Linguistically Motivated Evaluation of the 2022 State-of-the-art Machine Translation Systems for Three Language Directions
EMNLP 2022
Linguistically Motivated Evaluation of Machine Translation Metrics Based on a Challenge Set
EMNLP 2022
Exploring Robustness of Machine Translation Metrics: A Study of Twenty-Two Automatic Metrics in the WMT22 Metric Task
EMNLP 2022
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
NIPS 2022
What are the best Systems? New Perspectives on NLP Benchmarking
NIPS 2022
Average Sensitivity of Euclidean k-Clustering
NIPS 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
NIPS 2022
Estimating the Arc Length of the Optimal ROC Curve and Lower Bounding the Maximal AUC
NIPS 2022
Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm
NIPS 2022
Generic Overgeneralization in Pre-trained Language Models
COLING 2022
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
COLING 2022
The Role of Context and Uncertainty in Shallow Discourse Parsing
COLING 2022
Open-Domain Dialog Evaluation Using Follow-Ups Likelihood
COLING 2022
Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT
COLING 2022
Using Natural Sentence Prompts for Understanding Biases in Language Models
NAACL 2022