Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Evaluation (Machine Learning › Learning Types)
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Do ever larger octopi still amplify reporting biases? Evidence from judgments of typical colour
AACL 2022
Quantified Reproducibility Assessment of NLP Results
ACL 2022
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation
ACL 2022
Life after BERT: What do Other Muppets Understand about Language?
ACL 2022
Automatic Error Analysis for Document-level Information Extraction
ACL 2022
On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency
ACL 2022
QAConv: Question Answering on Informative Conversations
ACL 2022
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
ACL 2022
Just Rank: Rethinking Evaluation with Word and Sentence Similarities
ACL 2022
Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED
ACL 2022
Predicting Difficulty and Discrimination of Natural Language Questions
ACL 2022
ACL Tutorial Proposal: Towards Reproducible Machine Learning Research in Natural Language Processing
ACL 2022
Analyzing Dynamic Adversarial Training Data in the Limit
ACL 2022
BBQ: A hand-built bias benchmark for question answering
ACL 2022
Factual Consistency of Multilingual Pretrained Language Models
ACL 2022
Probing Factually Grounded Content Transfer with Factual Ablation
ACL 2022
E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning
ACL 2022
On Length Divergence Bias in Textual Matching Models
ACL 2022
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations
ACL 2022
Psycholinguistic Diagnosis of Language Models’ Commonsense Reasoning
ACL 2022
On the Impact of Noises in Crowd-Sourced Data for Speech Translation
ACL 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
ACL 2022
LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
ACL 2022
Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions
ACL 2022
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
ACL 2022