Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric
ACL 2022
Why only Micro-F1? Class Weighting of Measures for Relation Classification
ACL 2022
A global analysis of metrics used for measuring performance in natural language processing
ACL 2022
Unmasking the Mask – Evaluating Social Biases in Masked Language Models
AAAI 2022
Do Language Models Make Human-like Predictions about the Coreferents of Italian Anaphoric Zero Pronouns?
COLING 2022
Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
NAACL 2022
Partial-input baselines show that NLI models can ignore context, but they don’t.
NAACL 2022
Testing the Ability of Language Models to Interpret Figurative Language
NAACL 2022
Benchmarking Intersectional Biases in NLP
NAACL 2022
Exposing the Limits of Video-Text Models through Contrast Sets
NAACL 2022
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
NAACL 2022
ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models
NAACL 2022
Generalization Analysis on Learning with a Concurrent Verifier
NeurIPS 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
NeurIPS 2022
Beyond Static models and test sets: Benchmarking the potential of pre-trained models across tasks and languages
ACL 2022
Pre-trained language models evaluating themselves - A comparative study
ACL 2022
On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets
ACL 2022
First the Worst: Finding Better Gender Translations During Beam Search
ACL 2022
AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark
ACL 2022
Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models
ACL 2022
Rethinking and Refining the Distinct Metric
ACL 2022
An Analysis of Negation in Natural Language Understanding Corpora
ACL 2022
Data Contamination: From Memorization to Exploitation
ACL 2022
SummScreen: A Dataset for Abstractive Screenplay Summarization
ACL 2022
ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments
ACL 2022