Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation
ACL 2020
Designing Precise and Robust Dialogue Response Evaluators
ACL 2020
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task
ACL 2020
Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings
EMNLP 2020
Automatic Evaluation vs. User Preference in Neural Textual QuestionAnswering over COVID-19 Scientific Literature
EMNLP 2020
Evaluation of Coreference Resolution Systems Under Adversarial Attacks
EMNLP 2020
Analyzing Neural Discourse Coherence Models
EMNLP 2020
Look at the First Sentence: Position Bias in Question Answering
EMNLP 2020
Evaluating the Performance of Reinforcement Learning Algorithms
ICML 2020
Optimization and Analysis of the pAp@k Metric for Recommender Systems
ICML 2020
Minimax Rate for Learning From Pairwise Comparisons in the BTL Model
ICML 2020
Some Languages Seem Easier to Parse Because Their Treebanks Leak
EMNLP 2020
Probing Linguistic Systematicity
ACL 2020
On The Evaluation of Machine Translation Systems Trained With Back-Translation
ACL 2020
On the Inference Calibration of Neural Machine Translation
ACL 2020
On the Robustness of Language Encoders against Grammatical Errors
ACL 2020
Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation
ACL 2020
Adversarial NLI: A New Benchmark for Natural Language Understanding
ACL 2020
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models
EMNLP 2020
Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data
EMNLP 2020
Stable Regression: On the Power of Optimization over Randomization
JMLR 2020
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
ACL 2020
Facet-Aware Evaluation for Extractive Summarization
ACL 2020
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates
ACL 2020
Are we Estimating or Guesstimating Translation Quality?
ACL 2020