Research Explorer
Machine Learning › Learning Types › Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Can Edge Probing Tests Reveal Linguistic Knowledge in QA Models?
COLING 2022
Stability of Syntactic Dialect Classification over Space and Time
COLING 2022
Deepchecks: A Library for Testing and Validating Machine Learning Models and Data
JMLR 2022
On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms
AISTATS 2022
An Empirical Study of Pipeline vs. Joint approaches to Entity and Relation Extraction
IJCNLP 2022
Assessing Combinational Generalization of Language Models in Biased Scenarios
IJCNLP 2022
Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET
IJCNLP 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
IJCNLP 2022
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
EMNLP 2022
Automated Evaluation Metric for Terminology Consistency in MT
EMNLP 2022
Continuous Rating as Reliable Human Evaluation of Simultaneous Speech Translation
EMNLP 2022
Findings of the WMT 2022 Shared Task on Quality Estimation
EMNLP 2022
Exploring The Landscape of Distributional Robustness for Question Answering Models
EMNLP 2022
Are Neural Topic Models Broken?
EMNLP 2022
Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection
EMNLP 2022
EnDex: Evaluation of Dialogue Engagingness at Scale
EMNLP 2022
On the Impact of Temporal Concept Drift on Model Explanations
EMNLP 2022
Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
EMNLP 2022
Simple but Challenging: Natural Language Inference Models Fail on Simple Sentences
EMNLP 2022
Scientific and Creative Analogies in Pretrained Language Models
EMNLP 2022
A Few More Examples May Be Worth Billions of Parameters
EMNLP 2022
Measuring and Improving Semantic Diversity of Dialogue Generation
EMNLP 2022
Language Models Are Poor Learners of Directional Inference
EMNLP 2022
Impact of Pretraining Term Frequencies on Few-Shot Numerical Reasoning
EMNLP 2022
Machine translation impact in E-commerce multilingual search
EMNLP 2022