Evaluation

1654 directly classified papers

Papers

Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation ACL 2020

Designing Precise and Robust Dialogue Response Evaluators ACL 2020

TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task ACL 2020

Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings EMNLP 2020

Automatic Evaluation vs. User Preference in Neural Textual QuestionAnswering over COVID-19 Scientific Literature EMNLP 2020

Evaluation of Coreference Resolution Systems Under Adversarial Attacks EMNLP 2020

Analyzing Neural Discourse Coherence Models EMNLP 2020

Look at the First Sentence: Position Bias in Question Answering EMNLP 2020

Evaluating the Performance of Reinforcement Learning Algorithms ICML 2020

Optimization and Analysis of the pAp@k Metric for Recommender Systems ICML 2020

Minimax Rate for Learning From Pairwise Comparisons in the BTL Model ICML 2020

Some Languages Seem Easier to Parse Because Their Treebanks Leak EMNLP 2020

Probing Linguistic Systematicity ACL 2020

On The Evaluation of Machine Translation Systems Trained With Back-Translation ACL 2020

On the Inference Calibration of Neural Machine Translation ACL 2020

On the Robustness of Language Encoders against Grammatical Errors ACL 2020

Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation ACL 2020

Adversarial NLI: A New Benchmark for Natural Language Understanding ACL 2020

Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models EMNLP 2020

Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data EMNLP 2020

Stable Regression: On the Power of Optimization over Randomization JMLR 2020

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList ACL 2020

Facet-Aware Evaluation for Extractive Summarization ACL 2020

Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates ACL 2020

Are we Estimating or Guesstimating Translation Quality? ACL 2020