Evaluation

1654 directly classified papers

Papers

Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric ACL 2022

Why only Micro-F1? Class Weighting of Measures for Relation Classification ACL 2022

A global analysis of metrics used for measuring performance in natural language processing ACL 2022

Unmasking the Mask – Evaluating Social Biases in Masked Language Models AAAI 2022

Do Language Models Make Human-like Predictions about the Coreferents of Italian Anaphoric Zero Pronouns? COLING 2022

Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks NAACL 2022

Partial-input baselines show that NLI models can ignore context, but they don’t. NAACL 2022

Testing the Ability of Language Models to Interpret Figurative Language NAACL 2022

Benchmarking Intersectional Biases in NLP NAACL 2022

Exposing the Limits of Video-Text Models through Contrast Sets NAACL 2022

Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand NAACL 2022

ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models NAACL 2022

Generalization Analysis on Learning with a Concurrent Verifier NIPS 2022

When does dough become a bagel? Analyzing the remaining mistakes on ImageNet NIPS 2022

Beyond Static models and test sets: Benchmarking the potential of pre-trained models across tasks and languages ACL 2022

Pre-trained language models evaluating themselves - A comparative study ACL 2022

On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets ACL 2022

First the Worst: Finding Better Gender Translations During Beam Search ACL 2022

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark ACL 2022

Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models ACL 2022

Rethinking and Refining the Distinct Metric ACL 2022

An Analysis of Negation in Natural Language Understanding Corpora ACL 2022

Data Contamination: From Memorization to Exploitation ACL 2022

SummScreen: A Dataset for Abstractive Screenplay Summarization ACL 2022

ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments ACL 2022