Evaluation

1654 directly classified papers

Papers

It is a Bird Therefore it is a Robin: On BERT’s Internal Consistency Between Hypernym Knowledge and Logical Words ACL 2023

ORCA: A Challenging Benchmark for Arabic Language Understanding ACL 2023

FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models ACL 2023

LMentry: A Language Model Benchmark of Elementary Language Tasks ACL 2023

Curating Datasets for Better Performance with Example Training Dynamics ACL 2023

Reproducibility in NLP: What Have We Learned from the Checklist? ACL 2023

Data Sampling and (In)stability in Machine Translation Evaluation ACL 2023

A Survey of Evaluation Methods of Generated Medical Textual Reports ACL 2023

Primacy Effect of ChatGPT EMNLP 2023

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation EMNLP 2023

ALCUNA: Large Language Models Meet New Knowledge EMNLP 2023

Evaluating Cross-Domain Text-to-SQL Models and Benchmarks EMNLP 2023

Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning EMNLP 2023

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions EMNLP 2023

Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models EMNLP 2023

Can Large Language Models Capture Dissenting Human Voices? EMNLP 2023

Reasoning about Ambiguous Definite Descriptions EMNLP 2023

Statistically Profiling Biases in Natural Language Reasoning Datasets and Models EMNLP 2023

Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset EMNLP 2023

A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction EMNLP 2023

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench EMNLP 2023

Are Language Models Worse than Humans at Following Prompts? It’s Complicated EMNLP 2023

How Well Do Text Embedding Models Understand Syntax? EMNLP 2023

NEWTON: Are Large Language Models Capable of Physical Reasoning? EMNLP 2023

Emergent Inabilities? Inverse Scaling Over the Course of Pretraining EMNLP 2023