Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
It is a Bird Therefore it is a Robin: On BERT’s Internal Consistency Between Hypernym Knowledge and Logical Words
ACL 2023
ORCA: A Challenging Benchmark for Arabic Language Understanding
ACL 2023
FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models
ACL 2023
LMentry: A Language Model Benchmark of Elementary Language Tasks
ACL 2023
Curating Datasets for Better Performance with Example Training Dynamics
ACL 2023
Reproducibility in NLP: What Have We Learned from the Checklist?
ACL 2023
Data Sampling and (In)stability in Machine Translation Evaluation
ACL 2023
A Survey of Evaluation Methods of Generated Medical Textual Reports
ACL 2023
Primacy Effect of ChatGPT
EMNLP 2023
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation
EMNLP 2023
ALCUNA: Large Language Models Meet New Knowledge
EMNLP 2023
Evaluating Cross-Domain Text-to-SQL Models and Benchmarks
EMNLP 2023
Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning
EMNLP 2023
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions
EMNLP 2023
Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models
EMNLP 2023
Can Large Language Models Capture Dissenting Human Voices?
EMNLP 2023
Reasoning about Ambiguous Definite Descriptions
EMNLP 2023
Statistically Profiling Biases in Natural Language Reasoning Datasets and Models
EMNLP 2023
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset
EMNLP 2023
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction
EMNLP 2023
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
EMNLP 2023
Are Language Models Worse than Humans at Following Prompts? It’s Complicated
EMNLP 2023
How Well Do Text Embedding Models Understand Syntax?
EMNLP 2023
NEWTON: Are Large Language Models Capable of Physical Reasoning?
EMNLP 2023
Emergent Inabilities? Inverse Scaling Over the Course of Pretraining
EMNLP 2023