Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space
EMNLP 2022
Estimating Example Difficulty Using Variance of Gradients
CVPR 2022
Measuring Compositional Consistency for Video Question Answering
CVPR 2022
Towards Driving-Oriented Metric for Lane Detection Models
CVPR 2022
Do Explanations Explain? Model Knows Best
CVPR 2022
OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
CVPR 2022
Texture-Based Error Analysis for Image Super-Resolution
CVPR 2022
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
AACL 2022
IM2: an Interpretable and Multi-category Integrated Metric Framework for Automatic Dialogue Evaluation
EMNLP 2022
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
EMNLP 2022
Analyzing and Evaluating Faithfulness in Dialogue Summarization
EMNLP 2022
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
EMNLP 2022
A Multifaceted Framework to Evaluate Evasion, Content Preservation, and Misattribution in Authorship Obfuscation Techniques
EMNLP 2022
Towards a Unified Multi-Dimensional Evaluator for Text Generation
EMNLP 2022
EvEntS ReaLM: Event Reasoning of Entity States via Language Models
EMNLP 2022
Geographic Citation Gaps in NLP Research
EMNLP 2022
How Large Language Models are Transforming Machine-Paraphrase Plagiarism
EMNLP 2022
QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
EMNLP 2022
Exploring the Effects of Negation and Grammatical Tense on Bias Probes
AACL 2022
YASO: A Targeted Sentiment Analysis Evaluation Dataset for Open-Domain Reviews
EMNLP 2021
Chinese WPLC: A Chinese Dataset for Evaluating Pretrained Language Models on Word Prediction Given Long-Range Context
EMNLP 2021
Finding a Balanced Degree of Automation for Summary Evaluation
EMNLP 2021
How much coffee was consumed during EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI
EMNLP 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
EMNLP 2021
Proxy Indicators for the Quality of Open-domain Dialogues
EMNLP 2021