Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
When to Use What: An In-Depth Comparative Empirical Analysis of OpenIE Systems for Downstream Applications
ACL 2023
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist
ACL 2023
On Evaluating Multilingual Compositional Generalization with Translated Datasets
ACL 2023
Rogue Scores
ACL 2023
Incorporating Attribution Importance for Improving Faithfulness Metrics
ACL 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
ACL 2023
Comparative evaluation of boundary-relaxed annotation for Entity Linking performance
ACL 2023
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
ACL 2023
AlignScore: Evaluating Factual Consistency with A Unified Alignment Function
ACL 2023
Measuring Consistency in Text-based Financial Forecasting Models
ACL 2023
ArgAnalysis35K : A large-scale dataset for Argument Quality Analysis
ACL 2023
Transitioning from benchmarks to a real-world case of information-seeking in Scientific Publications
ACL 2023
Rethinking the Word-level Quality Estimation for Machine Translation from Human Judgement
ACL 2023
This prompt is measuring <mask>: evaluating bias evaluation in language models
ACL 2023
C-XNLI: Croatian Extension of XNLI Dataset
ACL 2023
WYWEB: A NLP Evaluation Benchmark For Classical Chinese
ACL 2023
ANALOGICAL - A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models
ACL 2023
TextVerifier: Robustness Verification for Textual Classifiers with Certifiable Guarantees
ACL 2023
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
ACL 2023
PragmatiCQA: A Dataset for Pragmatic Question Answering in Conversations
ACL 2023
Scientific Fact-Checking: A Survey of Resources and Approaches
ACL 2023
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models
ACL 2023
Evaluation of Question Generation Needs More References
ACL 2023
Multi-Dimensional Evaluation of Text Summarization with In-Context Learning
ACL 2023
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
ACL 2023