Research Explorer
Evaluation
515 directly classified papers
Papers per year
2003: 1
2004: 1
2005: 1
2006: 1
2008: 2
2009: 1
2010: 1
2013: 5
2016: 3
2017: 8
2018: 11
2019: 24
2020: 25
2021: 34
2022: 68
2023: 74
2024: 105
2025: 147
2026: 3
Papers
Systematic Assessment of Factual Knowledge in Large Language Models
EMNLP 2023
Re-Examining Summarization Evaluation across Multiple Quality Criteria
EMNLP 2023
DiQAD: A Benchmark Dataset for Open-domain Dialogue Quality Assessment
EMNLP 2023
Exploring Context-Aware Evaluation Metrics for Machine Translation
EMNLP 2023
FFAEval: Evaluating Dialogue System via Free-For-All Ranking
EMNLP 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
EMNLP 2023
Post Turing: Mapping the landscape of LLM Evaluation
EMNLP 2023
Analyzing Multi-Sentence Aggregation in Abstractive Summarization via the Shapley Value
EMNLP 2023
A Data-Based Perspective on Transfer Learning
CVPR 2023
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
CVPR 2023
Beyond Confidence: Reliable Models Should Also Consider Atypicality
NIPS 2023
Discriminative Calibration: Check Bayesian Computation from Simulations and Flexible Classifier
NIPS 2023
Counterfactually Comparing Abstaining Classifiers
NIPS 2023
Vote’n’Rank: Revision of Benchmarking with Social Choice Theory
EACL 2023
Investigating UD Treebanks via Dataset Difficulty Measures
EACL 2023
MTEB: Massive Text Embedding Benchmark
EACL 2023
WRF: Weighted Rouge-F1 Metric for Entity Recognition
IJCNLP 2023
A Statistical Learning Take on the Concordance Index for Survival Analysis
AISTATS 2023
Precision Recall Cover: A Method For Assessing Generative Models
AISTATS 2023
Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection
JMLR 2023
Comprehensive Algorithm Portfolio Evaluation using Item Response Theory
JMLR 2023
Statistical Comparisons of Classifiers by Generalized Stochastic Dominance
JMLR 2023
Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages
IJCNLP 2023
System Identification of Neural Systems: If We Got It Right, Would We Know?
ICML 2023
Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
ICML 2023