Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
COHESENTIA: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts
EMNLP 2023
A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems
EMNLP 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
EMNLP 2023
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews
EMNLP 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
EMNLP 2023
System Combination via Quality Estimation for Grammatical Error Correction
EMNLP 2023
Accuracy is not enough: Evaluating Personalization in Summarizers
EMNLP 2023
In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages
EMNLP 2023
LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain
EMNLP 2023
Test-time Augmentation for Factual Probing
EMNLP 2023
Exploring the Sensitivity of LLMs’ Decision-Making Capabilities: Insights from Prompt Variations and Hyperparameters
EMNLP 2023
Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis
EMNLP 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
EMNLP 2023
Information Extraction from Legal Wills: How Well Does GPT-4 Do?
EMNLP 2023
Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution
EMNLP 2023
Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks
EMNLP 2023
DeltaScore: Fine-Grained Story Evaluation with Perturbations
EMNLP 2023
How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey
EMNLP 2023
INVITE: a Testbed of Automatically Generated Invalid Questions to Evaluate Large Language Models for Hallucinations
EMNLP 2023
An Image Quality Assessment Dataset for Portraits
CVPR 2023
False Discovery Proportion control for aggregated Knockoffs
NeurIPS 2023
Towards Reliable Item Sampling for Recommendation Evaluation
AAAI 2023
Approximating Full Conformal Prediction at Scale via Influence Functions
AAAI 2023
Toward a Perspectivist Turn in Ground Truthing for Predictive Computing
AAAI 2023
A Model-Agnostic Heuristics for Selective Classification
AAAI 2023