Evaluation

1654 directly classified papers

Papers

COHESENTIA: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts EMNLP 2023

A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems EMNLP 2023

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research EMNLP 2023

Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews EMNLP 2023

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors EMNLP 2023

System Combination via Quality Estimation for Grammatical Error Correction EMNLP 2023

Accuracy is not enough: Evaluating Personalization in Summarizers EMNLP 2023

In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages EMNLP 2023

LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain EMNLP 2023

Test-time Augmentation for Factual Probing EMNLP 2023

Exploring the Sensitivity of LLMs’ Decision-Making Capabilities: Insights from Prompt Variations and Hyperparameters EMNLP 2023

Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis EMNLP 2023

Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization EMNLP 2023

Information Extraction from Legal Wills: How Well Does GPT-4 Do? EMNLP 2023

Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution EMNLP 2023

Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks EMNLP 2023

DeltaScore: Fine-Grained Story Evaluation with Perturbations EMNLP 2023

How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey EMNLP 2023

INVITE: a Testbed of Automatically Generated Invalid Questions to Evaluate Large Language Models for Hallucinations EMNLP 2023

An Image Quality Assessment Dataset for Portraits CVPR 2023

False Discovery Proportion control for aggregated Knockoffs NeurIPS 2023

Towards Reliable Item Sampling for Recommendation Evaluation AAAI 2023

Approximating Full Conformal Prediction at Scale via Influence Functions AAAI 2023

Toward a Perspectivist Turn in Ground Truthing for Predictive Computing AAAI 2023

A Model-Agnostic Heuristics for Selective Classification AAAI 2023