conftrace
Evaluation
74 papers
Papers per year
2018: 1
2020: 1
2023: 2
2025: 1
2026: 69
Papers
OLA: Output Language Alignment in Code-Switched LLM Interactions
ACL 2026
Toward Robust Evaluation for Multilingual Grammatical Error Correction: Can Large Language Models Replace Human References?
ACL 2026
CIG: Measuring Conversational Information Gain in Deliberative Dialogues with Semantic Memory Dynamics
ACL 2026
Macaron: Controlled, Human-Written Benchmark for Multilingual and Multicultural Reasoning via Template-Filling
ACL 2026
Translation or Recitation? Calibrating Evaluation Scores for Machine Translation of Extremely Low-Resource Languages
ACL 2026
Revisiting Evaluation of Question Answering Systems in Low-Resource Indic Languages: Bridging Human and Metric Alignment
ACL 2026
TokCollate: A Comprehensive Tool for Tokenizer Evaluation and Visualization across Languages
ACL 2026
AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge
ACL 2026
TokLens: A Multilingual Lens on Tokenizer Quality for LLMs
ACL 2026
BanglaSocialBench: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction
ACL 2026
Confidence as a Tie-Breaker: Reassessing Multilingual Hedging Bias in LLM-as-a-Judge Evaluation
ACL 2026
BanglaSTEM: A Parallel Corpus and Term-Weighted Evaluation for Technical Bangla-English Translation
ACL 2026
Semantic Span Annotation: An Exploratory Study of LLM Annotation
ACL 2026
Eye Movement Features Can Predict Human Preferences on Machine-Generated Texts
ACL 2026
Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation
ACL 2026
Linguistically-Informed Evaluation of LLMs on Acceptability Judgments in a Forced-Choice Paradigm
ACL 2026
Mind the Gap: Multilingual Divide in LLM Bias Detection and Reasoning
ACL 2026
LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation
ACL 2026
Diagnose, Then Repair: A Two-Stage MQM-Guided Post-Editing Framework for Domain-Specific Machine Translation
ACL 2026
A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
ACL 2025
HAUSER: Towards Holistic and Automatic Evaluation of Simile Generation
ACL 2023
FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation
EMNLP 2023
SyntaxGym: An Online Platform for Targeted Evaluation of Language Models
ACL 2020
The price of debiasing automatic metrics in natural language evaluation
ACL 2018