conftrace
Evaluation
74 papers
Papers per year
2018: 1
2020: 1
2023: 2
2025: 1
2026: 69
Papers
Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics
ACL 2026
Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models
ACL 2026
CEDAR: A Chinese Evaluation Dataset for Computational Argumentation
ACL 2026
ROSE: An Intent-Centered Evaluation Metric for NL2SQL
ACL 2026
Repeated Sequences Reveal Gaps between Large Language Models and Natural Language
ACL 2026
Beyond Word Boundaries: A Hebrew Coreference Benchmark and an Evaluation Protocol for Morphologically Complex Text
ACL 2026
Author-in-the-Loop Response Generation and Evaluation: Integrating Author Expertise and Intent in Responses to Peer Review
ACL 2026
Reward Modeling for Scientific Writing Evaluation
ACL 2026
Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan
ACL 2026
Illusions of the Gold Standard: A Large-scale Analysis of Human Evaluation Protocols for Long-form Text Generation
ACL 2026
Iterative Dual-Model Alignment for Story Evaluation
ACL 2026
ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization
ACL 2026
Evaluating the Impact of Verbal Multiword Expressions on Machine Translation
ACL 2026
BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks
ACL 2026
HAT: Hallucination Annotation for Translation
ACL 2026
Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation
ACL 2026
MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs
ACL 2026
Sigmoid Head for Quality Estimation under Language Ambiguity
ACL 2026
Narrative License and Model Sycophancy in LLM Summaries of Scientific Work
ACL 2026
Label and Explanation Variation in LLM-Based Annotation: a Case Study in Natural Language Inference
ACL 2026
Putting Captions to the Test: Evaluating Video Caption Quality through Multiple-Choice Question Answering
ACL 2026
Subject-level Inference for Realistic Text Anonymization Evaluation
ACL 2026
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
ACL 2026
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks
ACL 2026
LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases
ACL 2026