conftrace
Evaluation
74 papers
Papers per year
2018: 1
2020: 1
2023: 2
2025: 1
2026: 69
Papers
Can Large Language Models Infer Causal Relationships from Real-World Text?
ACL 2026
LitVISTA: A Benchmark for Narrative Orchestration in Literary Text
ACL 2026
Comparative Analysis of the Intrinsic Metrics for Tokenizers and their effect on Downstream Tasks for Hindi and Marathi
ACL 2026
Automated Creativity Evaluation of Language Models Across Open-Ended Tasks
ACL 2026
Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
ACL 2026
Bloom-Eval: A Hierarchical Evaluation Benchmark for Automatic Survey Generation Based on Bloom’s Taxonomy
ACL 2026
Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation
ACL 2026
HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking
ACL 2026
Evaluation Pitfalls and Challenges in Multimedia Event Extraction
ACL 2026
Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
ACL 2026
SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents
ACL 2026
RubricBench: Aligning Model-Generated Rubrics with Human Standards
ACL 2026
Stress Testing Factual Consistency Metrics for Long-Document Summarization
ACL 2026
Discourse Realization of Generics in Human and LLM-generated Texts
ACL 2026
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation
ACL 2026
ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models
ACL 2026
E2EDev: Benchmarking Large Language Models in End-to-End Software Development Task
ACL 2026
Cross-Examination Framework: A Task-Agnostic Diagnostic for Information Fidelity in Text-to-Text Generation
ACL 2026
GROKE: Vision-Free Navigation Instruction Evaluation via Graph Reasoning on OpenStreetMap
ACL 2026
Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation
ACL 2026
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
ACL 2026
SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
ACL 2026
Interpretable Coreference Resolution Evaluation Using Explicit Semantics
ACL 2026
When High Accuracy Hides Poor Calibration: Rethinking Confidence Evaluation in Transformer-Based Text Classification with Balanced Brier Score
ACL 2026
TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent
ACL 2026