Evaluation

74 papers

Papers per year (year labels not preserved): 1, 1, 2, 1, 69

Papers

Can Large Language Models Infer Causal Relationships from Real-World Text? ACL 2026

LitVISTA: A Benchmark for Narrative Orchestration in Literary Text ACL 2026

Comparative Analysis of the Intrinsic Metrics for Tokenizers and their effect on Downstream Tasks for Hindi and Marathi ACL 2026

Automated Creativity Evaluation of Language Models Across Open-Ended Tasks ACL 2026

Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages ACL 2026

Bloom-Eval: A Hierarchical Evaluation Benchmark for Automatic Survey Generation Based on Bloom’s Taxonomy ACL 2026

Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation ACL 2026

HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking ACL 2026

Evaluation Pitfalls and Challenges in Multimedia Event Extraction ACL 2026

Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models ACL 2026

SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents ACL 2026

RubricBench: Aligning Model-Generated Rubrics with Human Standards ACL 2026

Stress Testing Factual Consistency Metrics for Long-Document Summarization ACL 2026

Discourse Realization of Generics in Human and LLM-generated Texts ACL 2026

MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation ACL 2026

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models ACL 2026

E2EDev: Benchmarking Large Language Models in End-to-End Software Development Task ACL 2026

Cross-Examination Framework: A Task-Agnostic Diagnostic for Information Fidelity in Text-to-Text Generation ACL 2026

GROKE: Vision-Free Navigation Instruction Evaluation via Graph Reasoning on OpenStreetMap ACL 2026

Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation ACL 2026

HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation ACL 2026

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation ACL 2026

Interpretable Coreference Resolution Evaluation Using Explicit Semantics ACL 2026

When High Accuracy Hides Poor Calibration: Rethinking Confidence Evaluation in Transformer-Based Text Classification with Balanced Brier Score ACL 2026

TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent ACL 2026