conftrace
Evaluation
393 papers
Papers per year
2021: 2
2022: 2
2023: 1
2024: 3
2025: 2
2026: 383
Papers
CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
ACL 2026
Are Large Language Models Economically Viable for Industry Deployment?
ACL 2026
Efficient Agent Evaluation via Diversity-Guided User Simulation
ACL 2026
What Question Did You Answer? Refining Contact Center Evaluation Plans via Backward Questions
ACL 2026
Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant
ACL 2026
Measuring and Mitigating Racial Bias in Embedding Models: A Comparative Study for Law Enforcement Retrieval
ACL 2026
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control
ACL 2026
The LUMirage: An independent evaluation of zero-shot performance in the LUMIR challenge
MIDL 2026
Towards a Principled Evaluation of Knowledge Editors
ACL 2025
Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics?
NAACL 2025
ToMBench: Benchmarking Theory of Mind in Large Language Models
ACL 2024
BenchIE^FL: A Manually Re-Annotated Fact-Based Open Information Extraction Benchmark
ACL 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
ACL 2024
The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation
ACL 2023
ADBench: Anomaly Detection Benchmark
NeurIPS 2022
BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation
ACL 2022
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
EMNLP 2021
TabPert: An Effective Platform for Tabular Perturbation
EMNLP 2021