

Evaluation

393 papers

[Chart: Papers per year]

Papers

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems ACL 2026

The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning ACL 2026

Action Boundary Blindness: When LLM Agents Cannot Tell Where One Action Ends and Another Begins ACL 2026

IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation ACL 2026

VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning ACL 2026

MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks ACL 2026

Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents ACL 2026

Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation ACL 2026

AlignCultura: Towards Culturally Aligned Large Language Models? ACL 2026

TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice ACL 2026

Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses ACL 2026

DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection ACL 2026

HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns ACL 2026

v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound ACL 2026

SciCoQA: Quality Assurance for Scientific Paper–Code Alignment ACL 2026

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models ACL 2026

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering ACL 2026

Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment ACL 2026

Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models ACL 2026

Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks ACL 2026

Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition ACL 2026

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs ACL 2026

Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process ACL 2026

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models ACL 2026

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study ACL 2026