conftrace
Evaluation
393 papers
Papers per year
2021: 2
2022: 2
2023: 1
2024: 3
2025: 2
2026: 383
Papers
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
ACL 2026
The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning
ACL 2026
Action Boundary Blindness: When LLM Agents Cannot Tell Where One Action Ends and Another Begins
ACL 2026
IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation
ACL 2026
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
ACL 2026
MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks
ACL 2026
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents
ACL 2026
Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation
ACL 2026
AlignCultura: Towards Culturally Aligned Large Language Models?
ACL 2026
TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice
ACL 2026
Tears or Cheers? Benchmarking LLMs via Culturally Elicited Distinct Affective Responses
ACL 2026
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection
ACL 2026
HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive Patterns
ACL 2026
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
ACL 2026
SciCoQA: Quality Assurance for Scientific Paper–Code Alignment
ACL 2026
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
ACL 2026
ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
ACL 2026
Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment
ACL 2026
Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models
ACL 2026
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
ACL 2026
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
ACL 2026
GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs
ACL 2026
Assessing the Belief Consistency of Large Language Models on the Logical Conversation Process
ACL 2026
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
ACL 2026
Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study
ACL 2026