Papers
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
ACL 2025
CULEMO: Cultural Lenses on Emotion - Benchmarking LLMs for Cross-Cultural Emotion Understanding
ACL 2025
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
ACL 2025
PlanningArena: A Modular Benchmark for Multidimensional Evaluation of Planning and Tool Learning
ACL 2025