Papers
Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models
ACL 2024
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
ACL 2024
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
ACL 2024