Papers
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in Large Language Models
ACL 2025
LLM Agents Making Agent Tools
ACL 2025
Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner
IJCAI 2025
SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
ACL 2025
ReasoningWeekly: A General Knowledge and Verbal Reasoning Challenge for Large Language Models
AACL 2025