Papers
Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities
EACL 2026
KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
EACL 2026
FormGym: Doing Paperwork with Agents
EACL 2026
What’s Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
EACL 2026
MicroEvoEval: A Systematic Evaluation Framework for Image-Based Microstructure Evolution Prediction
AAAI 2026