Papers
Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery
WACV 2026
MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory
EACL 2026
ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
EACL 2026
Pro-QuEST: A Prompt-chain based Quiz Engine for testing Specialized Technical Product Knowledge
EACL 2026
TimeRes: A Turkish Benchmark For Evaluating Temporal Understanding of Large Language Models
EACL 2026