Papers
KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
EACL 2026
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
WACV 2026
HumanBench: Two Heads, No Legs, But Mostly Human, the State of Generative Capabilities in T2I Models
WACV 2026
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
EACL 2026