Papers
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
NAACL 2025
Chain of Knowledge Graph: Information-Preserving Multi-Document Summarization for Noisy Documents
COLING 2025
ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models
COLING 2025
The Geometry of Creative Variability: How Credal Sets Expose Calibration Gaps in Language Models
EMNLP 2025