Papers
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
AAAI 2026
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
NAACL 2025
DUTJBD at SemEval-2025 Task 3: A Range of Approaches for Predicting Hallucination Generation in Models
SemEval 2025
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
ACL 2025