Papers
Do VSR Models Generalize Beyond LRS3?
WACV 2024
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
ACL 2024
CK12: A Rounded K12 Knowledge Graph Based Benchmark for Chinese Holistic Cognition Evaluation
AAAI 2024
Detection, Diagnosis, and Explanation: A Benchmark for Chinese Medical Hallucination Evaluation
COLING 2024
EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
COLING 2024
NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing
ACL 2023
On Pitfalls of Test-Time Adaptation
ICML 2023