Papers
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
NAACL 2024
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
NAACL 2024
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
CVPR 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
ACL 2024
Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis
EMNLP 2024
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
EMNLP 2024
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
EMNLP 2024
Needle In A Multimodal Haystack
NeurIPS 2024
Benchmarking Data Science Agents
ACL 2024