Papers
VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding
EMNLP 2025
Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments
ACL 2025
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
ACL 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
EMNLP 2025
Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
EMNLP 2025
Something’s Fishy in the Data Lake: A Critical Re-evaluation of Table Union Search Benchmarks
ACL 2025
WinoWhat: A Parallel Corpus of Paraphrased WinoGrande Sentences with Common Sense Categorization
ACL 2025
When2Call: When (not) to Call Tools
NAACL 2025