Papers
Multilingual Fact-Checking using LLMs
EMNLP 2024
ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons
NeurIPS 2024
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
NeurIPS 2024
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
NeurIPS 2024
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
EMNLP 2024
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts
NeurIPS 2024
AMLB: an AutoML Benchmark
JMLR 2024