Papers
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
ICCV 2025
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
ICCV 2025
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
COLING 2025
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
COLING 2025
A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection
COLING 2025