Papers
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
NeurIPS 2024
A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era
NeurIPS 2024