Papers
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
EMNLP 2025
Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments
EMNLP 2025
Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text
EMNLP 2025