Agent Systems

3885 directly classified papers

Papers

Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling ACL 2025

TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning ACL 2025

Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub ACL 2025

Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents ACL 2025

SceneGenAgent: Precise Industrial Scene Generation with Coding Agent ACL 2025

ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models ACL 2025

When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation ACL 2025

ExploraCoder: Advancing Code Generation for Multiple Unseen APIs via Planning and Chained Exploration ACL 2025

CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision ACL 2025

Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning ACL 2025

nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow ACL 2025

Interactive and Expressive Code-Augmented Planning with Large Language Models ACL 2025

Multi-Attribute Steering of Language Models via Targeted Intervention ACL 2025

AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations ACL 2025

CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration ACL 2025

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models ACL 2025

ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution ACL 2025

Analytically Tractable Models for Decision Making under Present Bias AAAI 2024

Seamless Human Motion Composition with Blended Positional Encodings CVPR 2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model CVPR 2024

Towards Learning a Generalist Model for Embodied Navigation CVPR 2024

WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games NeurIPS 2024

Humanoid Locomotion as Next Token Prediction NeurIPS 2024

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion CVPR 2024

CogAgent: A Visual Language Model for GUI Agents CVPR 2024