Agent Systems

3885 directly classified papers

Papers

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models EMNLP 2025

MALLM: Multi-Agent Large Language Models Framework EMNLP 2025

SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks EMNLP 2025

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents EMNLP 2025

Audio Query Handling System with Integrated Expert Models and Contextual Understanding EMNLP 2025

Generative Reviewer Agents: Scalable Simulacra of Peer Review EMNLP 2025

CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation EMNLP 2025

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? EMNLP 2025

ReAct Meets Industrial IoT: Language Agents for Data Access EMNLP 2025

ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions EMNLP 2025

Towards Enforcing Company Policy Adherence in Agentic Workflows EMNLP 2025

Banking Done Right: Redefining Retail Banking with Language-Centric AI EMNLP 2025

Agent vs. Agent: Automated Data Generation and Red-Teaming for Custom Agentic Workflows EMNLP 2025

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications EMNLP 2025

GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems EMNLP 2025

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance EMNLP 2025

STREAQ: Selective Tiered Routing for Effective and Affordable Contact Center Quality Assurance EMNLP 2025

Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian EMNLP 2025

AMAS: Adaptively Determining Communication Topology for LLM-based Multi-agent System EMNLP 2025

AttributeForge: An Agentic LLM Framework for Automated Product Schema Modeling EMNLP 2025

VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings EMNLP 2025

GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation EMNLP 2025

Recon, Answer, Verify: Agents in Search of Truth EMNLP 2025

STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback EMNLP 2025

LLM-Based Behavior Prediction for Social Media Users with Continuous Memory AACL 2025