Policy Gradient Play with Networked Agents in Markov Potential Games

Sarper Aydin; Ceyhun Eksin

L4DC 2023


Abstract

We introduce a distributed policy gradient play algorithm for networked agents playing Markov potential games. At each stage of the game, every agent receives a reward that depends on the joint action of all agents and on a common, dynamically evolving state. Agents play against one another using parameterized, differentiable policies. A Markov potential game assumes the existence of a potential value function; in a differentiable Markov potential game, the partial gradient of this potential function with respect to an agent’s policy parameters equals the gradient of that agent’s own value function with respect to the same parameters. In this work, agents receive, in addition to their rewards, information about other agents’ parameters through a communication network. Each agent then updates its policy parameters using stochastic gradients evaluated at its local estimates of the joint policy parameters. We show that the agents’ joint policy converges to a first-order stationary point of the Markov potential value function for any type of function approximation and any state and action spaces. Numerical experiments on the lake game, a Markov potential game, confirm the convergence result.
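The update described in the abstract (agents mix their estimates of the joint parameters over a communication network, then take a stochastic gradient step on their own parameters at those local estimates) can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption, not the authors' algorithm or the lake-game experiment: a single-state (static) potential game with scalar parameters instead of a Markov game with policies, a quadratic potential with made-up targets `C` and coupling `KAPPA`, a four-agent ring network with uniform 1/3 mixing weights `W`, gradient noise simulated by adding Gaussian noise to the exact partial derivative, and a diminishing step size. It does keep the structural property stated in the abstract: each agent's own gradient equals the corresponding partial derivative of the potential.

```python
import math
import random

# HYPOTHETICAL toy, not the paper's lake game or exact algorithm: a single-state
# potential game where agent i controls a scalar parameter theta_i. Potential:
#   Phi(theta) = -0.5 * sum_i (theta_i - c_i)^2 - 0.5 * KAPPA * (sum_i theta_i)^2
# Each agent's value is u_i = Phi + h_i(theta_{-i}), with h_i independent of
# theta_i, so the partial derivative of u_i w.r.t. theta_i equals that of Phi.
C = [1.0, -0.5, 2.0, 0.5]  # assumed per-agent targets c_i
KAPPA = 0.1                # assumed coupling strength
N = len(C)

# Assumed communication network: a 4-agent ring. Each agent averages its own
# estimate with its two neighbours' (weight 1/3 each), so W is doubly stochastic.
W = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in (i - 1, i, i + 1):
        W[i][j % N] = 1.0 / 3.0


def own_gradient(i, joint):
    """d u_i / d theta_i (= d Phi / d theta_i) at a joint parameter vector."""
    return -(joint[i] - C[i]) - KAPPA * sum(joint)


def potential_gradient(theta):
    """Full gradient of Phi; used only to check first-order stationarity."""
    return [own_gradient(i, theta) for i in range(N)]


def networked_gradient_play(steps=5000, noise_std=0.1, alpha0=0.2, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N                    # true parameters, one per agent
    est = [[0.0] * N for _ in range(N)]  # est[i]: agent i's copy of the joint vector
    for t in range(steps):
        # 1) Communication round: mix neighbours' estimates through W.
        est = [[sum(W[i][k] * est[k][j] for k in range(N)) for j in range(N)]
               for i in range(N)]
        alpha = alpha0 / (t + 1) ** 0.6  # assumed diminishing step size
        for i in range(N):
            est[i][i] = theta[i]  # agent i knows its own parameter exactly
            # 2) Noisy gradient of u_i w.r.t. theta_i at agent i's local estimate.
            g = own_gradient(i, est[i]) + rng.gauss(0.0, noise_std)
            # 3) Ascent step on the agent's own parameter only.
            theta[i] += alpha * g
            est[i][i] = theta[i]
    return theta


if __name__ == "__main__":
    theta = networked_gradient_play()
    grad_norm = math.sqrt(sum(g * g for g in potential_gradient(theta)))
    print("theta:", [round(x, 3) for x in theta], "|grad Phi|:", round(grad_norm, 4))
```

For this quadratic potential the unique stationary point is theta*_i = c_i - KAPPA * sum(C) / (1 + KAPPA * N), so the returned parameters can be checked against it directly. Each agent reads only its own true parameter and its local estimates of the others, which is why the consensus step matters; the step-size schedule is one common way to damp gradient noise, not a choice taken from the paper.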

Keywords

stochastic gradient, policy gradient, Markov potential game, multi-agent system, networked agent