Robust Value Function Approximation Using Bilinear Programming

Marek Petrik; Shlomo Zilberstein

2009 NIPS NeurIPS 2009

Robust Value Function Approximation Using Bilinear Programming

Abstract

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, it provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP hard, but this is unavoidable because the Bellman-residual minimization itself is NP hard. We, therefore, employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.

🌉 Interdisciplinary Bridge - Machine Learning and Reinforcement Learning

📈 Trend Setter - Value Iteration

🧭 Keyword Pioneer - bilinear programming

🐝 Cross-Pollinator - Artificial Intelligence, Computer Vision, Data Science & Analytics, Healthcare & Medicine, Machine Learning, Mathematics & Optimization, Reinforcement Learning

🐣 Hot Topic Early Bird - reinforcement learning

Authors

Marek Petrik , Shlomo Zilberstein

Topics

Machine Learning > Core Methods > Representation Learning Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Applications > Value Iteration Mathematics & Optimization > Optimization Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning policy iteration value function approximation approximate dynamic programming bellman residual minimization bilinear programming approximate policy iteration robust approximation bellman residual

Download PDF

Related papers

Solving Stochastic Games 2009

Bilinear classifiers for visual recognition 2009

Zero-shot Learning with Semantic Output Codes 2009

Matrix Completion from Power-Law Distributed Samples 2009

Heavy-Tailed Symmetric Stochastic Neighbor Embedding 2009