Approximate discounting-free policy evaluation from transient and recurrent states

Dewanto, Vektor; Gallagher, Marcus

arXiv:2204.04324 (cs)

[Submitted on 8 Apr 2022]

Abstract: To distinguish policies that prescribe good actions from those that prescribe bad actions in transient states, we need to evaluate the so-called bias of a policy from transient states. However, we observe that most (if not all) existing work on approximate discounting-free policy evaluation estimates the bias solely from recurrent states. We therefore propose a system of approximators for the bias (specifically, its relative value) from both transient and recurrent states. Its key ingredient is a seminorm LSTD (least-squares temporal difference), for which we derive a minimizer expression that enables the sampling-based approximation required in model-free reinforcement learning. This seminorm LSTD also facilitates the formulation of a general unifying procedure for LSTD-based policy value approximators. Experimental results validate the effectiveness of the proposed method.

Comments:	28 pages
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2204.04324 [cs.LG]
	(or arXiv:2204.04324v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2204.04324

Submission history

From: Vektor Dewanto
[v1] Fri, 8 Apr 2022 23:24:13 UTC (74 KB)
