Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)

Bound Action Policy for Better Sample Efficiency

Authors
Junning Huang, Zhifeng Hao
Corresponding Author
Junning Huang
Available Online May 2018.
DOI
10.2991/ncce-18.2018.131
Keywords
Reinforcement learning; policy gradient; action output; locomotion policy; Gaussian distribution
Abstract

Reinforcement learning algorithms have achieved great progress in solving robotic locomotion control problems. A common approach is to represent the robot's locomotion policy, i.e. the distribution over its action outputs, with a Gaussian distribution. However, in real-world control problems the actions are bounded by physical constraints, which introduces a bias when a Gaussian distribution is used as the policy. This paper proposes the logistic Gaussian policy, which can reduce both the bias introduced by the Gaussian distribution and the variance between policy gradient samples.
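
The paper's exact formulation is not reproduced on this page, but the idea the abstract describes, squashing a Gaussian sample through a logistic function so the action respects its bounds and correcting the log-probability with the change-of-variables formula, can be sketched as follows. This is a minimal illustrative sketch in Python/NumPy under stated assumptions: the function name sample_logistic_gaussian and the (0, 1) action range are hypothetical choices for illustration, not the authors' implementation.

    import numpy as np

    def sample_logistic_gaussian(mu, sigma, rng):
        # Draw an unbounded Gaussian sample, then squash it into (0, 1)
        # with the logistic (sigmoid) function.
        u = rng.normal(mu, sigma)
        a = 1.0 / (1.0 + np.exp(-u))
        # Log-density of the Gaussian sample u.
        log_p_u = -0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
        # Change of variables: subtract log|da/du|, where da/du = a * (1 - a)
        # for the logistic function; the small constant avoids log(0).
        log_p_a = log_p_u - np.log(a * (1.0 - a) + 1e-12)
        return a, log_p_a

    rng = np.random.default_rng(0)
    action, log_prob = sample_logistic_gaussian(mu=0.3, sigma=0.5, rng=rng)
    print(action, log_prob)  # bounded action in (0, 1) and its log-probability

In a policy gradient method, a corrected log-probability of this form would stand in for the plain Gaussian log-probability when weighting the gradient estimate, which is how the bias from clipping unbounded Gaussian samples is avoided.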

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
Series
Advances in Intelligent Systems Research
Publication Date
May 2018
ISBN
10.2991/ncce-18.2018.131
ISSN
1951-6851
DOI
10.2991/ncce-18.2018.131

Cite this article

TY  - CONF
AU  - Junning Huang
AU  - Zhifeng Hao
PY  - 2018/05
DA  - 2018/05
TI  - Bound Action Policy for Better Sample Efficiency
BT  - Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
PB  - Atlantis Press
SP  - 794
EP  - 799
SN  - 1951-6851
UR  - https://doi.org/10.2991/ncce-18.2018.131
DO  - 10.2991/ncce-18.2018.131
ID  - Huang2018/05
ER  -