Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)

Bound Action Policy for Better Sample Efficiency

Authors
Junning Huang, Zhifeng Hao
Corresponding Author
Junning Huang
Available Online May 2018.
DOI
10.2991/ncce-18.2018.131
Keywords
Reinforcement learning; policy gradient; action output; locomotion policy; Gaussian distribution
Abstract

Reinforcement learning algorithms have achieved great progress in solving robotic locomotion control problems. A common approach is to represent the robot's locomotion policy, i.e. the distribution over its action outputs, with a Gaussian distribution. However, in real-world control problems the actions are bounded by physical constraints, which introduces a bias when a Gaussian distribution is used as the policy. This paper proposes the logistic Gaussian policy, which can reduce both the bias introduced by the Gaussian distribution and the variance between policy gradient samples.
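
The paper's exact formulation is not reproduced on this page, but the idea the abstract describes, squashing a Gaussian sample through a logistic function so the action respects its bounds and correcting the log-probability with the change-of-variables formula, can be sketched as follows. This is a minimal illustrative sketch in Python/NumPy under stated assumptions: the function name sample_logistic_gaussian and the (0, 1) action range are hypothetical choices for illustration, not the authors' implementation.

    import numpy as np

    def sample_logistic_gaussian(mu, sigma, rng):
        # Draw an unbounded Gaussian sample, then squash it into (0, 1)
        # with the logistic (sigmoid) function.
        u = rng.normal(mu, sigma)
        a = 1.0 / (1.0 + np.exp(-u))
        # Log-density of the Gaussian sample u.
        log_p_u = -0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
        # Change of variables: subtract log|da/du|, where da/du = a * (1 - a)
        # for the logistic function; the small constant avoids log(0).
        log_p_a = log_p_u - np.log(a * (1.0 - a) + 1e-12)
        return a, log_p_a

    rng = np.random.default_rng(0)
    action, log_prob = sample_logistic_gaussian(mu=0.3, sigma=0.5, rng=rng)
    print(action, log_prob)  # bounded action in (0, 1) and its log-probability

In a policy gradient method, a corrected log-probability of this form would stand in for the plain Gaussian log-probability when weighting the gradient estimate, which is how the bias from clipping unbounded Gaussian samples is avoided.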

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
Series
Advances in Intelligent Systems Research
Publication Date
May 2018
ISBN
10.2991/ncce-18.2018.131
ISSN
1951-6851
DOI
10.2991/ncce-18.2018.131

Cite this article

TY  - CONF
AU  - Junning Huang
AU  - Zhifeng Hao
PY  - 2018/05
DA  - 2018/05
TI  - Bound Action Policy for Better Sample Efficiency
BT  - Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
PB  - Atlantis Press
SP  - 794
EP  - 799
SN  - 1951-6851
UR  - https://doi.org/10.2991/ncce-18.2018.131
DO  - 10.2991/ncce-18.2018.131
ID  - Huang2018/05
ER  -