The Application of Q-Learning in the Prisoner's Dilemma: Achieving Nash Equilibrium in Multi-Agent Systems
- DOI
- 10.2991/978-94-6463-512-6_77
- Keywords
- Q-Learning; Prisoner's Dilemma; multi-agent reinforcement learning
- Abstract
The Prisoner's Dilemma is a classic problem in game theory in which two players independently decide whether to cooperate with or betray each other, and the combination of their choices determines the reward or punishment each receives. This study applies the Q-Learning algorithm to the problem of reaching the Nash equilibrium in the Prisoner's Dilemma, using an environment built on the ParallelEnv class from the PettingZoo library. In Q-Learning, the agents repeatedly try randomly selected actions, receive the corresponding rewards, and update their Q-values through the Bellman equation. An ε-greedy strategy balances exploration and exploitation, so that the agents explore the available actions sufficiently while gradually converging to the optimal strategy. This approach is well suited to problems like the Prisoner's Dilemma, where a small, fixed set of actions leads to quantifiable rewards. Because the environment is designed with PettingZoo's ParallelEnv class, all agents make their decisions simultaneously at each step. In conclusion, this project demonstrates the practicality and efficiency of Q-Learning in solving multi-agent dilemmas and supports its applicability to broader multi-agent systems.
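The loop described in the abstract (simultaneous actions, ε-greedy selection, Bellman-style Q-value updates) can be illustrated with a minimal sketch. This is not the author's implementation. It uses plain Python instead of PettingZoo and only mimics the parallel pattern in which every agent submits an action in one dictionary per step. The payoff values, the agent names, the single-state Q-table, the linear ε decay, and all hyperparameters are assumptions chosen for illustration, since the abstract does not specify them.

```python
import random

# Assumed payoffs (not from the paper). Actions: 0 = cooperate, 1 = defect.
# PAYOFFS[(a0, a1)] = (reward for player_0, reward for player_1).
PAYOFFS = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
AGENTS = ("player_0", "player_1")  # hypothetical agent names


def step(actions):
    """Parallel-style step: take every agent's action at once, return rewards."""
    r0, r1 = PAYOFFS[(actions["player_0"], actions["player_1"])]
    return {"player_0": r0, "player_1": r1}


def choose_action(q_values, eps, rng):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q_values, action, reward, alpha, gamma):
    """One Bellman-style update for a single-state (stateless) Q-table."""
    target = reward + gamma * max(q_values)
    q_values[action] += alpha * (target - q_values[action])


def is_nash(a0, a1):
    """True if neither player gains by unilaterally switching actions."""
    best_0 = PAYOFFS[(a0, a1)][0] >= PAYOFFS[(1 - a0, a1)][0]
    best_1 = PAYOFFS[(a0, a1)][1] >= PAYOFFS[(a0, 1 - a1)][1]
    return best_0 and best_1


def train(episodes=5000, alpha=0.1, gamma=0.9, eps_min=0.01, seed=0):
    """Train two independent Q-learners; returns each agent's Q-values."""
    rng = random.Random(seed)
    q = {agent: [0.0, 0.0] for agent in AGENTS}
    for t in range(episodes):
        eps = max(eps_min, 1.0 - t / episodes)  # assumed linear decay from 1.0
        actions = {ag: choose_action(q[ag], eps, rng) for ag in AGENTS}
        rewards = step(actions)
        for ag in AGENTS:
            q_update(q[ag], actions[ag], rewards[ag], alpha, gamma)
    return q


if __name__ == "__main__":
    for agent, values in train().items():
        print(agent, [round(v, 2) for v in values])
    print("(defect, defect) is Nash:", is_nash(1, 1))
```

Under these assumed payoffs, mutual defection is the only action pair that passes the `is_nash` check. Whether the greedy policies read off the learned Q-values actually settle on it depends on the learning rate, discount factor, and exploration schedule, so the sketch is best treated as a starting point for experimentation.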
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Yuanchang Hao
PY - 2024
DA - 2024/09/23
TI - The Application of Q-Learning in the Prisoner's Dilemma: Achieving Nash Equilibrium in Multi-Agent Systems
BT - Proceedings of the 2024 International Conference on Artificial Intelligence and Communication (ICAIC 2024)
PB - Atlantis Press
SP - 732
EP - 738
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6463-512-6_77
DO - 10.2991/978-94-6463-512-6_77
ID - Hao2024
ER -