Proceedings of the 2024 International Conference on Artificial Intelligence and Communication (ICAIC 2024)

The Application of Q-Learning in the Prisoner's Dilemma: Achieving Nash Equilibrium in Multi-Agent Systems

Authors
Yuanchang Hao1, *
1Computer Science, Ohio State University, Columbus, USA
*Corresponding author. Email: ychao@iastate.edu
Corresponding Author
Yuanchang Hao
Available Online 23 September 2024.
DOI
10.2991/978-94-6463-512-6_77
Keywords
Q-learning; Prisoner's Dilemma; multi-agent reinforcement learning
Abstract

The Prisoner's Dilemma is a classic game-theoretic problem in which two players independently decide whether to cooperate with or betray the other, and the combination of their choices determines the reward or punishment each receives. This study applies the Q-Learning algorithm to the problem of reaching Nash equilibrium in the Prisoner's Dilemma, with the environment built on the ParallelEnv class from the PettingZoo library so that all agents make their decisions simultaneously at each step. In Q-Learning, each agent repeatedly selects actions, receives the corresponding rewards, and updates its Q-values through the Bellman equation. An ε-greedy strategy balances exploration and exploitation, ensuring that the agents sufficiently explore the available actions while gradually converging to the optimal strategy. This approach is well suited to problems like the Prisoner's Dilemma, where a small, fixed set of actions leads to quantifiable rewards. In conclusion, this project demonstrates the practicality and efficiency of Q-Learning in solving multi-agent dilemmas and reinforces its applicability in broader multi-agent systems.
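
The loop the abstract describes (simultaneous action choice, ε-greedy exploration, Bellman-style Q-value updates) can be illustrated with a minimal sketch. This is not the paper's implementation: the payoff values (5, 3, 1, 0), the hyperparameters, and every function name below are assumptions chosen for illustration, and a plain Python loop stands in for PettingZoo's parallel environment so the example runs with the standard library alone.

```python
import random

COOPERATE, DEFECT = 0, 1

# Assumed payoff table (not from the paper): (my action, other's action) -> my reward.
PAYOFF = {
    (COOPERATE, COOPERATE): 3,
    (COOPERATE, DEFECT): 0,
    (DEFECT, COOPERATE): 5,
    (DEFECT, DEFECT): 1,
}


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice((COOPERATE, DEFECT))
    return max((COOPERATE, DEFECT), key=lambda a: q_values[a])


def q_update(q_values, action, reward, alpha, gamma):
    """One Q-learning update. The game has a single state, so the
    'next state' in the Bellman target is that same state."""
    target = reward + gamma * max(q_values)
    q_values[action] += alpha * (target - q_values[action])


def train(rounds=5000, alpha=0.1, gamma=0.9, epsilon=1.0,
          epsilon_min=0.01, epsilon_decay=0.999, seed=0):
    """Train two independent Q-learners; hyperparameters are illustrative."""
    rng = random.Random(seed)
    q_tables = [[0.0, 0.0], [0.0, 0.0]]  # one [Q(cooperate), Q(defect)] per agent
    for _ in range(rounds):
        # Both agents pick their actions before either sees the other's choice.
        a0 = choose_action(q_tables[0], epsilon, rng)
        a1 = choose_action(q_tables[1], epsilon, rng)
        q_update(q_tables[0], a0, PAYOFF[(a0, a1)], alpha, gamma)
        q_update(q_tables[1], a1, PAYOFF[(a1, a0)], alpha, gamma)
        # Shift gradually from exploration to exploitation.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return q_tables


if __name__ == "__main__":
    for i, q in enumerate(train()):
        print(f"agent {i}: Q(cooperate)={q[COOPERATE]:.2f}, Q(defect)={q[DEFECT]:.2f}")
```

Under these assumed payoffs, defection earns more than cooperation whatever the other player does, so mutual defection is the one-shot game's Nash equilibrium. Learners of this kind would be expected to come to prefer defection, although the result of any single run depends on the seed and hyperparameters.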

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the 2024 International Conference on Artificial Intelligence and Communication (ICAIC 2024)
Series
Advances in Intelligent Systems Research
Publication Date
23 September 2024
ISBN
978-94-6463-512-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Yuanchang Hao
PY  - 2024
DA  - 2024/09/23
TI  - The Application of Q-Learning in the Prisoner's Dilemma: Achieving Nash Equilibrium in Multi-Agent Systems
BT  - Proceedings of the 2024 International Conference on Artificial Intelligence and Communication (ICAIC 2024)
PB  - Atlantis Press
SP  - 732
EP  - 738
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-512-6_77
DO  - 10.2991/978-94-6463-512-6_77
ID  - Hao2024
ER  -