
Computer Technology, 2021, Issue 17

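Application of Computer Game Algorithms in Othello (Black and White Chess)
PENG Zhijun
(Guangdong Vocational College of Post and Telecom, Guangzhou 510630, Guangdong, China)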

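Abstract: Computer game playing is an important branch of artificial intelligence, and this paper studies how artificial intelligence algorithms apply to Othello. It first introduces the classical Othello algorithms used in computer game playing. It then presents the definitions and implementation procedures of two typical temporal-difference algorithms from deep reinforcement learning, together with their differences and connections. Finally, it evaluates how Monte Carlo tree search, Q-learning, and SARSA perform when applied to Othello in practice, and outlines directions for subsequent improvement.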


Keywords: Monte Carlo tree search; deep reinforcement learning; Markov decision process; Q-learning; SARSA



DOI:10.19850/j.cnki.2096-4706.2021.17.018


Funding: 2019 Guangdong Provincial Department of Education Characteristic Innovation Project for Regular Universities (2019GKTSCX059)


CLC number: TP181    Document code: A    Article ID: 2096-4706(2021)17-0073-06


Application of Computer Game Algorithm in Black and White Chess

PENG Zhijun

(Guangdong Vocational College of Post and Telecom, Guangzhou 510630, China)

Abstract: Computer game playing is one of the important branches of artificial intelligence. This paper studies the application of artificial intelligence algorithms in black and white chess (Othello). It first introduces the classical Othello algorithms in computer game playing, and then introduces the definition and implementation process of two typical temporal-difference algorithms in deep reinforcement learning, as well as their differences and relations. Finally, it evaluates the performance of the MCTS, Q-learning, and SARSA algorithms in the practical application of Othello, and discusses directions for subsequent improvement.

Keywords: MCTS; deep reinforcement learning; Markov decision process; Q-learning; SARSA
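
The difference between the two temporal-difference algorithms named in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the function names, the state and action labels, and the values of `ALPHA` and `GAMMA` are all assumptions chosen for illustration. Q-learning bootstraps from the best action available in the next state, while SARSA bootstraps from the action the agent actually takes next.

```python
# Illustrative sketch only: tabular TD updates, not the paper's implementation.
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_learning_update(q, state, action, reward, next_state, next_actions):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the action actually taken next."""
    next_value = 0.0 if next_action is None else q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (reward + GAMMA * next_value - q[(state, action)])


def demo():
    # Same transition and same starting values; only the update rule differs.
    q_ql = defaultdict(float, {("s1", "a"): 1.0, ("s1", "b"): 0.0})
    q_sa = defaultdict(float, {("s1", "a"): 1.0, ("s1", "b"): 0.0})
    q_learning_update(q_ql, "s0", "x", 0.0, "s1", ["a", "b"])
    sarsa_update(q_sa, "s0", "x", 0.0, "s1", "b")  # the policy explored "b"
    return q_ql[("s0", "x")], q_sa[("s0", "x")]
```

In `demo()`, both tables start from identical values and see the same transition. Q-learning credits the move with the value of the best follow-up (`"a"`), whereas SARSA credits it with the value of the exploratory follow-up (`"b"`), so the two estimates diverge whenever the behaviour policy does not act greedily.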



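About the author: PENG Zhijun (born 1978), male, Han ethnicity, from Qianjiang, Hubei; senior engineer with a master's degree. Research interests: software engineering, artificial intelligence technology, and mobile internet technology.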