DOI:10.19850/j.cnki.2096-4706.2022.19.015
CLC number: TP311  Document code: A  Article ID: 2096-4706(2022)19-0061-04
Research on Parallel Algorithm of Collaborative Filtering Based on Spark
MENG Xiangyu, QIU Liang
(Heilongjiang University, Harbin 150080, China)
Abstract: Collaborative filtering algorithms are widely used in recommender systems. As information technology advances, traditional recommendation techniques fall short in both recommendation accuracy and timeliness. To address these two problems, a collaborative filtering method based on user behavior probability is proposed. Because Spark can keep intermediate results in memory, the method is parallelized on the Spark framework to achieve distributed computation. The algorithm is tested on MovieLens and other datasets. The results show that the distributed design improves both accuracy and efficiency.
Keywords: recommender system; collaborative filtering; Spark; user behavior
References:
[1] GEORGE G, OSINGA E C, LAVIE D, et al. Big data and data science methods for management research [J]. The Academy of Management Journal, 2016, 59(5): 1493-1507.
[2] SHI L, LI S Q. Mining uninteresting items and collaborative filtering recommendation based on user time-point visibility [J]. Data Analysis and Knowledge Discovery, 2022, 6(5): 64-76. (in Chinese)
[3] SU Z, LIN Z Y, AI J, et al. Rating Prediction in Recommender Systems Based on User Behavior Probability and Complex Network Modeling [J]. IEEE Access, 2021, 9: 30739-30749.
[4] LIANG Y. Research on parallelization of data mining algorithms based on the distributed platforms Spark and YARN [D]. Guangzhou: Sun Yat-sen University, 2014. (in Chinese)
[5] SHVACHKO K, KUANG H, RADIA S, et al. The Hadoop Distributed File System [EB/OL]. [2022-06-17]. https://www.docin.com/p-1725911425.html.
[6] FAN Y Q, LIANG H Y, JI J Q. Research on similarity computation in collaborative filtering algorithms [J]. Computer Technology and Development, 2020, 30(8): 91-96. (in Chinese)
[7] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks [J/OL]. Journal of Statistical Mechanics: Theory and Experiment, 2008(10) [2022-06-19]. https://www.mendeley.com/catalogue/d8ac3ddb-af95-39a5-80eb-83d9baf48d9b/.
[8] SU Z, ZHENG X L, AI J, et al. Link prediction in recommender systems with confidence measures [J/OL]. Chaos, 2019, 29(8): 083133 [2022-06-19]. https://aip.scitation.org/doi/10.1063/1.5099565.
[9] HARPER F M, KONSTAN J A. The MovieLens Datasets: History and Context [J]. ACM Transactions on Interactive Intelligent Systems, 2016, 5(4): 1-19.
[10] LESKOVEC J. Stanford Large Network Dataset Collection [EB/OL]. [2022-06-19]. http://snap.stanford.edu/data/.
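About the authors: MENG Xiangyu (born 1998), male, Han nationality, from Heze, Shandong; master's student; research interests: data mining and recommendation algorithms. QIU Liang (born 1978), male, Han nationality, from Zhuolu, Hebei; section-level staff with an associate senior professional title; bachelor's degree; research interest: economic management.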