Research on Similarity Measure in Collaborative Filtering Recommendation Algorithm
LI Sansan, CHEN Xiaorong
(School of Engineering, Guangzhou College of Technology and Business, Guangzhou 510850, China)
DOI: 10.19850/j.cnki.2096-4706.2022.15.016
Funding: 2021 Quality Engineering Construction Project of Guangzhou College of Technology and Business (ZC20211129)
CLC number: TP18; TP391  Document code: A  Article ID: 2096-4706(2022)15-0059-05
Abstract: Collaborative filtering is one of the earliest recommendation algorithms and is widely used. Similarity calculation and nearest-neighbor selection are the core of the algorithm. After explaining the principle of collaborative filtering recommendation and the common similarity calculation methods, this paper proposes an improved similarity calculation method, compares the recommendation quality of different similarity calculation methods through experiments, and analyzes how to address data sparsity and how to balance the weight given to the quality of the items themselves. The experimental results show that the improved similarity calculation method performs better on four evaluation metrics: precision, recall, RMSE, and MAE. The method can therefore improve recommendation quality.
Keywords: collaborative filtering; recommendation algorithm; similarity; nearest neighbor; data sparsity
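As a rough illustration of the terms used in the abstract, the following Python sketch shows two commonly used similarity measures (cosine and Pearson), a simple nearest-neighbor selection, and the RMSE and MAE error metrics. It is not the improved method proposed in the paper, which is not specified in this section. All function names and the toy rating data are hypothetical, and ratings are assumed to be stored as dictionaries mapping item IDs to scores.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_similarity(u, v):
    """Pearson correlation computed over the items both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def top_k_neighbors(target, ratings, k, sim=cosine_similarity):
    """Return the k users most similar to `target` as (user, score) pairs."""
    scores = [
        (other, sim(ratings[target], ratings[other]))
        for other in ratings
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "a": {"i1": 5, "i2": 3, "i3": 4},
    "b": {"i1": 5, "i2": 3, "i3": 4},
    "c": {"i1": 1, "i2": 5, "i3": 2},
}
neighbors = top_k_neighbors("a", ratings, k=1)
```

Both similarity functions here use only co-rated items, so they become unreliable when two users share few ratings. This is one way the data sparsity problem mentioned in the abstract shows up in practice.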
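About the author: LI Sansan (born 1988), female, Han ethnicity, from Xinxiang, Henan; lecturer, master's degree; main research interest: computer applications.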