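Abstract: To address the low detection rate of customer churn in telecommunications, this paper proposes a random forest model optimized by an improved particle swarm optimization (PSO) algorithm. First, each attribute of the data is analyzed and suitable features are selected, and SMOTE is used to handle class imbalance. Next, supervised algorithms including decision tree, random forest, and support vector machine are compared to find the best model, and random forest performs best. Finally, the random forest parameters are optimized by PSO with an improved inertia weight and improved learning factors. Experiments show that the model scores higher than both the plain random forest and the random forest optimized by standard PSO, reaching an accuracy of 91% and a recall of 95%.
Keywords: customer churn; random forest; particle swarm optimization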
DOI:10.19850/j.cnki.2096-4706.2021.22.022
CLC number: TP18  Document code: A  Article ID: 2096-4706(2021)22-0075-04
Research on Customer Churn Prediction of Random Forest Optimization Algorithm Based on Improved Particle Swarm Optimization
ZHANG Sanniu, ZHANG Zhibin
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China)
Abstract: To address the low detection rate of customer churn in telecommunications, this paper proposes a random forest model based on improved particle swarm optimization (PSO). First, each attribute of the data is analyzed, appropriate features are selected, and SMOTE is used to deal with class imbalance. Then, supervised algorithms such as decision tree, random forest, and support vector machine are compared to find the best model, and random forest performs best. Finally, the parameters of the random forest are optimized by PSO with an improved inertia weight and improved learning factors. Experimental results show that the model outperforms both the plain random forest and the random forest optimized by standard PSO, reaching an accuracy of 91% and a recall of 95%.
Keywords: customer churn; random forest; particle swarm optimization
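The final step of the abstract, tuning random forest parameters with a PSO whose inertia weight and learning factors are modified, can be illustrated with a minimal sketch. This section does not give the paper's exact update schedule, so the sketch assumes a common variant: a linearly decreasing inertia weight and time-varying learning factors. The function `surrogate_error`, the parameter names `n_estimators` and `max_depth`, the search bounds, and all numeric defaults are hypothetical stand-ins; in practice the objective would be something like one minus a cross-validated score of the random forest.

```python
import random


def surrogate_error(position):
    """Hypothetical stand-in for (1 - cross-validated score) of a random forest.

    position = (n_estimators, max_depth). The minimum is placed at (200, 12)
    purely for illustration; it is not a result from the paper.
    """
    n_estimators, max_depth = position
    return ((n_estimators - 200.0) / 100.0) ** 2 + ((max_depth - 12.0) / 5.0) ** 2


def pso_minimize(objective, bounds, n_particles=20, n_iters=60, seed=0,
                 w_max=0.9, w_min=0.4, c_start=2.5, c_end=0.5):
    """PSO with an assumed schedule: w decreases, c1 shrinks, c2 grows."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iters):
        frac = t / max(n_iters - 1, 1)
        w = w_max - (w_max - w_min) * frac        # inertia weight: 0.9 -> 0.4
        c1 = c_start + (c_end - c_start) * frac   # cognitive factor: 2.5 -> 0.5
        c2 = c_end + (c_start - c_end) * frac     # social factor: 0.5 -> 2.5
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Search over (n_estimators, max_depth) within hypothetical bounds.
best_pos, best_err = pso_minimize(surrogate_error, bounds=[(10, 500), (2, 30)])
best_params = {"n_estimators": round(best_pos[0]), "max_depth": round(best_pos[1])}
```

A large inertia weight and cognitive factor early on favor exploration, while the growing social factor pulls the swarm toward the best known position in later iterations. Because random forest parameters such as tree count and depth are integers, the continuous positions are rounded before use.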
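About the authors: ZHANG Sanniu (b. 1997), female, Han ethnicity, from Zhoukou, Henan; master's student; main research interest: big data. ZHANG Zhibin (b. 1965), male, Han ethnicity, from Huili, Sichuan; associate professor, bachelor's degree; main research interests: network-based computer software technology and industrial control technology.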