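Abstract: To enable precision marketing in insurance scenarios while making full use of the historical transaction records of tens of millions of customers and policies, this paper surveys popular algorithms, analyzes the underlying statistical theory, and proposes a Deep Forest cascade algorithm modified with XGBoost. The algorithm uses XGBoost, a shallow machine learning algorithm, as the building block of the Deep Forest cascade, and uses the AUC-PR criterion to evaluate imbalanced samples during the adaptive construction of the cascade. Its performance is compared with that of the original XGBoost algorithm and the original Deep Forest algorithm. Put into production for insurance purchase prediction, the algorithm improves on the original XGBoost algorithm and the original Deep Forest algorithm by 5.5% and 2.8%, respectively, a marked gain. The proposed workflow for moving from shallow learning to Deep Forest based deep optimization also offers a practical reference for similar application scenarios.
Keywords: Deep Forest; XGBoost; deep learning; insurance precision marketing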
CLC number: TP301.6  Document code: A  Article ID: 2096-4706(2019)22-0116-07
Application of an Improved Deep Forest Algorithm in Insurance Purchase Prediction Scenario
LIN Pengcheng, TANG Hui
(Research and Development Center of China Life Insurance (Group) Company, Beijing 100033, China)
Abstract: To achieve precision marketing in insurance and to exploit the historical transaction records of tens of millions of customers and insurance policies, this paper proposes, based on a study of popular algorithms and an analysis of statistical theory, a Deep Forest cascade algorithm adapted to use XGBoost. The algorithm adopts XGBoost, a shallow machine learning algorithm, as the building block of the Deep Forest cascade and uses the AUC-PR criterion for imbalanced-sample evaluation in the adaptive cascade-building process. Its performance is compared with that of the original XGBoost and the original Deep Forest algorithms. Deployed in production for insurance purchase prediction, the algorithm improves on them by 5.5% and 2.8%, respectively. The proposed procedure for extending shallow learning to Deep Forest based deep optimization also provides a practical reference for similar application scenarios.
Keywords: Deep Forest; XGBoost; deep learning; insurance precision marketing
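The abstract names two mechanisms without giving implementation details: AUC-PR as the evaluation criterion for imbalanced samples, and an adaptive rule for deciding how deep the cascade grows. The following is a minimal sketch of how these could fit together, not the authors' implementation. It assumes AUC-PR is estimated by average precision and that the cascade stops growing once a new level no longer improves the validation score, as in the original Deep Forest. The names `average_precision`, `grow_cascade`, `train_level`, and `tol` are hypothetical, and `train_level` is a placeholder for training one cascade level (XGBoost models in the paper), which is deliberately left abstract here.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as an estimate of AUC-PR (an assumption)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical interface: given the level index and the previous level's
# validation scores (None for the first level), return new validation scores.
TrainLevel = Callable[[int, Optional[List[float]]], List[float]]


def grow_cascade(train_level: TrainLevel, y_valid: Sequence[int],
                 max_levels: int = 10, tol: float = 1e-4) -> Tuple[int, List[float]]:
    """Add cascade levels until validation AUC-PR stops improving by more than tol."""
    history: List[float] = []
    best = float("-inf")
    prev: Optional[List[float]] = None
    for level in range(1, max_levels + 1):
        scores = train_level(level, prev)
        ap = average_precision(y_valid, scores)
        if ap <= best + tol:
            break  # no meaningful gain: discard this level and stop
        history.append(ap)
        best, prev = ap, scores
    return len(history), history


# Toy data standing in for real model outputs (purely illustrative).
y_valid = [1, 0, 0, 1, 0, 0, 0, 0]
demo_scores = {
    1: [0.9, 0.8, 0.1, 0.3, 0.2, 0.1, 0.1, 0.1],
    2: [0.9, 0.2, 0.1, 0.8, 0.2, 0.1, 0.1, 0.1],
    3: [0.9, 0.2, 0.1, 0.8, 0.2, 0.1, 0.1, 0.1],
}


def fake_train_level(level: int, prev: Optional[List[float]]) -> List[float]:
    return demo_scores[min(level, 3)]


n_levels, history = grow_cascade(fake_train_level, y_valid)
print(n_levels, history)
```

In this toy run the third level brings no gain over the second, so two levels are kept. A score based on the precision-recall curve is a natural choice when positives (purchases) are rare, because it is not inflated by the large number of true negatives.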
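About the authors:
LIN Pengcheng (b. 1980), male, Han ethnicity, from Longyan, Fujian; algorithm engineer, master's degree; research interest: applications of artificial intelligence in enterprises.
TANG Hui (b. 1981), male, Han ethnicity, from Tianmen, Hubei; senior engineer, master's degree; research interest: applications of artificial intelligence in enterprises.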