
Computer Technology, Issue 14, 2021

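Research on Default Risk Prediction Based on the Hybrid Feature Selection Model CatBoost-LightGBM
CHENG Nannan
(School of Information Engineering, Jiangxi University of Technology, Nanchang 330098, China)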

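Abstract: In the wake of the pandemic, internet consumer finance has played a positive role in the recovery and growth of the national economy. However, the particular nature of its products and its overly rapid expansion also bring substantial risk. Taking into account algorithm interpretability and model applicability (discriminative power, accuracy, low cost, and stability), this paper builds a hybrid feature selection model, CatBoost-LightGBM, and applies it to a well-known credit platform. The results show that the hybrid CatBoost-LightGBM model significantly outperforms single models in the comprehensive evaluation: it improves on the baseline logistic regression (LR) model by 0.19, and on LightGBM, XGBoost, and other models trained on the base features by 0.03.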
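The abstract does not spell out how CatBoost and LightGBM are combined. One common reading of such a two-stage hybrid is: fit a first-stage model (here CatBoost) on all candidate features, rank the features by its importance scores, keep the strongest ones, and train the second-stage LightGBM model only on that subset. The stdlib-only Python sketch below illustrates just the screening step under that assumption. The cumulative-coverage rule, the function name `select_by_cumulative_importance`, and the feature names in the example are all hypothetical; the importance dictionary stands in for what, for example, CatBoost's `get_feature_importance()` returns when paired with the training column names.

```python
"""Hypothetical screening step for a two-stage CatBoost -> LightGBM pipeline.

Assumption: the paper's actual selection rule is not given in the abstract.
Here we keep the smallest set of top-ranked features whose share of the total
importance reaches `coverage`.
"""
from typing import Dict, List


def select_by_cumulative_importance(
    importances: Dict[str, float], coverage: float = 0.95
) -> List[str]:
    """Return feature names, most important first, until `coverage` is reached."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Only positive scores count toward the total importance.
    total = sum(v for v in importances.values() if v > 0)
    if total == 0:
        return []
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, score in ranked:
        if score <= 0:
            break  # non-positive importance: the first-stage model did not rely on it
        selected.append(name)
        cumulative += score / total
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical first-stage importances (e.g. from a fitted CatBoost model).
    importances = {"income": 40.0, "loan_amount": 30.0, "age": 20.0,
                   "zip_code": 10.0, "noise": 0.0}
    kept = select_by_cumulative_importance(importances, coverage=0.85)
    print(kept)  # ['income', 'loan_amount', 'age']
    # The kept names would then subset the training columns for the
    # second-stage model (e.g. lightgbm.LGBMClassifier); not shown here.
```

A cumulative-coverage cutoff is used here instead of a fixed top-k only because it adapts to how concentrated the importances are. Either rule fits the two-stage idea, and the cutoff itself would be a tuning choice validated on held-out data.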


Keywords: default risk prediction; consumer finance; big-data risk control; feature selection; gradient boosting algorithm



DOI:10.19850/j.cnki.2096-4706.2021.14.030


CLC number: TP183    Document code: A    Article ID: 2096-4706(2021)14-0116-05






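About the author: CHENG Nannan (born December 1987), female, Han nationality, native of Nantong, Jiangsu; senior professional title (other category); master's degree; research interests: business analytics, machine learning, and big-data risk control.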