Research on Application of XGBoost Algorithm in Vehicle Insurance Purchase Prediction
WANG Chaoqiang
(School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China)
DOI: 10.19850/j.cnki.2096-4706.2023.06.008
CLC number: TP391  Document code: A  Article ID: 2096-4706(2023)06-0031-04
Abstract: Predicting vehicle insurance purchase intention is a binary classification problem: each customer either intends to purchase or does not. This paper uses the XGBoost and Logistic Regression algorithms to build models and make classification predictions on a vehicle insurance dataset. The original dataset is first preprocessed. Grid search with five-fold cross-validation is then used to tune the hyperparameters of each model and build the prediction models. Finally, the ROC curve and the AUC value are used as evaluation metrics to assess the models' generalization ability. The results show that XGBoost achieves the better predictive performance of the two algorithms.
Keywords: XGBoost algorithm; data preprocessing; grid search; model evaluation; ROC curve
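As a concrete illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an ROC curve and its AUC value can be computed from predicted scores. It is not the paper's implementation: the function names `roc_points` and `auc` are hypothetical, the labels and scores are made-up toy values, and a real study would normally rely on a library routine instead.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by sweeping the decision threshold downwards."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")

    # Rank samples from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (assumed data): 1 = intends to purchase, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(y_true, scores)
print("AUC:", auc(curve))
```

On this toy data the AUC is 0.6875, which matches the fraction of positive-negative pairs (11 of 16) in which the positive sample receives the higher score. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the metric is suited to comparing two classifiers on the same held-out data.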
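About the author: WANG Chaoqiang (born 1995), male, Han ethnicity, from Zhoukou, Henan Province; master's degree candidate; research interests: big data and cloud computing.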