
Computer Technology, 2022, Issue 14

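Analysis and Prediction of the Novel Coronavirus (COVID-19) Epidemic Based on the XGBoost Model
SUN Xuke
(Non-commissioned Officer School of the Chinese People's Armed Police Force, Hangzhou, Zhejiang 311400, China)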

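Abstract: To predict the transmission trend of the novel coronavirus (COVID-19) more accurately, an intelligent estimation method for COVID-19 is proposed. COVID-19 data are first visualized and analyzed with Matplotlib and features are extracted. An estimation model is then built with XGBoost and applied to COVID-19 data for the whole country, Hubei, and four other provinces. The experimental results show that the method outperforms four other regression algorithms (linear regression, random forest, SVM, and KNN) on three technical metrics: mean absolute error, root mean square percentage error, and maximum estimation error. It therefore offers high estimation accuracy and generalization ability.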


Keywords: novel coronavirus; epidemic; feature extraction; model construction; XGBoost algorithm



DOI:10.19850/j.cnki.2096-4706.2022.014.014


CLC number: TP301.6    Document code: A    Article ID: 2096-4706(2022)14-0058-05


Analysis and Prediction of Novel Coronavirus (COVID-19) Epidemic Situation Based on XGBoost Model

SUN Xuke

(Basic Department of Armed Police Officer School, Hangzhou 311400, China)

Abstract: To achieve more accurate prediction of the spread of the novel coronavirus (COVID-19), an intelligent estimation method for COVID-19 is proposed. First, this paper uses Matplotlib to visualize and analyze COVID-19 data and to extract features. It then uses XGBoost to build the estimation model and applies it to COVID-19 data from the whole country, Hubei, and four other provinces. The experimental results show that, compared with linear regression, random forest, SVM, and KNN, the method outperforms all four regression algorithms on three technical indexes: mean absolute error, root mean square percentage error, and maximum estimation error. It thus has higher estimation accuracy and generalization ability.

Keywords: novel coronavirus; epidemic situation; feature extraction; model construction; XGBoost algorithm
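
The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The definitions below are the conventional ones and are an assumption here, since the abstract does not give the paper's exact formulas. The function names and the sample values are hypothetical and are not taken from the paper's data.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_percentage_error(actual, predicted):
    """Square root of the mean squared relative error.

    Assumed definition: sqrt(mean(((a - p) / a) ** 2)).
    Every actual value must be nonzero.
    """
    squared = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def max_estimation_error(actual, predicted):
    """Largest absolute difference between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical case counts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(actual, predicted)
rmspe = root_mean_square_percentage_error(actual, predicted)
max_err = max_estimation_error(actual, predicted)
print(f"MAE={mae:.3f}  RMSPE={rmspe:.4f}  MaxError={max_err:.1f}")
```

Lower values are better for all three metrics. A comparison like the one the abstract describes would compute these metrics for each regression model on the same held-out data.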



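About the author: SUN Xuke (born 1997), male, Han ethnicity, from Xuancheng, Anhui; teaching assistant, bachelor's degree. Research interests: information and communication engineering, data analysis.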