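Abstract: This study analyzes an open-source heart disease prediction dataset provided by a foreign medical testing institution. It examines the relationship between factors associated with heart disease and the patients, and builds two machine learning models, a decision tree (DT) and k-nearest neighbors (KNN), to classify heart disease and predict its category. Accuracy, precision, recall, and F1 score (F1_score) serve as the evaluation metrics for comparing the two models' classification and prediction performance and identifying the better one. The study concludes that machine learning models provide an effective scientific basis for predicting and diagnosing heart disease.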
DOI:10.19850/j.cnki.2096-4706.2022.19.017
CLC number: TP18 Document code: A Article ID: 2096-4706(2022)19-0067-04
Research on Predictive Diagnosis Model of Heart Disease Based on Machine Learning Algorithm
LIANG Jinghan, XU Yajie
(Zhengzhou University of Science and Technology, Zhengzhou 450064, China)
Abstract: This paper analyzes an open-source heart disease prediction dataset provided by a foreign medical testing institution. It examines the relationship between factors associated with heart disease and the patients, and builds two machine learning models, a decision tree (DT) and k-nearest neighbors (KNN), to classify heart disease and predict its category. Accuracy, precision, recall, and F1 score (F1_score) are used as evaluation metrics to compare the classification and prediction performance of the two models and to identify the better one. The results indicate that machine learning models can provide an effective scientific basis for heart disease prediction and diagnosis.
Keywords: machine learning; decision tree; k-Nearest Neighbor algorithm; heart disease
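
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary task (heart disease present = 1, absent = 0). The function name `evaluate_binary`, the `BinaryMetrics` container, and the sample labels are hypothetical and do not come from the paper; the paper's own implementation and data are not shown here.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Container for the four metrics (hypothetical helper, not from the paper)."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate_binary(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` marks the "heart disease present" class.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Illustrative labels only; these are not results from the paper.
sample_true = [1, 1, 1, 0, 0, 0, 1, 0]
sample_pred = [1, 0, 1, 0, 1, 0, 1, 0]
sample_metrics = evaluate_binary(sample_true, sample_pred)
```

Calling `evaluate_binary` once on each model's predictions over the same test labels gives a like-for-like comparison of DT and KNN. In a medical screening setting, recall is often examined first, because a missed positive case is usually costlier than a false alarm.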
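About the authors: LIANG Jinghan (born May 1992), female, Han ethnicity, from Shangqiu, Henan; teaching assistant, master's degree; research interests: data mining and visualization. XU Yajie (born May 1991), female, Han ethnicity, from Zhoukou, Henan; teaching assistant, master's degree; research interest: data mining.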