Heart Disease Diagnosis Method Based on Stacking Ensemble Learning
SHAO Weixi
(South China Normal University, Guangzhou 510631, China)
DOI: 10.19850/j.cnki.2096-4706.2022.24.025
CLC number: TP18 Document code: A Article ID: 2096-4706(2022)24-0097-04
Abstract: Heart disease is a major threat to public health. Applying machine learning to heart disease diagnosis is a major breakthrough in the field of clinical medical diagnosis. Using the UCI heart disease dataset as experimental data, this paper proposes an ensemble learning model based on the Stacking algorithm and trains ensembles over different combinations of base classifiers to find the optimal combination strategy. The results show that combinations whose base classifiers differ more from one another achieve better classification performance. The method is feasible and effective for improving classifier performance, reaching a classification accuracy of 89.78% on the heart disease dataset, and can address the clinical diagnosis of heart disease reasonably well.
Keywords: Stacking; k-fold cross-validation; heart disease diagnosis
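The abstract names Stacking with k-fold cross-validation but, in this excerpt, does not spell out the procedure. The following is a minimal sketch of the general technique only, not the paper's implementation: the base learners (`CentroidClassifier`, `StumpClassifier`), the helper names (`k_fold_indices`, `fit_stacking`, `predict_stacking`), the choice of k = 5, and the synthetic two-feature data are all assumptions made for illustration. It does not use the UCI heart disease data and does not reproduce the reported accuracy.

```python
import random

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

class CentroidClassifier:
    """Illustrative learner: predict the class whose feature mean is nearest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]

class StumpClassifier:
    """Illustrative learner: best single-feature threshold; labels must be 0/1."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for sign in (1, 0):
                    preds = [sign if x[j] > thr else 1 - sign for x in X]
                    hits = sum(p == t for p, t in zip(preds, y))
                    if best is None or hits > best[0]:
                        best = (hits, j, thr, sign)
        _, self.j, self.thr, self.sign = best
        return self

    def predict(self, X):
        return [self.sign if x[self.j] > self.thr else 1 - self.sign for x in X]

def fit_stacking(X, y, base_factories, meta, k=5):
    """Train the meta-learner on out-of-fold base predictions, then refit bases."""
    n = len(X)
    meta_X = [[0] * len(base_factories) for _ in range(n)]
    for fold in k_fold_indices(n, k):
        held = set(fold)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        X_val = [X[i] for i in fold]
        for m, make in enumerate(base_factories):
            preds = make().fit(X_tr, y_tr).predict(X_val)
            for i, p in zip(fold, preds):
                meta_X[i][m] = p  # prediction for a sample the base never saw
    meta.fit(meta_X, y)
    bases = [make().fit(X, y) for make in base_factories]  # refit on all data
    return bases, meta

def predict_stacking(bases, meta, X):
    """Feed the base learners' predictions to the meta-learner."""
    meta_X = [list(row) for row in zip(*(b.predict(X) for b in bases))]
    return meta.predict(meta_X)

# Synthetic two-class data (an assumption, NOT the UCI heart disease data).
# Labels alternate so every contiguous fold contains both classes.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 * t, 1.0), rng.gauss(1.5 * t, 1.0)] for t in y]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

bases, meta = fit_stacking(X_train, y_train,
                           [CentroidClassifier, StumpClassifier],
                           CentroidClassifier(), k=5)
preds = predict_stacking(bases, meta, X_test)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The meta-learner is trained only on out-of-fold predictions, so it never sees a base prediction made on a sample that base learner was fitted on. In practice one would more likely use an established library implementation with stronger and more diverse base classifiers, in line with the abstract's finding that diversity among base classifiers helps.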
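About the author: SHAO Weixi (born July 2001), female, Han ethnicity, from Shenzhen, Guangdong; undergraduate student; research interest: information and computing science.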