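Abstract: The K-nearest neighbors (KNN) algorithm is a classical supervised learning algorithm in machine learning. However, it must traverse every feature when processing data, so it runs inefficiently on high-dimensional data. To address this problem, this paper applies variance filtering to preprocess the data features, which effectively reduces the number of features and improves the algorithm's running efficiency. Experiments show that the preprocessed data set has substantially fewer features and that running time can be reduced by 30% without lowering accuracy.
Keywords: KNN; variance filtering; chi-square filtering; F-test; mutual information method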
DOI:10.19850/j.cnki.2096-4706.2022.04.033
CLC number: TP18  Document code: A  Article ID: 2096-4706(2022)04-0126-03
Research on Feature Filtering Preprocessing Based on KNN Algorithm
ZHANG Yuhui, CHANG Zenan
(Hunan Petrochemical Vocational Technology College, Yueyang 414021, China)
Abstract: The K-nearest neighbors (KNN) algorithm is a classical supervised learning algorithm in machine learning. However, the algorithm must traverse all features when processing data, so it runs inefficiently on high-dimensional data. To address this problem, this paper uses variance filtering to preprocess the data features, which effectively reduces the number of features and improves the efficiency of the algorithm. Experimental results show that feature preprocessing effectively reduces the number of features in the data set, and that the running time can be reduced by 30% without reducing accuracy.
Keywords: KNN; variance filtering; chi-square filtering; F-test; mutual information method
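The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the threshold of 0 are assumptions chosen for illustration, and only the Python standard library is used. Columns whose variance does not exceed the threshold are dropped before the KNN distance computation, so each query touches fewer features.

```python
import math
from collections import Counter
from statistics import pvariance


def variance_filter(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def select_columns(rows, keep):
    """Project every row onto the kept column indices."""
    return [[row[j] for j in keep] for row in rows]


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: columns 1 and 3 are constant and carry no information.
X = [
    [1.0, 5.0, 2.0, 0.0],
    [1.2, 5.0, 1.9, 0.0],
    [0.9, 5.0, 2.2, 0.0],
    [4.0, 5.0, 7.1, 0.0],
    [4.2, 5.0, 6.8, 0.0],
    [3.9, 5.0, 7.0, 0.0],
]
y = ["a", "a", "a", "b", "b", "b"]

# Assumed threshold: 0 removes only the constant columns.
keep = variance_filter(X, threshold=0.0)
X_reduced = select_columns(X, keep)

query = [4.1, 5.0, 6.9, 0.0]
query_reduced = [query[j] for j in keep]

full_prediction = knn_predict(X, y, query, k=3)
reduced_prediction = knn_predict(X_reduced, y, query_reduced, k=3)
print(keep, full_prediction, reduced_prediction)
```

In this toy case the constant columns contribute nothing to the Euclidean distance, so removing them leaves the prediction unchanged while halving the work per distance computation. On real data a higher threshold may also discard informative features, so the threshold would need to be validated against accuracy.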
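About the author: ZHANG Yuhui (born October 1983), male, Han ethnicity, from Yueyang, Hunan; lecturer, master's degree; research interest: machine learning.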