
Information Technology, 2023, Issue 7

Research on an Automatic Clustering Outlier Recognition Method

HUANG Qiang, YE Qing, NIE Bin, ZHU Yanchen, GUO Yongkun

(School of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China)

Abstract: Traditional outlier recognition methods place specific requirements on the shape and density of the data distribution and require parameters to be set. To address this problem, an outlier recognition method based on automatic clustering is proposed. The method introduces the mutual K-nearest neighbor number to represent the outlier degree of a data object. It places no requirements on the shape or density of the data distribution and can output global outliers, local outliers, and outlier clusters. Automatic clustering is achieved through k iterations, so no parameters need to be set manually. Experiments on synthetic data and UCI data verify the effectiveness and general applicability of the method.

Keywords: local outlier; outlier recognition; outlier cluster; automatic clustering; data mining

DOI: 10.19850/j.cnki.2096-4706.2023.07.002

Funding: National Natural Science Foundation of China (62141202, 82260988)

CLC number: TP301    Document code: A    Article ID: 2096-4706(2023)07-0006-05
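The abstract does not spell out how the mutual K-nearest neighbor number is computed, so the following is only a minimal sketch of one common reading: two objects are mutual neighbors when each appears in the other's k-nearest-neighbor list, and an object with few such neighbors is treated as more outlying. The function name `mutual_knn_counts`, the use of Euclidean distance, and the fixed `k` passed in by hand are all assumptions made for illustration; the method described above iterates over k automatically rather than taking it as an input.

```python
from math import dist


def mutual_knn_counts(points, k):
    """Return, for each point, how many of its k nearest neighbors
    also list that point among their own k nearest neighbors.

    Illustrative assumption: Euclidean distance and a fixed k.
    """
    n = len(points)
    knn = []
    for i in range(n):
        # Rank every other point by distance to point i and keep the k closest.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        knn.append(set(others[:k]))
    # A neighbor j of i is "mutual" when i is also in j's neighbor set.
    return [sum(1 for j in knn[i] if i in knn[j]) for i in range(n)]


# Four points forming a tight square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
counts = mutual_knn_counts(points, k=2)
print(counts)
```

In this toy data set the distant point appears in no other point's neighbor list, so its count is zero, while each point in the square has two mutual neighbors. A low count is the kind of signal the abstract refers to as the outlier degree.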




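About the author: HUANG Qiang (born 1993), male, Han ethnicity, from Shangrao, Jiangxi; teaching assistant, master's degree. Main research interests: data mining and traditional Chinese medicine informatics.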