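Abstract: This paper proposes a supervised outlier detection method that extracts one-dimensional statistical features as attributes and trains a Random Forest on them. The features used as attributes include standard-score outliers, Grubbs outliers, median-variance outliers, and mean deviation values. At present, outliers are generally detected with unsupervised models and ensemble learning. The proposed method is an upgrade of these existing approaches and can detect most outliers in China Mobile business interface services such as cross-region card replacement, package changes, and individual service activation.
Keywords: Random Forest; one-dimensional feature extraction; supervised learning; business interface service outlier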
DOI:10.19850/j.cnki.2096-4706.2023.05.040
CLC number: TP311    Document code: A    Article ID: 2096-4706(2023)05-0163-04
Detection of Interface Service Outlier Based on Feature Extraction and Random Forest
ZUO Jinhu, XIAO Zhongliang, CHEN Lihua
(China Mobile Information Technology Co., Ltd., Beijing 102200, China)
Abstract: This paper proposes a supervised outlier detection method that extracts one-dimensional statistical features as attributes and trains a Random Forest on them. The features include standard-score outliers, Grubbs outliers, median-variance outliers, and mean deviation values. At present, outliers are generally detected with unsupervised models and ensemble learning. The proposed method upgrades these existing approaches and can detect most outliers in China Mobile business interface services such as cross-region card replacement, package changes, and individual service activation.
Keywords: Random Forest; one-dimensional feature extraction; supervised learning; business interface service outlier
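The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so the formulas below are common textbook forms chosen as assumptions: a z-score against a history window, a Grubbs-style statistic on the sample that includes the suspect point (without the critical-value comparison), a deviation scaled by the median absolute deviation, and a deviation scaled by the mean absolute deviation. The function name `extract_features` and the dictionary keys are hypothetical.

```python
import statistics


def extract_features(history, value):
    """Score `value` against recent observations in `history`.

    Returns four one-dimensional statistical features. The definitions are
    illustrative assumptions, not the paper's confirmed formulas.
    """
    history = list(history)
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    median = statistics.median(history)
    # Median absolute deviation (MAD) of the history window.
    mad = statistics.median([abs(x - median) for x in history])
    # Mean absolute deviation of the history window.
    mean_abs_dev = statistics.fmean([abs(x - mean) for x in history])

    # A Grubbs-style statistic uses the sample that includes the suspect point.
    sample = history + [value]
    sample_mean = statistics.fmean(sample)
    sample_std = statistics.stdev(sample)

    def ratio(num, den):
        # Guard against flat windows where the spread is zero.
        return num / den if den > 0 else 0.0

    return {
        "z_score": ratio(value - mean, std),
        "grubbs": ratio(abs(value - sample_mean), sample_std),
        "median_dev": ratio(abs(value - median), mad),
        "mean_dev": ratio(abs(value - mean), mean_abs_dev),
    }
```

Each labeled observation would yield one such feature row. Under the approach the abstract outlines, the rows and their normal/abnormal labels could then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`, although the abstract does not name the implementation the authors used.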
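About the authors: ZUO Jinhu (b. 1983), male, Han ethnicity, from Hanchuan, Hubei; application business architect; master's degree; research interests: application system architecture evolution and AIOps. XIAO Zhongliang (b. 1986), male, Han ethnicity, from Shuozhou, Shanxi; senior engineer; master's degree; research interest: AIOps algorithms. CHEN Lihua (b. 1985), male, Han ethnicity, from Shaoyang, Hunan; group-level expert; master's degree; research interest: AIOps operations.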