DOI: 10.19850/j.cnki.2096-4706.2022.06.019
Funding: Yantai Key Research and Development Program Project (2020JMRH010)
CLC number: TP18; U661  Document code: A  Article ID: 2096-4706(2022)06-0079-03
Alarm Prediction Model of Semi Submersible Platform Based on Composite Sampling and Stacking Integration
LI Zhili 1, LIU Xinghui 1, LI Yuan 1, LU Xudi 2
(1. Shandong Vheng Data Technology Co., Ltd., Yantai 264003, China; 2. CIMC Offshore Engineering Institute Co., Ltd., Yantai 264003, China)
Abstract: This paper builds a fault alarm prediction model from a fault alarm classification dataset for semi-submersible platform systems. The training set is resampled with a composite sampling method that combines SMOTE oversampling with random undersampling. The experimental results show that the best sampling rates are 0.3 for oversampling and 0.6 for undersampling: at these rates, a random forest trained on the resampled training set with five-fold cross-validation achieves the highest mean AUC score. The results also indicate that, with the best sampling rates, the composite sampling method mitigates the class imbalance of the training set and considerably improves the generalization ability of the model.
Keywords: semi-submersible platform; alarm; composite sampling; ensemble learning
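The composite sampling step summarized in the abstract can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation. It assumes that the two rates are target minority-to-majority ratios (the convention of the `sampling_strategy` parameter in imbalanced-learn), which the text here does not state. The function names `smote` and `hybrid_sample`, the neighbour count `k=5`, and the toy data are all hypothetical. The random forest and the five-fold cross-validated AUC evaluation are omitted.

```python
import random
from math import dist


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = rng or random.Random(0)
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def hybrid_sample(majority, minority, over_ratio=0.3, under_ratio=0.6,
                  k=5, seed=0):
    """Oversample the minority class, then undersample the majority class.

    Assumption: both ratios mean (minority count) / (majority count)
    after the corresponding step.
    """
    rng = random.Random(seed)
    # Step 1: SMOTE until minority/majority reaches over_ratio.
    target_min = round(over_ratio * len(majority))
    n_new = max(0, target_min - len(minority))
    minority_out = list(minority) + smote(minority, n_new, k, rng)
    # Step 2: randomly drop majority samples until the ratio is under_ratio.
    target_maj = min(len(majority), round(len(minority_out) / under_ratio))
    majority_out = rng.sample(majority, target_maj)
    return majority_out, minority_out


# Toy data (hypothetical): 1000 majority and 50 minority points in 2-D.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(50)]
maj_s, min_s = hybrid_sample(majority, minority)
print(len(maj_s), len(min_s))
```

Under this reading of the rates, 1000 majority and 50 minority samples become 300 minority samples after oversampling and 500 majority samples after undersampling, a final ratio of 0.6. Only the training set should be resampled in this way, as the abstract describes.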
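About the author: LI Zhili (born January 1988), male, Han ethnicity, from Jining, Shandong; intermediate engineer with a master's degree. He graduated in 2011 from Harbin Institute of Technology, majoring in computer science and technology. His main research interests include large-scale data processing, distributed storage and analysis, and business intelligence.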