面向不平衡数据集的煤矿监测系统异常数据识别方法
Abnormal data recognition method of coal mine monitoring system based on imbalanced data set
【索引】冀汶莉,郗刘涛,王斌.面向不平衡数据集的煤矿监测系统异常数据识别方法[J].工矿自动化,2020,46(1):18-25.
【Reference】JI Wenli,XI Liutao,WANG Bin.Abnormal data recognition method of coal mine monitoring system based on imbalanced data set[J].Industry and Mine Automation,2020,46(1):18-25.
【DOI】10.13272/j.issn.1671-251x.17502
【作者】冀汶莉,郗刘涛,王斌
【Author】 JI Wenli,XI Liutao,WANG Bin
【作者机构】西安科技大学 通信与信息工程学院, 陕西 西安710054
【Unit】College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
【摘要】异常数据识别对于煤矿安全监测系统具有重要作用,但安全监测系统中异常数据一般只占数据总量的1%左右,不平衡性是此类数据的固有特点。目前多数机器学习算法在不平衡数据集上的分类预测准确率和灵敏度都相对较差。为了能准确识别异常数据,以煤矿分布式光纤竖井变形监测系统采集的数据为研究对象,提出了一种面向不平衡数据集、基于去重复下采样(RDU)、合成少数类过采样技术(SMOTE)和随机森林(RF)分类算法的煤矿监测系统异常数据识别方法。该方法利用RDU算法对多数类数据进行下采样,去除重复样本;利用SMOTE算法对少数类异常数据进行过采样,通过合成新的异常数据来改善数据集的不平衡性;并利用优化后的数据集训练RF分类算法,得到异常数据识别模型。在6个真实数据集上的对比实验结果表明,该方法的异常数据识别准确率平均值达到99.3%,具有较好的泛化性和较强的鲁棒性。
【Abstract】Abnormal data recognition plays an important role in coal mine safety monitoring systems, but abnormal data generally accounts for only about 1% of the total data in such a system, so imbalance is an intrinsic characteristic of this kind of data. At present, most machine learning algorithms have relatively poor classification accuracy and sensitivity on imbalanced data sets. In order to identify abnormal data accurately, the data collected by a distributed optical fiber shaft deformation monitoring system in a coal mine is taken as the research object, and an abnormal data recognition method for coal mine monitoring systems is proposed that targets imbalanced data sets and combines remove-duplicate under-sampling (RDU), the synthetic minority oversampling technique (SMOTE) and the random forest (RF) classification algorithm. The method uses the RDU algorithm to under-sample the majority-class data and remove duplicate samples, uses the SMOTE algorithm to oversample the minority-class abnormal data, reducing the imbalance of the data set by synthesizing new abnormal data, and then trains the RF classification algorithm on the optimized data set to obtain an abnormal data recognition model. Comparison experiments on 6 real data sets show that the method achieves an average abnormal data recognition accuracy of 99.3% and has good generalization and strong robustness.
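The two resampling steps named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes that "duplicate" means an exact repeat of a feature vector, that SMOTE interpolates between a minority sample and one of its k nearest minority neighbours, and that the target is a 1:1 class ratio. None of these details are given in the abstract. The function names `remove_duplicates`, `smote` and `rebalance` are hypothetical, and the final random forest training step is omitted.

```python
import random


def remove_duplicates(rows):
    """Under-sample by keeping only the first occurrence of each distinct row.

    Assumption: a duplicate is an exact repeat of the whole feature vector.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            kept.append(list(row))
    return kept


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def rebalance(majority, minority, rng=None):
    """De-duplicate the majority class, then oversample the minority class
    until both classes have the same size (assumed 1:1 target)."""
    majority = remove_duplicates(majority)
    n_new = max(0, len(majority) - len(minority))
    minority = [list(m) for m in minority] + smote(minority, n_new, rng=rng)
    features = majority + minority
    labels = [0] * len(majority) + [1] * len(minority)  # 1 = abnormal
    return features, labels


if __name__ == "__main__":
    normal = [[0.0, 0.0]] * 5 + [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
    abnormal = [[10.0, 10.0], [11.0, 10.0], [10.0, 12.0]]
    X, y = rebalance(normal, abnormal, rng=random.Random(42))
    print(len(X), y.count(0), y.count(1))
```

The rebalanced `features` and `labels` would then be passed to a random forest classifier. In practice a library implementation of SMOTE and of the classifier would normally replace this hand-written sketch.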
【关键词】 煤矿安全监测; 异常数据识别; 不平衡数据集; 机器学习; 大数据; 下采样; 过采样; 随机森林
【Keywords】coal mine safety monitoring; abnormal data recognition; imbalanced data set; machine learning; big data; under-sampling; oversampling; random forest
【文献出处】工矿自动化,2020年1期
【Source】Industry and Mine Automation, 2020, Issue 1
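【基金】国家重点研发计划项目(2018YFC0808301);国家自然科学基金资助项目(41027002,51804244);陕西省教育厅科研计划项目(16JK1488)
【Funding】National Key Research and Development Program of China (2018YFC0808301); National Natural Science Foundation of China (41027002, 51804244); Scientific Research Program of the Shaanxi Provincial Department of Education (16JK1488)
【分类号】TD76
【CLC number】TD76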