Expert Syst. Appl. | 2021

A minority oversampling approach for fault detection with heterogeneous imbalanced data

 

Abstract


Between-class imbalance and feature heterogeneity commonly coexist in monitoring data collected from engineering systems. When data-driven methods are adopted for fault detection with imbalanced data, their decision hyperplanes may be biased toward the majority class, resulting in a low fault-detection rate. Various data- and algorithm-level methods have been proposed, with minority oversampling methods among the most popular and successful. However, state-of-the-art minority oversampling methods are unsuitable for imbalanced data with heterogeneous features, i.e., data including both numeric and nominal variables. There are two main drawbacks: 1) treating a nominal variable as a numeric variable is not trivial, and synthetic minority samples may fall outside the value range of nominal variables; 2) conventional distance measures, e.g., Euclidean distance, cannot properly measure the similarity of samples with heterogeneous features. To address these drawbacks, this work proposes new fault-detection methods. The methodological contributions include: 1) two different distance measures adopted for heterogeneous features in the minority class; 2) a new method for calculating the coordinates of synthetic samples that considers feature heterogeneity; and 3) a new strategy to encode nominal variables into numeric data for data-driven models. Several public heterogeneous imbalanced datasets and a real case study on fault detection in high-speed trains are used to verify the effectiveness of the proposed methods. To the knowledge of the author, this is also the first time that the effectiveness of diverse oversampling methods on heterogeneous imbalanced data has been specifically discussed.
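
The two drawbacks named in the abstract can be made concrete with a minimal sketch. It is not the method proposed in the paper, whose distance measures and coordinate calculation are not specified in the abstract. Instead it assumes two well-known stand-ins: the Heterogeneous Euclidean-Overlap Metric (HEOM) as the distance, and a SMOTE-NC-style rule in which numeric features are interpolated and nominal features take the most frequent value among the neighbours, so a synthetic sample never leaves the observed set of categories. The toy data, feature names, and function names are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical minority-class samples: (temperature, speed, brake_type).
# Positions 0 and 1 are numeric; position 2 is nominal (an assumption).
MINORITY = [
    (71.0, 280.0, "disc"),
    (75.5, 300.0, "disc"),
    (69.0, 250.0, "regen"),
    (80.0, 310.0, "disc"),
]
NOMINAL = {2}


def numeric_ranges(samples):
    """Return {feature index: max - min} for every numeric feature."""
    ranges = {}
    for i in range(len(samples[0])):
        if i not in NOMINAL:
            column = [s[i] for s in samples]
            ranges[i] = max(column) - min(column)
    return ranges


def heom(a, b, ranges):
    """HEOM distance: overlap for nominal, range-normalized for numeric."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in NOMINAL:
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / ranges[i] if ranges[i] else 0.0
        total += d * d
    return math.sqrt(total)


def synthesize(samples, k=2, rng=None):
    """Create one synthetic minority sample (SMOTE-NC-style stand-in)."""
    rng = rng or random.Random(0)
    ranges = numeric_ranges(samples)
    base = rng.choice(samples)
    others = [s for s in samples if s is not base]
    # k nearest minority neighbours under the heterogeneous distance
    neighbours = sorted(others, key=lambda s: heom(base, s, ranges))[:k]
    mate = rng.choice(neighbours)
    gap = rng.random()  # interpolation weight in [0, 1)
    new = []
    for i, (x, y) in enumerate(zip(base, mate)):
        if i in NOMINAL:
            # Majority vote keeps the value inside the observed categories.
            votes = Counter(n[i] for n in neighbours + [base])
            new.append(votes.most_common(1)[0][0])
        else:
            # Linear interpolation between the base sample and its mate.
            new.append(x + gap * (y - x))
    return tuple(new)


if __name__ == "__main__":
    print(synthesize(MINORITY, k=2, rng=random.Random(1)))
```

Interpolating an integer-coded nominal feature instead would yield values such as 0.4 that correspond to no category, which is the first drawback described above; plain Euclidean distance on such codes would also impose an arbitrary ordering on the categories, which is the second.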

Volume 184
Pages 115492
DOI 10.1016/J.ESWA.2021.115492
Language English
Journal Expert Syst. Appl.
