Appl. Soft Comput. | 2021

Fuzzy C-Means-based Isolation Forest


Abstract


The problem of finding anomalies (outliers) in databases is one of the most important issues in modern data analysis. One reason is that the problem arises in almost every type of database, including numerical, categorical, temporal, mixed, and graphic data. Many methods currently exist, often dedicated to a specific kind of data analysis. Finally, the topic is extremely interesting per se, as a research problem that intrigues researchers. One of the classic methods dedicated to finding anomalies in data is Isolation Forest. However, this method has, with a few exceptions, not been modified since its first publication and, in particular, it has not yet been combined with typical fuzzy grouping methods such as Fuzzy C-Means (FCM) clustering. In this study, we thoroughly analyze this approach, as well as several related ones. We examine the possibilities of the technique and analyze it in detail with respect to characteristics of the data (database size, number of attributes, number of records, their type, etc.). It is worth noting that FCM makes it possible to obtain the membership grades of the elements forming Isolation Forest nodes with respect to the clusters on the basis of which these nodes are built. Hence, at the stage of calculating the anomaly scores, this information is used effectively, in particular to express how strongly a given element may belong to a group of similar elements, which can be inferred from the characteristics of the cluster in which it lies. In this study, we propose a set of methods enhancing Isolation Forest on the basis of Fuzzy C-Means. The results of numerical experiments carried out on 27 diverse datasets and reported in this paper lead us to conclude that FCM can play a pivotal role in enhancing the Isolation Forest approach and raises the values of particular measures of the effectiveness of anomaly detection methods.
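
As a rough illustration of the membership grades mentioned in the abstract, the sketch below computes the standard FCM membership of a single point with respect to a set of fixed cluster centers. It is not the scoring procedure of the paper: the abstract does not say how the grades enter the anomaly score, and the function name `fcm_memberships`, the fuzzifier value `m=2.0`, and the example centers are assumptions made only for this example.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership grades of one point for fixed centers.

    Hypothetical helper for illustration; m is the fuzzifier (m > 1).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the grades sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Assumed toy data: two cluster centers on a line.
centers = [(0.0, 0.0), (10.0, 0.0)]
near = fcm_memberships((1.0, 0.0), centers)     # close to the first center
between = fcm_memberships((5.0, 0.0), centers)  # equidistant from both
```

A point close to one center receives a grade near 1 for that cluster, while a point equidistant from both centers receives equal grades. Intuitively, this is the kind of information that can indicate how strongly an element belongs to a group of similar elements.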

Volume 106
Pages 107354
DOI 10.1016/J.ASOC.2021.107354
Language English
Journal Appl. Soft Comput.
