IEEE Transactions on Cybernetics | 2019

Evolutionary Multiobjective Clustering and Its Applications to Patient Stratification

 
 

Abstract


Patient stratification plays a major role in enabling efficient and personalized medicine. An important task in patient stratification is to discover disease subtypes for effective treatment. To achieve this goal, research on clustering algorithms for patient stratification has attracted attention from both academia and the medical community over the past decades. However, existing clustering algorithms suffer from practical limitations such as experimental noise, high dimensionality, and poor interpretability. In particular, existing clustering algorithms usually determine clustering quality using only one internal evaluation function. Unfortunately, a single internal evaluation function is unlikely to fit all datasets and remain robust across them. Therefore, in this paper, a novel multiobjective framework called multiobjective clustering algorithm by fast search and find of density peaks is proposed to address those limitations altogether. In the proposed framework, a parameter candidate population is evolved under multiple objectives to select features and evaluate clustering densities automatically. To guide the multiobjective evolution, five cluster validity indices (compactness, separation, the Calinski–Harabasz index, the Davies–Bouldin index, and the Dunn index) are chosen as the objective functions, capturing multiple characteristics of the evolving clusters. A multiobjective differential evolution algorithm based on decomposition is adopted to optimize those five objective functions simultaneously.
To demonstrate its effectiveness, extensive experiments have been conducted, comparing the proposed algorithm with 45 algorithms (nine state-of-the-art clustering algorithms, five multiobjective evolutionary algorithms, and 31 baseline algorithms under different objective subsets) on 94 datasets: 35 real patient stratification datasets, 55 synthetic datasets based on a real human transcription regulation network model, and four other medical datasets. The numerical results reveal that the proposed algorithm can achieve solutions that are better than, or competitive with, those of the other algorithms. In addition, time complexity analysis, convergence analysis, and parameter analysis are conducted to demonstrate the robustness of the proposed algorithm from different perspectives.
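
The abstract names the five cluster validity indices used as objectives but does not give their formulas. As a rough illustration only, the sketch below computes them for a hard clustering using common textbook definitions. The exact forms of "compactness" and "separation" are assumptions here (mean point-to-centroid distance and minimum centroid-to-centroid distance), and the function names `centroid` and `validity_indices` are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def validity_indices(data, labels):
    """Return five internal validity indices for a hard clustering.

    Assumes at least two clusters, more points than clusters, distinct
    centroids, and at least one cluster containing two distinct points.
    """
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    k, n = len(groups), len(data)
    centers = [centroid(g) for g in groups]
    overall = centroid(data)

    # Mean point-to-centroid distance per cluster (the scatter term of Davies-Bouldin).
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, centers)]

    # Compactness (assumed form): mean distance of every point to its own centroid.
    compactness = sum(s * len(g) for s, g in zip(scatter, groups)) / n

    # Separation (assumed form): smallest distance between any two centroids.
    separation = min(math.dist(a, b) for a, b in combinations(centers, 2))

    # Calinski-Harabasz: between-cluster over within-cluster dispersion,
    # each normalized by its degrees of freedom.
    within = sum(math.dist(p, c) ** 2
                 for g, c in zip(groups, centers) for p in g)
    between = sum(len(g) * math.dist(c, overall) ** 2
                  for g, c in zip(groups, centers))
    calinski_harabasz = (between / (k - 1)) / (within / (n - k))

    # Davies-Bouldin: average, over clusters, of the worst scatter-to-distance ratio.
    davies_bouldin = sum(
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

    # Dunn: smallest between-cluster point distance over largest cluster diameter.
    min_between = min(math.dist(p, q)
                      for a, b in combinations(groups, 2) for p in a for q in b)
    max_diameter = max(math.dist(p, q)
                       for g in groups for p, q in combinations(g, 2))
    dunn = min_between / max_diameter

    return {
        "compactness": compactness,
        "separation": separation,
        "calinski_harabasz": calinski_harabasz,
        "davies_bouldin": davies_bouldin,
        "dunn": dunn,
    }


if __name__ == "__main__":
    toy_data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    toy_labels = [0, 0, 1, 1]
    for name, value in validity_indices(toy_data, toy_labels).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, lower compactness and Davies–Bouldin values indicate better clusterings, whereas higher separation, Calinski–Harabasz, and Dunn values do, so a multiobjective optimizer would need to treat the five objectives with consistent minimize or maximize directions.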

Volume 49
Pages 1680-1693
DOI 10.1109/TCYB.2018.2817480
Language English
Journal IEEE Transactions on Cybernetics
