Procedia Computer Science | 2021

The distance function approach on the MiniBatchKMeans algorithm for the DPP-4 inhibitors on the discovery of type 2 diabetes drugs


Abstract


Several DPP-4 inhibitors used to treat type 2 diabetes mellitus (T2DM) still cause unsafe side effects in long-term use, so new DPP-4 inhibitors are needed to minimize them. Quantitative structure-activity relationship (QSAR) modeling can support the development of such drugs, and selecting a subset of DPP-4 inhibitor molecules by clustering can improve the accuracy of the QSAR model. This study selects suitable DPP-4 inhibitor molecules for QSAR modeling using the MiniBatchKMeans algorithm with Levenshtein distance, followed by the logP criterion of Lipinski’s Rule of 5. DPP-4 inhibitor molecule data were collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/) in CSV format, and the molecular structures were taken from the SMILES field. Before clustering, the SMILES strings were converted into molecular fingerprints using the MACCS, ECFP, and FCFP fingerprint generators. Clustering was carried out on five fingerprint datasets: ECFP (diameters 4 and 6), FCFP (diameters 4 and 6), and MACCS (167 structural keys). The optimal number of clusters was determined using the Davies-Bouldin index (DBI), the Silhouette coefficient (SCO), and the Calinski-Harabasz score (CHS). The best result, 1540 clusters, came from the MACCS fingerprint dataset, with a minimum DBI of 0.545311, a maximum SCO of 0.302842, and a maximum CHS of 331.3942. Applying the logP criterion of Lipinski’s Rule of 5 in the molecular selection step yielded 1532 molecules with logP values between -0.205 and 4.95.
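The abstract names Levenshtein distance as the distance function applied to the fingerprint data. As a rough illustration only, the sketch below computes the Levenshtein (edit) distance between two fingerprints written as bit strings. The function name `levenshtein` and the 8-bit example fingerprints are assumptions made for this sketch; they are not taken from the paper, and the paper's actual integration of this distance with MiniBatchKMeans is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counting single-character
    insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


# Hypothetical 8-bit fingerprints, for illustration only
# (real MACCS fingerprints in the study have 167 positions).
fp_a = "10110010"
fp_b = "10010011"

distance = levenshtein(fp_a, fp_b)
print(distance)
```

For two bit strings of equal length the edit distance can never exceed the number of differing positions, so it behaves like a relaxed Hamming distance that also tolerates shifts caused by insertions or deletions.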

Volume 179
Pages 127-134
DOI 10.1016/J.PROCS.2020.12.017
Language English
Journal Procedia Computer Science
