Are 2D fingerprints still valuable for drug discovery?
Kaifu Gao, Duc Duy Nguyen, Vishnu Sresht, Alan M. Mathiowetz, Meihua Tu, Guo-Wei Wei
Department of Mathematics, Michigan State University, MI 48824, USA. Pfizer Medicine Design, 610 Main St, Cambridge, MA 02139, USA. Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA. Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA.

November 5, 2019
Abstract
Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory, have been paired with efficient machine learning, especially deep learning algorithms, to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely protein-ligand binding, toxicity, solubility, and partition coefficient, to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms, including random forest, gradient boosted decision tree, single-task deep neural network, and multitask deep neural network, are employed to construct efficient 2D-fingerprint-based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for the predictions of toxicity, solubility, partition coefficient, and protein-ligand binding affinity based on only ligand information. However, 3D structure-based models outperform 2D-fingerprint-based methods in complex-based protein-ligand binding affinity predictions.
I Introduction
Drug discovery is a multi-parameter optimization process, which involves a long list of chemical, biological, and physiological properties. For a drug candidate, numerous drug-related properties must be assessed, including binding affinity, toxicity, octanol-water partition coefficient (Log P), aqueous solubility (Log S), etc. Binding affinity assesses the strength of a drug's binding to its target, while toxicity is a measure of the degree to which a chemical compound can adversely damage an organism. In addition, a partition coefficient is defined as the ratio of concentrations of a solute in a mixture of two immiscible solvents at equilibrium and, in the case of Log P, represents the drug-relatedness of a compound as well as its hydrophobic effect on human bodies. Another relevant drug attribute is aqueous solubility, which plays a vital role in distribution, absorption, and biological activity, among other processes, because 65-90% of body mass is water.
Their importance to drug design and discovery has been emphasized by many recent surveys.
Indeed, unsatisfactory toxicity or pharmacokinetic properties are responsible for approximately half of drug candidate failures to reach the market. Traditional experiments for measuring drug properties are conducted either in vivo or in vitro. Such experiments are quite time consuming and expensive. Additionally, testing with animals can raise important ethical concerns. Therefore, various computer-aided or in silico methods have become more attractive, since they can produce quick results without sacrificing much accuracy in many situations. Among them, one of the most popular approaches is quantitative structure-activity/property relationship (QSAR/QSPR) analysis. It assumes that similar molecules have similar bioactivities or physicochemical properties. Based on this assumption, the activities and properties of new molecules can be predicted by studying the correlation between chemical or structural features of molecules and their activities or properties, reducing the need for time-consuming experiments.

Molecular fingerprints are one way of encoding the structural features of a molecule. They play a fundamental role in QSAR/QSPR analysis, virtual screening, similarity-based compound search, target molecule ranking, drug ADMET prediction, and other drug discovery processes. Molecular fingerprints are property profiles of a molecule, usually in the form of vectors, with each vector element indicating the existence, the degree, or the frequency of one particular structural feature. Various fingerprints have been developed for molecular feature encoding in the past few decades.

∗ Corresponding to Guo-Wei Wei. Email: [email protected]
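Fingerprints are consumed downstream as fixed-length bit vectors, so the similarity-based compound searches mentioned above reduce to simple bit arithmetic. A common score is the Tanimoto coefficient; the sketch below is only illustrative, with made-up bit patterns rather than real fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length 0/1 vectors: |A and B| / |A or B|."""
    on_both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    on_any = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return on_both / on_any if on_any else 0.0

# Two hypothetical 8-bit fingerprints (real ones have hundreds or thousands of bits)
fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 0, 1, 1, 0]
sim = tanimoto(fp1, fp2)  # 3 bits in common / 5 bits set anywhere = 0.6
```

A similarity search simply ranks a compound library by this score against a query fingerprint.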
Most fingerprints are 2D fingerprints, which can be extracted from molecular connection tables without 3D structural information. However, high-dimensional fingerprints have also been developed to utilize 3D molecular structures and other information.

There are four main categories of 2D fingerprints, namely substructure key-based fingerprints, topological or path-based fingerprints, circular fingerprints, and pharmacophore fingerprints. Substructure key-based fingerprints are bit strings representing the presence of certain substructures or fragments, from a given list of structural keys, in the compound. The molecular access system (MACCS) is one of the most popular substructure key-based fingerprints. Topological or path-based fingerprints are based on analyzing all the fragments of a molecule along a (usually linear) path up to a certain number of bonds, and then hashing every one of these paths to create the fingerprint. The most prominent ones in this category are the FP2, Daylight, and electro-topological state (Estate) fingerprints. Circular fingerprints are also hashed topological fingerprints, but rather than looking for paths in a molecule, they record the environment of each atom up to a pre-determined radius. A well-known example of this class is the extended-connectivity fingerprint (ECFP). Pharmacophore fingerprints include the relevant features and interactions needed for a molecule to be active against a given target; examples include the 2D-pharmacophore and extended reduced graph (ERG) fingerprints. Since 2D fingerprints rely only on 2D structures, their generation is easy, fast, and convenient.

In addition to the four categories mentioned above, recent improvements in deep learning have enabled the creation of neural fingerprints, where the mapping between fingerprints and 2D structures is learned simultaneously with the parameters of the regression/classification model that maps fingerprints to targets.
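The hashing step behind the path-based fingerprints described above can be illustrated in a few lines: enumerate linear atom paths and hash each path string into a fixed-length bit vector. This is only a toy sketch; the molecule encoding, hash function, path length, and bit count are all illustrative choices, not those of FP2 or Daylight.

```python
from hashlib import md5

def path_fingerprint(atoms, bonds, n_bits=256, max_len=4):
    """Hash all linear atom paths of up to max_len atoms into an n_bits bit vector."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    bits = [0] * n_bits

    def walk(path):
        key = "-".join(atoms[i] for i in path)              # e.g. "C-C-O"
        bits[int(md5(key.encode()).hexdigest(), 16) % n_bits] = 1
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:                         # linear paths only, no revisits
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits

# Ethanol's heavy-atom skeleton as a toy graph: C-C-O
fp = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
```

Circular fingerprints such as ECFP differ in that they hash each atom's neighborhood out to a fixed radius instead of hashing linear paths.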
These 'learned' fingerprints can potentially improve predictive performance on QSAR/QSPR tasks, but they must be relearned when trying to predict new properties across significantly different regions of chemical space. Since the focus of this work is on comparing 2D and 3D descriptors across a number of disparate tasks and chemically diverse datasets, we have chosen not to consider neural fingerprints.

Most commonly used 2D molecular fingerprints were derived over a decade ago, and their validation was carried out with classical regression or classification algorithms, such as linear regression, logistic regression, logistic classification, naive Bayes, k-nearest neighbors, support vector machines, etc. On the other hand, new 3D structure-based fingerprints built from algebraic topology, differential geometry, geometric graph theory, and algebraic graph theory have been developed in recent years. In particular, these new fingerprints were mostly paired with advanced machine learning algorithms, such as random forest (RF), gradient boosting decision tree (GBDT), single-task deep neural networks (ST-DNNs), multitask deep neural networks (MT-DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), etc., which are now easily accessible to the scientific community via user-friendly deep learning frameworks in popular programming languages. Often, these new methods have demonstrated higher accuracy or better performance than earlier methods in the literature, which are typically based on 2D fingerprints and/or simple machine learning algorithms, for drug discovery related applications such as protein-ligand binding, virtual screening, toxicity, solubility, partition coefficient, as well as protein folding stability change upon mutation.
Additionally, recent results from the D3R Grand Challenges, a community-wide annual competition series in computer-aided drug design, indicate that structure-based methods using sophisticated 3D structure-based fingerprints have an advantage over ligand-based methods using 2D fingerprints in scoring and free energy predictions.
These developments raise an interesting question of whether 2D fingerprints are still valuable for drug design and discovery. Therefore, there is a pressing need to reassess 2D fingerprints with advanced machine learning algorithms and compare their performance with state-of-the-art 3D structure-based fingerprints for drug discovery related applications.

The objective of the present work is to reassess the predictive power of eight popular 2D fingerprints on four important drug-related problems, namely toxicity, binding affinity, Log P, and Log S, involving a total of 23 datasets. These problems are selected for the availability of reference results generated by state-of-the-art 3D structure-based fingerprints in the literature. To optimize the 2D fingerprints' performance, advanced machine learning algorithms, including RF, GBDT, ST-DNN, and MT-DNN, are employed in the present study. Additionally, consensus models are constructed from appropriate combinations of 2D fingerprint-based predictions to further enhance their performance. The predictive power of each 2D fingerprint for certain functional groups is analyzed. Extensive numerical studies over 23 datasets using eight 2D fingerprints and four different machine learning algorithms indicate that the combination of appropriate machine learning algorithms and 2D fingerprint-based models, particularly consensus models, can bring significant improvements over previous 2D QSPR approaches, especially for toxicity predictions. Moreover, 2D fingerprint-based models perform as well as state-of-the-art 3D structure-based fingerprints in the predictions of toxicity, solubility, partition coefficient, and ligand-based protein-ligand binding affinity.
Finally, topology-based fingerprints extracted from 3D protein-ligand complexes have a significant advantage over 2D fingerprints in complex-based protein-ligand binding affinity predictions. We believe that the present performance analysis and assessment will provide a useful guideline on how to choose appropriate fingerprints and machine learning methods for drug discovery related applications.
II Methods

II.A 2D fingerprints
In the present work, we investigate eight popular 2D fingerprints, including the FP2, MACCS, Daylight, Estate1, Estate2, and ECFP4 fingerprints, the 2D-pharmacophore fingerprint (Pharm2D), and the extended reduced graph fingerprint (ERG). They are chosen to represent the four main 2D molecular fingerprint categories, namely substructure key-based fingerprints, topological or path-based fingerprints, circular fingerprints, and pharmacophore fingerprints, and are some of the most popular and commonly used ones. Table 1 summarizes the information related to these fingerprints. All 2D fingerprints were generated with Openbabel (version 2.4.1) and RDKit (version 2018.09.3).

II.B Ensemble methods
Two popular ensemble methods were used in our work. The first is random forest (RF), which constructs a multitude of decision trees during training. RF can be used to predict a classification label (classification model) or the mean prediction (regression model) of the individual trees. It is very robust against overfitting and easy to use. The second is gradient boosting decision tree (GBDT). In this approach, individual decision trees are combined in a stage-wise fashion to achieve the capability of learning complex features. It uses both gradient and boosting strategies to reduce model errors. Compared to deep neural network (DNN) approaches, these two ensemble methods are robust, relatively insensitive to hyperparameters, and easy to implement. Moreover, they are much faster to train than DNNs. In fact, for small datasets, RF and GBDT can perform even better than DNNs or other deep learning algorithms. Therefore, these methods have been applied to a variety of QSAR prediction problems, such as toxicity, solvation, and binding affinity predictions.
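The boosting idea behind GBDT can be seen in a toy one-dimensional regressor: each stage fits a depth-1 "stump" to the current residuals and is added with a small learning rate. This is a didactic sketch only; the actual models in this work are scikit-learn's ensembles with the hyperparameters listed in Table 2, and the data below are made up.

```python
def fit_stump(x, residual):
    """Best single-threshold split of 1-D data minimizing squared error."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    return best[1:]  # (threshold, left mean, right mean)

def gbdt_1d(x, y, n_stages=50, lr=0.1):
    """Gradient boosting for squared loss: repeatedly fit stumps to the residuals."""
    pred = [sum(y) / len(y)] * len(y)                # start from the mean
    for _ in range(n_stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, ml, mr = fit_stump(x, resid)
        pred = [p + lr * (ml if xi <= t else mr) for xi, p in zip(x, pred)]
    return pred

# Made-up data with a step between x = 3 and x = 4
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
fitted = gbdt_1d(x, y)
```

RF differs in that its trees are fit independently on bootstrap samples and averaged, rather than fit sequentially on residuals.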
II.C Single-task deep neural network (ST-DNN)
A DNN mimics the learning process of a biological brain by constructing a wide and deep architecture of numerous connected neuron units. A typical deep neural network often includes multiple hidden layers, with hundreds or even thousands of neurons in each layer. During the learning stage, the weights of each layer are updated by backpropagation. With a complex and deep network, a DNN is capable of constructing hierarchical features and modeling complex nonlinear relationships. An ST-DNN is a regular deep learning algorithm: it takes care of only one prediction task and therefore learns from only one specific training dataset. A typical four-layer ST-DNN is shown in Figure 1, where N_i (i = 1, ..., 4) represents the number of neurons in the i-th hidden layer.

II.D Multitask deep neural network (MT-DNN)
The multitask (MT) learning technique has achieved much success in qualitative Merck and Tox21 prediction challenges.
In the MT framework, multiple tasks share the same hidden layers, but the output layer is attached to different tasks. This framework enables the neural network to learn from all the data simultaneously for different tasks. Thus, the commonalities and differences among various datasets can be exploited. It has been shown that MT learning can typically improve the prediction accuracy of relatively small datasets if they are combined with relatively larger datasets in the training.

Figure 2 is an illustration of a typical four-layer MT-DNN training four different tasks simultaneously. Suppose there are in total T tasks, and the training data for the t-th task are (X_ti, y_ti), i = 1, ..., N_t, where t = 1, ..., T, N_t is the number of samples in the t-th task, X_ti is the feature vector of the i-th sample in the t-th task, and y_ti is the label of the i-th sample in the t-th task. The purpose of MT learning is to simultaneously minimize the loss over all tasks:

argmin_θ Σ_{t=1}^{T} Σ_{i=1}^{N_t} L(y_ti, f_t(X_ti, θ_t)),

where f_t(X_ti, θ_t) is the MT-DNN prediction for the i-th sample in the t-th task, a function of the feature vector X_ti; L is the loss function; and θ_t is the collection of model parameters for the t-th task. A popular cost function for regression is the mean squared error, which for the t-th task can be defined as

L_t = (1/N_t) Σ_{i=1}^{N_t} (y_ti − f_t(X_ti, θ_t))².

Fingerprint | Description | Number of features | Package
FP2 | A path-based fingerprint which indexes small-molecule fragments based on linear segments of up to 7 atoms | 256 | Openbabel
Daylight | A path-based fingerprint consisting of 2048 bits and encoding all connectivity pathways of a given length through a molecule | 2048 |
MACCS | A substructure key-based fingerprint with 166 structural keys based on SMARTS patterns | 166 |
ECFP | A circular fingerprint which uses an iterative process to assign numeric identifiers to each atom | |
Table 1: A summary of the 2D fingerprints studied in this work.

In this study, the MT learning technique is applied to toxicity prediction. The ultimate goal of this MT learning is to improve the overall performance of the multiple toxicity prediction models, especially for the smallest dataset, which performs relatively poorly in the ST-DNN. More concretely, it is reasonable to assume that different toxicity indexes share a common pattern, so these different tasks can be trained simultaneously when their feature vectors are constructed in the same manner. For our toxicity prediction, four different tasks (the LD50, IGC50, LC50, and LC50-DM data sets) are trained together. This leads to four output neurons in the output layer (see O1 to O4 in Figure 2), with each neuron being specific to one of the four tasks.

II.E Hyperparameters

Ensemble hyperparameters.
Both RF and GBDT were implemented with the scikit-learn package (version 0.20.1). In this work, there are a total of 23 datasets, with training-set sizes varying from 94 to 8199. RF has been shown to be consistent and robust across various datasets. However, if its parameters are carefully tuned to the size of a given training set, GBDT can attain better performance than RF in most cases. For all experiments in this work, the most essential parameters of GBDT are chosen as learning_rate=0.01, min_samples_split=3, and max_features='sqrt'. Detailed values of other parameters are given in Table 2.

Figure 1: An illustration of a typical ST-DNN. Only one task (data set) is trained in this network. Four hidden layers are included; k_i (i = 1, 2, 3, 4) represents the number of neurons in the i-th hidden layer, and N_i,j is the j-th neuron in the i-th hidden layer. Here, O is the single output for the task.

Figure 2: An illustration of a typical MT-DNN training four tasks (datasets) simultaneously. Four hidden layers are included in this network; k_i (i = 1, 2, 3, 4) represents the number of neurons in the i-th hidden layer, and N_i,j is the j-th neuron in the i-th hidden layer. Here, O1 to O4 represent four predictor outputs for the four tasks.

Network hyperparameters.
Since the numbers of features differ considerably among the 2D fingerprints, different network architectures have to be adopted. For example, the Estate1 fingerprint has only 79 bits; therefore, a 4-layer network is chosen, with 500, 1000, 1500, and 500 neurons in the four hidden layers. However, the Daylight fingerprint has as many as 2048 features, and thus a much larger network is needed. The network for this fingerprint still has 4 layers, but with 3000, 2000, 1000, and 500 neurons in the first, second, third, and fourth hidden layers, respectively. Other network parameters are as follows: the optimizer is stochastic gradient descent (SGD) with a momentum of 0.5; 2000 epochs were run for all networks; the mini-batch size is set to 4; and the learning rate is set to 0.01 for the first 1000 epochs and 0.001 for the remaining epochs. Our tests indicate that adding dropout or using L2 decay does not necessarily improve the accuracy, and thus we omit these two techniques. All network hyperparameters are summarized in Table 3. These hyperparameters are applied to both ST-DNN and MT-DNN. All DNN training is performed with PyTorch (version 1.0).

Training-set size | RF parameters | GBDT parameters
< 800 | n_estimators=1000, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0 | n_estimators=2000, max_depth=9, min_samples_split=3, learning_rate=0.01, subsample=0.1, max_features='sqrt'
800 to 5000 | (same as above) | n_estimators=10000, max_depth=7, min_samples_split=3, learning_rate=0.01, subsample=0.3, max_features='sqrt'
5000 to 10000 | (same as above) | n_estimators=20000, max_depth=7, min_samples_split=3, learning_rate=0.01, subsample=0.3, max_features='sqrt'
Table 2: RF and GBDT parameters for different training-set sizes.

Fingerprint | Number of features | Number of hidden layers | Number of neurons in each hidden layer | Optimizer | Mini-batch | Learning rate
Estate1 | 79 | 4 | 500, 1000, 1500, 500 | SGD with a momentum of 0.5 | 4 | First 1000 epochs: 0.01; then: 0.001
Estate2 | 79 | 4 | 500, 1000, 1500, 500 | (same) | (same) | (same)
Daylight | 2048 | 4 | 3000, 2000, 1000, 500 | (same) | (same) | (same)
Table 3: The network hyperparameters for both ST-DNN and MT-DNN.
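The shared-hidden-layer architecture and multitask objective of Section II.D can be sketched in a few lines of numpy. All shapes, weights, and data below are random placeholders rather than the Table 3 architecture, and the gradient updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def mt_loss(X_tasks, y_tasks, W_shared, heads):
    """Sum over tasks of the per-task mean squared error: every task shares
    W_shared (the common hidden layer) but has its own linear output head."""
    total = 0.0
    for X, y, w in zip(X_tasks, y_tasks, heads):
        hidden = np.maximum(X @ W_shared, 0.0)   # shared ReLU hidden layer
        pred = hidden @ w                        # task-specific output neuron
        total += np.mean((y - pred) ** 2)
    return total

n_features, n_hidden, n_tasks = 16, 8, 4
W_shared = rng.normal(size=(n_features, n_hidden))           # shared parameters
heads = [rng.normal(size=n_hidden) for _ in range(n_tasks)]  # one head per task
sizes = (30, 20, 10, 5)                 # tasks may have very different data sizes
X_tasks = [rng.normal(size=(n, n_features)) for n in sizes]
y_tasks = [rng.normal(size=n) for n in sizes]
loss = mt_loss(X_tasks, y_tasks, W_shared, heads)
```

Because the gradient of this summed loss flows through W_shared from every task, small datasets effectively borrow statistical strength from larger ones.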
III Results

III.A Toxicity prediction
Four toxicity datasets were studied in our work, namely oral rat LD50 (LD50), 40 h Tetrahymena pyriformis IGC50 (IGC50), 96 h fathead minnow LC50 (LC50), and 48 h Daphnia magna LC50 (LC50-DM). Among them, LD50 measures the amount of a chemical that kills half of the rats when orally ingested. IGC50 records the 50% growth-inhibitory concentration of the Tetrahymena pyriformis organism after 40 h. LC50 reports the concentration of a test chemical in water, in milligrams per liter, that causes 50% of fathead minnows to die after 96 h. The last one is LC50-DM, which represents the concentration of a test chemical in water, in milligrams per liter, that causes 50% of Daphnia magna to die after 48 h. The unit of toxicity reported in these four datasets is −log10(mol/L). All of them are accessible from recent publications.

Data set | Total size | Train set size | Test set size | Max value | Min value
LC50 | 823 | 659 | 164 | 9.261 | 0.037
Table 4: A quantitative summary of the toxicity data sets.

III.A.1 The performance of ensemble methods
Because they are easy to implement and fast to train, the two ensemble methods, RF and GBDT, were tested first. Since the four datasets have very different sizes, different numbers of estimators are used in the RF and GBDT models. Specifically, for the two relatively small sets, LC50 and LC50-DM, the number of estimators is set to 2000. For IGC50, 10000 estimators are used. For the largest set, LD50, we used 20000 estimators. The accuracy is reported in terms of the squared Pearson correlation coefficient (R²). Overall, GBDT's performance is consistently better than that of RF, which agrees with an earlier publication. Among all eight fingerprints tested, Estate2, Estate1, Daylight, FP2, ECFP, and MACCS usually work well on these four sets. Thus, the consensus of these six fingerprints was also considered ("Top 6-cons" in Figure 3). The consensus model typically gives a further improvement over all single fingerprints in most cases.

Figure 3: The R² on the LD50, IGC50, LC50, and LC50-DM test sets yielded by eight fingerprints and the consensuses of the top 6 fingerprints. Two ensemble methods were adopted (GBDT: blue, RF: red). The values shown in the figure are the R² of GBDT.

(a) LD50 test set. The LD50 dataset is the largest set, with as many as 7413 compounds. However, a higher experimental uncertainty of the values in this set makes it relatively difficult to predict (see "Max value" and "Min value" in Table 4). In our GBDT model, the best single fingerprint (MACCS) yields an R² of 0.643, while the consensus of the top 6 fingerprints increases the R² to 0.679.

(b) IGC50 test set. The IGC50 set is the second largest set (1792 compounds) among the four sets we investigated. As indicated in Table 4, the diversity of molecules in this set is the lowest among the four sets, which means the prediction should be somewhat easier. Our results show that Estate2 is the best single fingerprint, with an R² of 0.742, and the consensus of the top 6 fingerprints leads to an R² of 0.785.

(c) LC50 test set. The LC50 set is a relatively smaller set (823 compounds). With the GBDT model, the Estate2 fingerprint achieves the top performance, yielding an R² of 0.662. The consensus of the top 6 fingerprints improves the R² to 0.715.

(d) LC50-DM test set. Among the four sets, the LC50-DM set is the smallest, with only 283 training molecules and 70 test molecules, which makes it troublesome to build a robust model. Therefore, our model only generates moderate results. Specifically, the best single fingerprint, Estate1, only reaches an R² of 0.520. The consensus model even degrades the R² slightly, to as low as 0.486. A similar difficulty is also faced by other recent work; for example, the R² of the 3D-topology-based GBDT model only reaches 0.505. Thus, there is a need for multitask deep learning when dealing with such a small dataset.
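The consensus models used throughout this section are plain averages of the individual fingerprint models' predictions, scored by the squared Pearson correlation. A small sketch with made-up prediction vectors:

```python
def pearson_r2(y_true, y_pred):
    """Square of the Pearson correlation coefficient between two lists."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov * cov / (var_t * var_p)

def consensus(*prediction_lists):
    """Element-wise average of several models' predictions."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]

y_test = [2.1, 3.0, 1.2, 4.5, 3.3]      # hypothetical experimental values
model_a = [2.0, 3.2, 1.5, 4.1, 3.0]     # e.g. an Estate2-based model
model_b = [2.4, 2.7, 1.0, 4.8, 3.6]     # e.g. a MACCS-based model
cons = consensus(model_a, model_b)      # [2.2, 2.95, 1.25, 4.45, 3.3]
r2 = pearson_r2(y_test, cons)
```

As the LC50-DM case shows, such averaging helps only when the individual models are themselves reasonably accurate.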
III.A.2 The performance of single-task and multitask deep learning
On average, Estate2, Estate1, and Daylight are the top three fingerprints when using GBDT models on all four sets. Thus, these three fingerprints were selected for the higher-level ST-DNN and MT-DNN models. Since the lengths of the three fingerprints differ considerably, different DNN architectures are needed. Four hidden layers with 500, 1000, 1500, and 500 neurons are used for Estate1 and Estate2, whose fingerprints have 79 features. Four hidden layers with 3000, 2000, 1000, and 500 neurons are used for Daylight, whose fingerprint has 2048 bits.

The pattern of the ST-DNN results is similar to that of the GBDT results. On the four data sets, an ST-DNN consensus model yields an average R² of 0.658 (0.632, 0.791, 0.687, and 0.523, respectively). As a comparison, the average R² of a GBDT consensus model is 0.666 (0.679, 0.785, 0.715, and 0.486, respectively). However, the performance can be largely enhanced by the multitask strategy, because the two relatively smaller sets, LC50 and LC50-DM, can benefit much from the two larger sets, LD50 and IGC50. As shown in Table 5, while the MT-DNN model barely changes the performance on LD50 and IGC50, it gives rise to a dramatic improvement on LC50 and LC50-DM, especially on LC50-DM, where the consensus lifts the R² from 0.523 to 0.725.

Method | R² of LD50 | R² of IGC50 | R² of LC50 | R² of LC50-DM
Estate2 ST-DNN | 0.484 | 0.715 | 0.569 | 0.433
Estate2 MT-DNN | 0.489 | 0.696 | 0.660 | 0.623
Estate1 ST-DNN | 0.569 | 0.733 | 0.650 | 0.601
Estate1 MT-DNN | 0.566 | 0.735 | 0.694 | 0.684
Daylight ST-DNN | 0.619 | 0.701 | 0.570 | 0.346
Daylight MT-DNN | 0.617 | 0.717 | 0.724 | 0.694
Consensus ST-DNN | 0.632 | 0.791 | 0.687 | 0.523
Consensus MT-DNN | 0.639 | 0.794 | 0.765 | 0.725
Table 5: The R² of ST-DNN and MT-DNN based on the top 3 fingerprints in GBDT (Estate2, Estate1, Daylight) and their consensuses.

III.A.3 Systematic comparison with other toxicity predictions
A systematic comparison with other methods is provided in Table 6. The same datasets were also used to develop the Toxicity Estimation Software Tool (T.E.S.T.), so many related results can be found in its user's guide, including hierarchical, single-model, FDA, group-contribution, nearest-neighbor, and T.E.S.T. consensus methods. Since T.E.S.T. is also based on 2D descriptors, the comparison between the results of the present models and T.E.S.T. largely reflects the predictive power of the present models. As shown in Table 6, on the LD50, IGC50, and LC50 sets, the present MT-DNN consensus always leads to a higher R² than the T.E.S.T. consensus. In particular, on the IGC50 and LC50 sets, the present MT-DNN consensus models clearly beat T.E.S.T. (0.794 vs 0.764 and 0.765 vs 0.728), and on the LD50 set, the present GBDT results outperform T.E.S.T. (0.679 vs 0.626). On the LC50-DM set, because the training set is so small (283 molecules), the ensemble methods (RF and GBDT) and the ST-DNN are not well suited: the R² of GBDT and ST-DNN are, respectively, 0.486 and 0.523. However, the R² of MT-DNN reaches 0.725 for the LC50-DM dataset, which is quite comparable to the T.E.S.T. result with an R² of 0.739.

The 2D MT-DNN consensus has an average R² of 0.731 over these four datasets, while the average of the T.E.S.T. model is 0.714, and the recent 3D structure-based topological MT-DNN consensus result is also 0.731. These results confirm that 2D fingerprints integrated with the MT-DNN model surpass the previous 2D models and are as good as the recent 3D structure-based topological model.

III.B Aqueous solubility (Log S)
For Log S, following the previous literature, we test Klopman's test set with the original training set. The unit of Log S in these sets is log units. Since the size of the training set is 1290, 10000 estimators were used in the GBDT model.

Data set | Method | R² | RMSE | Coverage
LD50 | The present 2D MT-DNN consensus | 0.639 | 0.549 | 1.000
LD50 | The present 2D GBDT consensus | 0.679 | 0.580 | 1.000
IGC50 | The present 2D MT-DNN consensus | 0.794 | 0.457 | 1.000
IGC50 | The present 2D GBDT consensus | 0.785 | 0.457 | 1.000
LC50 | The present 2D MT-DNN consensus | 0.765 | 0.718 | 1.000
LC50 | The present 2D GBDT consensus | 0.715 | 0.783 | 1.000
LC50-DM | The present 2D MT-DNN consensus | 0.725 | 0.935 | 1.000
LC50-DM | The present 2D GBDT consensus | 0.486 | 1.239 | 1.000
Table 6: The R², RMSE, and coverage of the present 2D consensus models on the four toxicity test sets.

The consensus of the top 6 fingerprints achieves an R² and RMSE of 0.944 and 0.684, respectively. The consensus of the top 3 is even better, improving the R² and RMSE to 0.955 and 0.648 (see Table 8). A systematic comparison with other methods is included in Table 9.

Fingerprint | R² | RMSE
Cons-top 3 | 0.955 | 0.648
Cons-top 6 | 0.944 | 0.684
MACCS | 0.958 | 0.664
Estate1 | 0.932 | 0.791
Daylight | 0.923 | 0.780
FP2 | 0.908 | 0.853
ECFP | 0.904 | 0.875
Estate2 | 0.897 | 0.907
Pharm2D | 0.832 | 1.114
ERG | 0.811 | 1.202
Table 8: The R² and RMSE of Log S predictions by eight fingerprints and the consensuses of the top 3 and top 6 on Klopman's test set.

Method | R² | RMSE
Cons-top 3 | 0.955 | 0.648
Cons-top 6 | 0.944 | 0.684
MT-ESTD+-1 (3D) | |
Table 9: Comparison of the present Log S consensus models with other methods on Klopman's test set.

III.C Partition coefficient (Log P)
Three Log P test sets were evaluated using the GBDT model. The training set has 8199 molecules and was originally compiled by Cheng et al. There are three test sets, namely FDA, Star, and Non-star, which are summarized in Table 10. The Log P in these sets is in units of log mol/L. Due to the size of the training set, 20000 estimators are used in the GBDT model.

Accuracy is reported as R² or the acceptable rate. The acceptable rate here is defined as the percentage of molecules with a prediction error of less than 0.5. On all three sets, the 2D fingerprints Estate2, Estate1, MACCS, and ECFP are always the top 4. The consensus of the top 4 fingerprints produces an R² of up to 0.901 on the FDA set and attains an acceptable rate of 71.3% on the Star set. On the Non-star set, the top-4 consensus is somewhat worse than the best single fingerprint, Estate1, but it is still in second place with an acceptable rate of 46.5% (see Figure 4).

A detailed comparison with other Log P prediction methods is shown in Table 11. On the FDA data set, GBDT-ESTD+-2-AD and MT-ESTD-1 are based on 3D descriptors. The GBDT-ESTD+-2-AD model includes some molecules from the NIH dataset in its training set; therefore, its performance is slightly better than the present one. The 2D method ALOGPS also performs slightly better (0.908 vs 0.901) than the present one. However, a previous study has pointed out that, for the PHYSPROP database, the training set of ALOGPS actually contains all of the compounds in the FDA set. It is unclear how well it would perform if the overlapping compounds were removed from the training set. Unlike ALOGPS, XLOGP3's training data are completely independent of the test set. In this case, the present prediction is more accurate than that of XLOGP3 (0.901 vs 0.872). The present results on the Star and Non-star sets are also systematically compared with other state-of-the-art models, as shown in Table 12.
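The acceptable rate used for the Star and Non-star comparisons is straightforward to compute: the fraction of molecules whose absolute prediction error falls below 0.5 log units. The values below are illustrative, not taken from the paper's data sets.

```python
def acceptable_rate(y_true, y_pred, cutoff=0.5):
    """Fraction of predictions whose absolute error is below the cutoff."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) < cutoff)
    return hits / len(y_true)

logp_true = [1.2, 0.4, 3.1, 2.2, -0.5]   # hypothetical experimental Log P
logp_pred = [1.0, 1.1, 3.3, 2.1, -1.2]   # hypothetical model predictions
rate = acceptable_rate(logp_true, logp_pred)  # 3 of 5 within 0.5 -> 0.6
```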
For the Star set, we achieve 71% of the total number of molecules with a predicted error of less than 0.5 (an acceptable rate of 71%). This result is quite satisfactory and is comparable to the 3D structure-based model developed by Wu et al., which has an acceptable rate of 72% on the same training set ("MT-ESTD-1" in Table 12). Many commercial software packages have been developed to predict Log P, such as AB/Log P, S+logP, ACD/log P, etc. However, we cannot verify whether the training sets used in these software packages overlap with the Star set. It is more meaningful to compare the present model to XLOGP3, since its training dataset does not contain any molecules in the test set. Again, the present model outperforms the XLOGP3 package on the Star set, with acceptable rates of 71% and 60%, respectively. On the Non-star set, none of the published methods performs as accurately as on the FDA and Star data sets, since the structures in the Non-star set are relatively new and complex. Thus, our model also achieves an acceptable rate of only 47%. However, it is still tied for third place among all predictors. This result is even better than those of some 3D structure-based models, though the RMSE is relatively high due to a few large outliers.

Figure 4: The performance of eight fingerprints and the consensuses of the top 4 on the FDA, Star, and Non-star data sets of Log P. To be consistent with previous results, R² is reported on the FDA set, while the acceptable rate is reported on the Star and Non-star datasets.

Method | R² | RMSE
GBDT-ESTD+-2-AD (2D+3D) | |
Our Cons-top 4 (2D) | 0.901 | 0.63
XLOGP3 (2D) | 0.872 |
Table 11: Comparison of Log P predictions on the FDA set.

Method | Star set (N=223): % error < 0.5 | % error 0.5 to 1.0 | RMSE | Non-star set (N=43): % error < 0.5 | % error 0.5 to 1.0 | RMSE
AB/Log P | 84 | 12 | 0.41 | 42 | 23 | 1.00
MT-ESTD+-1-AD | 77 | 16 | 0.49 | 49 | 19 | 0.98
S+logP | 76 | 22 | 0.45 | 40 | 35 | 0.87
ACD/logP | 75 | 17 | 0.50 | 44 | 32 | 1.00
CLOGP | 74 | 20 | 0.52 | 47 | 28 | 0.91
MT-ESTD-1 | 72 | 18 | 0.55 | 33 | 28 | 1.01
ALOGPS | 71 | 23 | 0.53 | 42 | 30 | 0.82
Our cons-top 4 | 71 | 18 | 0.625 | 47 | 16 | 1.233
MiLogP | 69 | 22 | 0.57 | 49 | 30 | 0.86
KowWIN | 68 | 21 | 0.64 | 40 | 30 | 1.05
TLOGP | 67 | 16 | 0.74 | 30 | 37 | 1.12
CSLogP | 66 | 22 | 0.65 | 58 | 19 | 0.93
SLIPPER-2002 | 62 | 22 | 0.80 | 35 | 23 | 1.23
XLOGP3 | 60 | 30 | 0.62 | 47 | 23 | 0.89
XLOGP2 | 57 | 22 | 0.87 | 35 | 23 | 1.16
QLOGP | 48 | 26 | 0.96 | 21 | 26 | 1.42
VEGA | 47 | 27 | 1.04 | 28 | 30 | 1.24
SPARC | 45 | 22 | 1.36 | 28 | 21 | 1.70
LSER | 44 | 26 | 1.07 | 35 | 16 | 1.26
CLIP | 41 | 25 | 1.05 | 33 | 9 | 1.54
MLOGP(Sim+) | 38 | 30 | 1.26 | 26 | 28 | 1.56
HINTLOGP | 34 | 22 | 1.80 | 30 | 5 | 2.72
NC+NHET | 29 | 26 | 1.35 | 19 | 16 | 1.71
Table 12: Comparison of Log P predictions on the Star and Non-star sets.
III.D Protein-ligand binding affinity prediction

III.D.1 The S1322 dataset
To assess the predictive power of 2D-fingerprint based models, two protein-ligand binding affinity datasets were investigated. The first is denoted the S1322 set. It is a high-quality data set of 1322 protein-ligand complexes involving 7 protein clusters (labeled CL1, CL2, ..., CL7), and it is a subset of the refined set of PDBbind v2015. The other dataset is PDBbind v2016, in which the refined set excluding the core set is used as the training data and the core set serves as the test set. These two sets are summarized in Table 13.

                 S1322 set                                PDBbind v2016
CL1   CL2   CL3   CL4   CL5   CL6   CL7    refined set   training set   core set (test set)
333   264   219   156   134   122   94     4057          3767           290

Table 13: The quantitative summary of the S1322 and PDBbind v2016 data sets.

The ligand-based model is used in the present work. For the S1322 set, a 5-fold cross validation was conducted with the GBDT method. To be consistent with results in the previous literature, accuracy is reported in terms of the Pearson correlation coefficient (R). Because the results from the Daylight and Pharm2D fingerprints are relatively poor, they are omitted here. The performance of the other six fingerprints (ECFP, FP2, Estate2, MACCS, Estate1, ERG) and their consensus is shown in Figure 5.

Figure 5 indicates that for all seven clusters, the consensuses of the six fingerprints largely outperform any single fingerprint. Specifically, the R values of the consensus models are 0.717, 0.847, 0.708, 0.718, 0.831, 0.777, and 0.760 on the seven clusters, respectively, and 0.765 on average. These results are comparable to those achieved by a ligand-based 3D topology and GBDT model.

III.D.2 PDBbind v2016 refined set and core set
The present ligand-based model was also tested on PDBbind v2016. Rather than cross validation, here the core set is treated as a test set. Quite consistent with the cross validation on the S1322 set, the consensus of the six fingerprints yields a large improvement over any single fingerprint, with an R of 0.747. These results indicate that the present model has a stable and reliable performance on different protein-ligand binding affinity data sets.

Figure 5: Pearson correlation coefficient (R) on the seven clusters of the S1322 data set yielded by the six fingerprints (ECFP, FP2, Estate2, MACCS, Estate1, ERG) and their consensuses.

Figure 6: The R on the PDBbind v2016 binding affinity set yielded by the six fingerprints (ECFP, FP2, Estate2, MACCS, Estate1, ERG) and their consensus.

A recent study reports 2D fingerprint-based complex models, in which a recently developed 2D fingerprint is used to encode protein-ligand complex information. When combined with a DNN, that method gives rise to an R of 0.817 on the PDBbind v2016 core set. A complicated 3D structure-based model using the topology of the protein-ligand complex developed by our group, TopBP (Complex), attains an R of 0.861 on the same set. Table 14, which reports R and RMSE (in kcal/mol) for each method, lists these results.

IV Discussion

IV.A General analysis
In the present work, the predictive power of eight popular 2D fingerprints, as well as their consensuses, on four important drug-related properties (i.e., toxicity, Log S, Log P, and binding affinity) was investigated. The present study reveals that, with a proper machine learning algorithm, 2D fingerprint-based models, including their consensuses, outperform other 2D QSPR approaches in most cases, especially for toxicity predictions. Additionally, 2D fingerprint-based models are comparable to state-of-the-art 3D structure-based models in most drug-related property predictions, except for protein-ligand binding affinity prediction. Considering that 2D fingerprints are very "cheap" molecular descriptors that are easy and fast to generate, these results are impressive. They show that 2D fingerprints paired with appropriate machine learning algorithms are still very valuable for practical problems, such as the prediction of toxicity, aqueous solubility (Log S), and partition coefficient (Log P). However, for protein-ligand binding affinity prediction, complex-based models using 3D topological fingerprints have a major advantage over the present 2D fingerprints, i.e., they are about 15% more accurate.

IV.B The performance analysis of 2D fingerprints

IV.B.1 Analysis of 2D fingerprints for PDBbind v2016 core set predictions
The performance of each 2D fingerprint can be systematically analyzed by comparing the difference between the prediction errors of every pair of fingerprints as follows.

(1) The relative absolute error for the f-th fingerprint on the i-th sample (molecule) in the test set is defined by

    Error(f, i) = |predicted value(f, i) - experimental value(i)| / |experimental value(i)|

Ranking  FP2                                     Estate1                          Estate2                                  MACCS
1        carbonyl group: 24/41                   carbonyl group: 25/42            carbonyl group: 23/41                    bicyclic compounds: 17/36
2        unfused benzene ring: 21/41             unfused benzene ring: 18/42      unfused benzene ring: 22/41              pyridine: 17/36
3        bicyclic compounds: 19/41               aniline: 14/42                   carboxylate ion: 16/41                   ether: 16/36
4        hydroxyl: 16/41                         carboxylate ion: 14/42           bicyclic compounds: 15/41                carbonyl group: 15/36
5        ether: 14/41                            hydroxyl: 14/42                  carbonyl group with N: 13/41             hydroxyl: 15/36
6        F, Cl, Br, I: 12/41                     ether: 14/42                     ether: 13/41                             unfused benzene ring: 12/36
7        amide: 10/41                            carbonyl with nitrogen: 13/42    hydroxyl: 11/41                          amide: 10/36
8        azole: 8/41                             amide: 11/42                     amide: 11/41                             carboxylate ion: 9/36
9        multiple non-fused benzene rings: 7/41  F, Cl, Br, I: 11/41              aniline: 10/41                           azole: 7/36
10       aniline: 7/41                           phenol: 8/42                     multiple non-fused benzene rings: 8/41   furan: 5/36

Table 15: The top 10 most frequently occurring functional groups in the PDBbind v2016 core set for each fingerprint. For each fingerprint, the occurrence frequency and the total number of molecules are also given.

This analysis is quite informative, as shown in Table 15. It indicates that the fingerprints perform differently on certain functional groups: some fingerprints perform better on some functional groups, while other fingerprints perform better on others. Therefore, one can select an appropriate fingerprint to represent a certain class of functional groups based on Table 15. For the FP2, Estate1, and Estate2 fingerprints, the top two functional groups are carbonyl groups and unfused benzene rings. The MACCS fingerprint, however, is somewhat special: its top two functional groups are bicyclic compounds and pyridine. The third-ranked functional group differs considerably among the four fingerprints: bicyclic compounds for FP2, aniline for Estate1, carboxylate ion for Estate2, and ether for MACCS, which provides further guidance for choosing fingerprints. For example, if a molecule contains aniline, then Estate1 should be selected.
Noticeably, some functional groups occur exclusively for one or two types of fingerprints. For example, F, Cl, Br, and I are only on the lists of FP2 and Estate1, while azole appears only on the lists of FP2 and MACCS, and multiple non-fused benzene rings appear only on those of FP2 and Estate2. Moreover, phenol occurs only for Estate1, and furan only for MACCS.
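The per-molecule relative absolute error and the ranked pairwise error differences underlying this analysis can be sketched as follows; the predictions and experimental values below are synthetic stand-ins for two fingerprint models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictions from two fingerprint-based models and experimental values.
experimental = rng.uniform(1.0, 10.0, size=100)
pred_fp2 = experimental + rng.normal(0, 1.0, 100)
pred_maccs = experimental + rng.normal(0, 1.2, 100)

def relative_abs_error(pred, expt):
    """Error(f, i) = |predicted value - experimental value| / |experimental value|."""
    return np.abs(pred - expt) / np.abs(expt)

# Step (1): per-molecule relative errors for each fingerprint.
err_fp2 = relative_abs_error(pred_fp2, experimental)
err_maccs = relative_abs_error(pred_maccs, experimental)

# Pairwise error differences, ranked: negative entries mark molecules on which
# the first fingerprint outperforms the second, positive entries the reverse.
diff = np.sort(err_fp2 - err_maccs)
fp2_better = np.sum(diff < 0)
print(f"FP2 beats MACCS on {fp2_better} of {diff.size} molecules")
```

Grouping the molecules on which one fingerprint wins by their functional groups is what produces occurrence tables such as Table 15.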
IV.B.2 Analysis of 2D fingerprints for the IGC50 toxicity data set and other data sets

Using the same 5-step procedure outlined above, we carried out a performance analysis for the IGC50 toxicity dataset, which is shown in Figure 8 and Table 16. The molecules in this toxicity data set are typically small and simple, so the functional groups in Table 16 are also small. Moreover, since there are not many functional groups in these relatively simple molecules, only the top 8 functional groups are presented in the table, ranked for the FP2, Daylight, Estate1, and Estate2 fingerprints; for each fingerprint, the occurrence frequency and the total number of molecules are also given. Similar to the performance on the binding affinity set, for the top 4 fingerprints on the toxicity set, the carbonyl group is in first place. Unfused benzene rings also have a high occurrence frequency, ranking second or third. The difference between the performance of the various fingerprints mainly involves sulfide and aliphatic chains with 8 or more members: the FP2 fingerprint works well on sulfide, whereas Daylight, Estate1, and Estate2 work well on aliphatic chains with 8 or more members.

Figure 8: The ranked error differences between pairs of fingerprints for the IGC50 toxicity set of 358 molecules. Only the top 4 fingerprints (i.e., Estate2, FP2, Estate1, Daylight) are considered.

The same performance analyses were also conducted for the other toxicity and Log P data sets; the results are shown in Tables S1 to S4. These tables indicate that, for the LD50, LC50, and LC50-DM toxicity data sets, the performance of the Estate1 and Estate2 fingerprints is similar: both work well on bicyclic compounds. In comparison, the FP2 fingerprint works better on aliphatic chains with 8 or more members, and the Daylight fingerprint performs better on amide.
For the Log P data set, the ECFP and Estate2 fingerprints perform well on aniline, the Estate1 fingerprint works better on bicyclic compounds, and the MACCS fingerprint works better on unfused benzene rings.

IV.C The predictive power of the consensus of 2D fingerprints
The consensus of several different fingerprints typically further enhances the performance over any single fingerprint, and this enhancement can be quite significant. However, across the datasets of different drug-related properties, the best fingerprint combinations for the consensus are not consistent. One possible explanation is that different fingerprints are good at encoding certain functional groups, and datasets for different drug-related properties have different functional group distributions. This is also the reason why a consensus can enhance performance: the consensus captures more functional groups and counter-balances the systematic biases of the individual fingerprints.

For toxicity prediction, the best consensus combination is obtained with Estate2, Estate1, Daylight, FP2, ECFP, and MACCS. For Log S prediction, the best combination is achieved with MACCS, Estate1, and Daylight, while for Log P prediction the best consensus involves Estate2, Estate1, ECFP, and MACCS. Finally, for binding affinity prediction, the best consensus uses Estate2, Estate1, FP2, ECFP, MACCS, and ERG. It is worth noting that Estate-related models (Estate1, Estate2, or both) are always included in the best combinations; indeed, their individual performances are relatively good. This finding is not surprising, since Estate fingerprints encode the intrinsic electronic state of each atom as perturbed by the electronic influence of all other atoms, and it is well known that the electronic state is important to drug-related properties.
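A minimal sketch of such a consensus, averaging the predictions of several hypothetical single-fingerprint models (the data below are synthetic stand-ins):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# Hypothetical test-set predictions from four single-fingerprint models,
# each with independent errors around the same experimental values.
experimental = rng.uniform(2.0, 12.0, size=150)
single_preds = [experimental + rng.normal(0, 1.0, 150) for _ in range(4)]

# Consensus: the simple average of the selected single-fingerprint predictions,
# which counter-balances the individual models' biases.
consensus = np.mean(single_preds, axis=0)

r_single = [pearsonr(p, experimental)[0] for p in single_preds]
r_consensus = pearsonr(consensus, experimental)[0]
print(f"best single R = {max(r_single):.3f}, consensus R = {r_consensus:.3f}")
```

With independent error sources, averaging shrinks the error variance, which is the statistical mechanism behind the consensus gains reported above.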
IV.D Multitask deep learning
Multitask deep learning was utilized in our toxicity prediction. It turns out that the smallest set, LC50-DM, with only 283 training samples, benefits dramatically from the multitask deep learning strategy: its R value rises from 0.523 to 0.725. This is because, in the framework of multitask deep learning, different data sets (tasks) share similar structure-function relationships. When a small dataset is trained together with a large dataset through shared neural networks, the statistics learned from the large dataset in the shared neurons help predict the property of the small dataset. As a result, the other three large toxicity sets can share the patterns they learned during training with the small toxicity set, enhancing its prediction. Therefore, multitask deep learning can be a useful strategy for training on relatively small datasets.

IV.E The limitations and advantages of 2D fingerprints
Typically, 2D fingerprints only encode small molecules, such as ligands, although high-level 2D fingerprint models including both proteins and ligands have also been developed.
Theoretically, 2D fingerprints are more suitable for target-independent or target-unspecific problems involving small molecules, such as toxicity, solvation free energy, aqueous solubility, partition coefficient, permeability, etc. The current investigation confirms this point: for toxicity, aqueous solubility, and partition coefficient, the present 2D-fingerprint based methods perform quite similarly to, or in some cases even somewhat better than, 3D structure-based methods.

For protein-ligand binding affinity predictions, both ligand-based and complex-based approaches are examined. For ligand-based approaches, 2D-fingerprint based methods can perform as well as 3D structure-based models. However, 3D structure-based topological models outperform 2D-fingerprint based methods (i.e., R: 0.861 vs 0.747 on the PDBbind v2016 core test). In fact, more sophisticated 2D fingerprint models that utilize protein-ligand complex information and DNNs are still not as accurate as 3D topology-based models (i.e., R: 0.817 vs 0.861 on the PDBbind v2016 core test and 0.774 vs 0.808 on the PDBbind v2013 core test). Essentially, algebraic topology is designed to simplify the geometric complexity of biological macromolecules; it is therefore able to extract vital information from protein-ligand complexes for predicting their binding affinities.

2D fingerprints are much easier to generate than 3D structure-based fingerprints built from algebraic topology, differential geometry, or graph theory. Therefore, 2D-fingerprint based models can be useful tools for preliminary drug screening studies.

V Conclusion
Two-dimensional molecular fingerprints, or 2D fingerprints, refer to molecular structural patterns, such as elemental composition, atomic connectivity, functional groups, 2D pharmacophores, etc., extracted from a molecule without taking into account the 3D structural representation of these properties. 2D fingerprints have been a main workhorse of cheminformatics and bioinformatics for decades. However, their validations on various datasets were typically carried out a long time ago with earlier machine learning algorithms. Recently, new 3D structure-based molecular fingerprints built from algebraic topology, differential geometry, geometric graph theory, and algebraic graph theory have found much success in drug discovery related applications, including the D3R Grand Challenges. This raises the interesting question of whether 2D fingerprints are still competitive in drug discovery related applications.

This work reassesses the performance of 2D fingerprints in drug discovery related applications. We consider a total of eight commonly used 2D fingerprints, namely FP2, Daylight, MACCS, Estate1, Estate2, ECFP, Pharm2D, and ERG. Four types of drug discovery related applications with 23 datasets are designed to validate the 2D fingerprints: solubility (Log S) and partition coefficient (Log P), which are independent of a target protein; toxicity, which may depend on certain unknown target proteins; and protein-ligand binding affinity, which depends on known target proteins. Advanced machine learning algorithms, including random forest (RF), gradient boosting decision trees (GBDT), single-task deep neural networks (ST-DNN), and multitask deep neural networks (MT-DNN), are used to optimize the performance of the above 2D fingerprints on the aforementioned four types of datasets. In particular, MT-DNN is designed to enhance the performance of 2D fingerprints on relatively small datasets by simultaneous training with relatively large datasets that share a similar pattern.
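The multitask strategy summarized above (a shared hidden layer feeding one regression head per task) can be sketched in plain NumPy; the two synthetic tasks, their sizes, and the layer widths below are illustrative stand-ins, not the paper's actual MT-DNN:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical tasks sharing a structure-activity pattern: one large, one small.
def make_task(n):
    X = rng.normal(size=(n, 16))
    y = X[:, :4].sum(axis=1)  # the shared pattern both tasks obey
    return X, y

tasks = {"large": make_task(400), "small": make_task(30)}

# Shared hidden layer plus one linear regression head per task.
W1 = rng.normal(scale=0.1, size=(16, 32)); b1 = np.zeros(32)
heads = {t: [rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)] for t in tasks}

lr = 0.01
for epoch in range(300):
    for t, (X, y) in tasks.items():          # alternate tasks each epoch
        H = np.maximum(X @ W1 + b1, 0.0)     # shared ReLU layer
        W2, b2 = heads[t]
        pred = (H @ W2 + b2).ravel()
        g = 2 * (pred - y)[:, None] / len(y)       # gradient of MSE w.r.t. pred
        gW2 = H.T @ g; gb2 = g.sum(axis=0)
        gH = (g @ W2.T) * (H > 0)
        gW1 = X.T @ gH; gb1 = gH.sum(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2             # task-specific head update
        W1 -= lr * gW1; b1 -= lr * gb1             # shared layer sees both tasks

# The small task benefits from gradients contributed by the large task
# through the shared layer.
X_s, y_s = tasks["small"]
H = np.maximum(X_s @ W1 + b1, 0.0)
mse_small = np.mean(((H @ heads["small"][0] + heads["small"][1]).ravel() - y_s) ** 2)
print(f"small-task training MSE: {mse_small:.3f}")
```

The key design point is that only the heads are task-specific: every gradient step on the large task also refines the shared representation that the small task relies on.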
Since each fingerprint may have an explicit bias toward certain functional groups or 2D patterns, we construct various consensus models to further boost the performance of 2D fingerprints on all the datasets. Finally, the strengths of the top four 2D fingerprints for predicting protein-ligand binding affinity and quantitative toxicity are analyzed in detail.

Our general findings are as follows. 1) 2D fingerprint-based models are as good as 3D structure-based models on various toxicity, Log S, and Log P datasets under the same training-test conditions. 2) For ligand-based protein-ligand binding affinity predictions, 2D fingerprint-based models perform as well as 3D structure-based models that rely only on ligand 3D structures. 3) 3D structure-based models that utilize 3D protein-ligand complex information outperform 2D fingerprint models based on either ligand information or protein-ligand complex information. 4) Advanced machine learning algorithms, such as DNN and MT-DNN, are crucial for 2D fingerprints to achieve optimal performance. 5) No single 2D fingerprint outperforms all other 2D fingerprints in all applications. 6) An appropriate consensus of a few 2D models typically achieves better performance. Therefore, when combined with advanced machine learning algorithms, 2D fingerprints are still competitive in most drug discovery related applications, except for those that involve protein structures.
Acknowledgments
This work was supported in part by NSF Grants DMS-1721024, DMS-1761320, and IIS-1900473 and NIH grant GM126189.

References

[1] Li Di and Edward H Kerns.
Drug-like properties: concepts, structure design and methods from ADME to toxicity optimization. Academic Press, 2015.
[2] Niel M Henriksen, Andrew T Fenley, and Michael K Gilson. Computational calorimetry: high-precision calculation of host–guest binding thermodynamics. Journal of chemical theory and computation, 11(9):4377–4394, 2015.
[3] Kaifu Gao, Jian Yin, Niel M Henriksen, Andrew T Fenley, and Michael K Gilson. Binding enthalpy calculations for a neutral host–guest pair yield widely divergent salt effects across water models. Journal of chemical theory and computation, 11(10):4555–4564, 2015.
[4] Kedi Wu and Guo-Wei Wei. Quantitative toxicity prediction using topology based multitask deep neural networks. Journal of chemical information and modeling, 58(2):520–531, 2018.
[5] Kedi Wu, Zhixiong Zhao, Renxiao Wang, and Guo-Wei Wei. TopP–S: Persistent homology-based multi-task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility. Journal of computational chemistry, 39(20):1444–1454, 2018.
[6] Christopher A Lipinski, Franco Lombardo, Beryl W Dominy, and Paul J Feeney. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced drug delivery reviews, 23(1-3):3–25, 1997.
[7] Li Di and Edward H Kerns. Biological assay challenges from compound solubility: strategies for bioassay optimization. Drug discovery today, 11(9-10):446–451, 2006.
[8] Andrew L Hopkins, György M Keserü, Paul D Leeson, David C Rees, and Charles H Reynolds. The role of ligand efficiency metrics in drug discovery. Nature reviews Drug discovery, 13(2):105, 2014.
[9] Pascale Atallah, Kenneth B Wagener, and Michael D Schulz. ADMET: the future revealed. Macromolecules, 46(12):4735–4741, 2013.
[10] Han Van De Waterbeemd and Eric Gifford. ADMET in silico modelling: towards prediction paradise? Nature reviews Drug discovery, 2(3):192, 2003.
[11] Kyaw-Zeyar Myint, Lirong Wang, Qin Tong, and Xiang-Qun Xie. Molecular fingerprint-based artificial neural networks QSAR for ligand biological activity predictions. Molecular pharmaceutics, 9(10):2912–2923, 2012.
[12] Hanna Geppert, Martin Vogt, and Jurgen Bajorath. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. Journal of chemical information and modeling, 50(2):205–216, 2010.
[13] Kunal Roy and Indrani Mitra. Electrotopological state atom (E-state) index in drug design, QSAR, property prediction and toxicity assessment. Current computer-aided drug design, 8(2):135–158, 2012.
[14] Mahmud Tareq Hassan Khan. Predictions of the ADMET properties of candidate drug molecules utilizing different QSAR/QSPR modelling approaches. Current drug metabolism, 11(4):285–295, 2010.
[15] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5):742–754, 2010.
[16] Yu-Chen Lo, Stefano E Rensi, Wen Torng, and Russ B Altman. Machine learning in chemoinformatics and drug discovery. Drug discovery today, 2018.
[17] Adrià Cereto-Massagué, María José Ojeda, Cristina Valls, Miquel Mulero, Santiago Garcia-Vallvé, and Gerard Pujadas. Molecular fingerprint similarity search in virtual screening. Methods, 71:58–63, 2015.
[18] Jitender Verma, Vijay M Khedkar, and Evans C Coutinho. 3D-QSAR in drug design: a review. Current topics in medicinal chemistry, 10(1):95–115, 2010.
[19] Joseph L Durant, Burton A Leland, Douglas R Henry, and James G Nourse. Reoptimization of MDL keys for use in drug discovery. Journal of chemical information and computer sciences, 42(6):1273–1280, 2002.
[20] Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open Babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, 2011.
[21] Daylight Chemical Information Systems, Inc. Daylight.
[22] Lowell H Hall and Lemont B Kier. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. Journal of Chemical Information and Computer Sciences, 35(6):1039–1045, 1995.
[23] Greg Landrum et al. RDKit: Open-source cheminformatics, 2006.
[24] Nikolaus Stiefl, Ian A Watson, Knut Baumann, and Andrea Zaliani. ErG: 2D pharmacophore descriptions for scaffold hopping. Journal of chemical information and modeling, 46(1):208–220, 2006.
[25] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pages 2224–2232, 2015.
[26] Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, et al. Are learned molecular representations ready for prime time? arXiv preprint arXiv:1904.01561, 2019.
[27] Zixuan Cang and Guo-Wei Wei. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. International journal for numerical methods in biomedical engineering, 34(2):e2914, 2018.
[28] Zixuan Cang, Lin Mu, and Guo-Wei Wei. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS computational biology, 14(1):e1005929, 2018.
[29] Duc Duy Nguyen and Guo-Wei Wei. DG-GL: Differential geometry-based geometric learning of molecular datasets. International journal for numerical methods in biomedical engineering, 35(3):e3179, 2019.
[30] Duc D Nguyen, Tian Xiao, Menglun Wang, and Guo-Wei Wei. Rigidity strengthening: A mechanism for protein–ligand binding. Journal of chemical information and modeling, 57(7):1715–1721, 2017.
[31] David Bramer and Guo-Wei Wei. Multiscale weighted colored graphs for protein flexibility and rigidity analysis. The Journal of chemical physics, 148(5):054103, 2018.
[32] Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, and Guo-Wei Wei. Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges. Journal of computer-aided molecular design, 33(1):71–82, 2019.
[33] Vladimir Svetnik, Andy Liaw, Christopher Tong, J Christopher Culberson, Robert P Sheridan, and Bradley P Feuston. Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of chemical information and computer sciences, 43(6):1947–1958, 2003.
[34] Robert E Schapire. The boosting approach to machine learning: An overview. In Nonlinear estimation and classification, pages 149–171. Springer, 2003.
[35] Imad A Basheer and Maha Hajmeer. Artificial neural networks: fundamentals, computing, design, and application. Journal of microbiological methods, 43(1):3–31, 2000.
[36] Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
[37] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[38] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
[39] Zixuan Cang and Guo-Wei Wei. Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology. Bioinformatics, 33(22):3549–3557, 2017.
[40] Zied Gaieb, Conor D Parks, Michael Chiu, Huanwang Yang, Chenghua Shao, W Patrick Walters, Millard H Lambert, Neysa Nevins, Scott D Bembenek, Michael K Ameriks, et al. D3R Grand Challenge 3: blind prediction of protein–ligand poses and affinity rankings. Journal of computer-aided molecular design, 33(1):1–18, 2019.
[41] T Martin. User's guide for TEST (version 4.2) (Toxicity Estimation Software Tool): A program to estimate toxicity from molecular structure, 2016.
[42] HL Morgan. The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation, 5(2):107–113, 1965.
[43] Bao Wang, Chengzhang Wang, Kedi Wu, and Guo-Wei Wei. Breaking the polar-nonpolar division in solvation free energy prediction. Journal of computational chemistry, 39(4):217–233, 2018.
[44] Bao Wang, Zhixiong Zhao, Duc D Nguyen, and Guo-Wei Wei. Feature functional theory–binding predictor (FFT–BP) for the blind prediction of binding free energies. Theoretical Chemistry Accounts, 136(4):55, 2017.
[45] Stephen J Capuzzi, Regina Politi, Olexandr Isayev, Sherif Farag, and Alexander Tropsha. QSAR modeling of Tox21 challenge stress response and nuclear receptor signaling toxicity assays. Frontiers in Environmental Science, 4:3, 2016.
[46] Bharath Ramsundar, Bowen Liu, Zhenqin Wu, Andreas Verras, Matthew Tudor, Robert P Sheridan, and Vijay Pande. Is multitask deep learning practical for pharma? Journal of chemical information and modeling, 57(8):2068–2076, 2017.
[47] Jan Wenzel, Hans Matter, and Friedemann Schmidt. Predictive multitask deep neural network models for ADME-Tox properties: Learning from large data sets. Journal of chemical information and modeling, 2019.
[48] Zhuyifan Ye, Yilong Yang, Xiaoshan Li, Dongsheng Cao, and Defang Ouyang. An integrated transfer learning and multitask learning approach for pharmacokinetic parameter prediction. Molecular pharmaceutics, 16(2):533–541, 2018.
[49] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
[50] Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan. PyTorch: Tensors and dynamic neural networks in Python with strong GPU acceleration, 6, 2017.
[51] Kevin S Akers, Glendon D Sinks, and T Wayne Schultz. Structure–toxicity relationships for selected halogenated aliphatic chemicals. Environmental toxicology and pharmacology, 7(1):33–39, 1999.
[52] Hao Zhu, Alexander Tropsha, Denis Fourches, Alexandre Varnek, Ester Papa, Paola Gramatica, Tomas Oberg, Phuong Dao, Artem Cherkasov, and Igor V Tetko. Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. Journal of chemical information and modeling, 48(4):766–784, 2008.
[53] TJ Hou, Ke Xia, Wei Zhang, and XJ Xu. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. Journal of chemical information and computer sciences, 44(1):266–275, 2004.
[54] Gilles Klopman, Shaomeng Wang, and Donald M Balthasar. Estimation of aqueous solubility of organic molecules by the group contribution approach. Application to the study of biodegradation. Journal of chemical information and computer sciences, 32(5):474–482, 1992.
[55] Tiejun Cheng, Yuan Zhao, Xun Li, Fu Lin, Yong Xu, Xinglong Zhang, Yan Li, Renxiao Wang, and Luhua Lai. Computation of octanol-water partition coefficients by guiding an additive model with knowledge. Journal of chemical information and modeling, 47(6):2140–2148, 2007.
[56] Alex Avdeef. Absorption and drug development: solubility, permeability, and charge state. John Wiley & Sons, 2012.
[57] Raimund Mannhold, Gennadiy I Poda, Claude Ostermann, and Igor V Tetko. Calculation of molecular lipophilicity: State-of-the-art and comparison of log P methods on more than 96,000 compounds. Journal of pharmaceutical sciences, 98(3):861–893, 2009.
[58] P Howard and W Meylan. Physical/chemical property database (PHYSPROP), 1999.
[59] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, 2014.
[60] Minyi Su, Qifan Yang, Yu Du, Guoqin Feng, Zhihai Liu, Yan Li, and Renxiao Wang. Comparative assessment of scoring functions: The CASF-2016 update. Journal of chemical information and modeling, 59(2):895–913, 2018.
[61] Maciej Wójcikowski, Michał Kukiełka, Marta M Stepniewska-Dziubinska, and Pawel Siedlecki. Development of a protein–ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions. Bioinformatics, 2018.
[62] Indra Kundu, Goutam Paul, and Raja Banerjee. A machine learning approach towards the prediction of protein–ligand binding affinity based on fundamental molecular properties.