Explaining Chemical Toxicity using Missing Features
Kar Wai Lim, Bhanushee Sharma, Payel Das, Vijil Chenthamarakshan, Jonathan S. Dordick
Abstract
Chemical toxicity prediction using machine learning is important in drug development to reduce repeated animal and human testing, thus saving cost and time. It is highly recommended that the predictions of computational toxicology models be mechanistically explainable. Current state-of-the-art machine learning classifiers are based on deep neural networks, which tend to be complex and harder to interpret. In this paper, we apply a recently developed method named the contrastive explanations method (CEM) to explain why a chemical or molecule is predicted to be toxic or not. In contrast to popular methods that provide explanations based on what features are present in the molecule, the CEM provides additional explanation on what features are missing from the molecule that are crucial for the prediction, known as the pertinent negative. The CEM does this by optimizing for the minimum perturbation to the model input using a projected fast iterative shrinkage-thresholding algorithm (FISTA). We verified that the explanation from CEM matches known toxicophores and findings from other work.
1. Introduction
The dramatic increase of potential drugs in the discovery phase over the last decade has not been accompanied by a similar increase in new drug approvals; rather, approvals have decreased with a large portion of these drug candidates being rejected (Hwang et al., 2016). Poor efficacy and toxicity remain major limiting aspects of this drug discovery (Hay et al., 2014). Traditionally, in vivo methods have been used to test the toxicities of these chemicals and drugs. But these are usually expensive and time consuming, and hence there has been a shift towards in vitro and machine learning (ML) based in silico methods.
IBM Research Singapore; Rensselaer Polytechnic Institute; IBM Research AI. Correspondence to: Payel Das < [email protected] >, Kar Wai Lim < [email protected] >. Proceedings of the th International Conference on Machine Learning, Vienna, Austria, PMLR 108, 2020. Copyright 2020 by the author(s).
A variety of ML models have been applied for this task, predicting toxicities in cells (in vitro), in animals (in vivo), and in humans. These models take in a variety of inputs, from chemical descriptors to mechanistic biological descriptors, such as targets and biological pathways. Chemical descriptors are used more frequently, from descriptors derived from chemical and physical properties (Luco & Ferretti, 1997; Abdelaziz et al., 2016), to structural chemical fingerprints (Mayr et al., 2016), to even chemical 3D (Matsuzaka & Uesawa, 2019) and 2D images (Fernandez et al., 2018). Chemical structures, even though not always directly related to the toxic effect of a molecule (Alves et al., 2016), are commonly used for screening in the drug development process (Limban et al., 2018). The type of model applied also differs, from similarity based models (Ajmani et al., 2006; Chavan et al., 2015) to SVMs (Cao et al., 2012), to random forests (Polishchuk et al., 2009), and most recently towards deep learning models (Mayr et al., 2016; Jimenez-Carretero et al., 2018).
This shift towards deep learning models has been pushed by their superior predictive performance (Mayr et al., 2016), and their ability to self select significant features (Tang et al., 2018). However, though the performance of these predictive models is getting better, this shift towards more “black-box” models makes it increasingly difficult to explain why a molecule is predicted as toxic or not toxic. Yet this information is extremely important, not only for the drug development process in highlighting which chemical substructures to avoid while designing new molecules, but also for providing more confidence to the end-users of this information, e.g., the experimentalists, who design molecules and in vitro toxicity measuring assays and already have expert knowledge on which chemical (sub)structures to avoid during new molecule design. Therefore, it is highly recommended for predictions of computational toxicology models to be explainable (OECD, 2016).
Many different methods have been applied to pinpoint these toxic substructures, or “structural alerts”/“toxicophores”, in order to help explain toxicity prediction results. However, these methods have only focused on pinpointing the presence of certain features, such as toxicophores, in explaining the resulting toxicity predictions; in general they have not examined the effect of the absence of these features to generate explanations. When defining contrastive features that are necessary for a prediction, defining both the present and absent features that are minimal and necessary can help provide an explanation that is more easily comprehensible for users. Keeping this in mind, Dhurandhar et al. (2018) have recently developed a contrastive explanations method (CEM) to identify both pertinent positive and pertinent negative features in neural networks applied to image data (MNIST handwritten digits dataset), to text data (procurement fraud dataset), and to brain activity strength data.
In our current work, we apply this method to the domain of predicting toxicity of molecules, to provide better insights into explanations of toxicity predictions. For the task of predicting toxicity from chemical structures of molecules, these contrastive features could be chemical substructures or functional groups: for pertinent positive features these would be the minimum required substructures for a molecule’s classification (structural alerts or toxicophores in the case of a toxic class), while for pertinent negative features these would be the minimum changes to the molecule that would flip the classification of a molecule, from toxic to nontoxic or vice versa. Such an explanation will expand the scope of current toxicity prediction explanations, beyond just defining toxicophores. These explanations can also provide information on how to convert a toxic molecule to a nontoxic molecule, the exact contributing chemical substructures, as well as provide information on what can make a molecule safe.
2. Contrastive Explanations Method (CEM)
One major strength of the CEM is to identify missing features that contribute to an explanation, in addition to features that are present in a molecule. To illustrate, a molecule can be nontoxic because it does not have a particular toxicophore that is harmful. Such an explanation is more intuitive and complete than just saying that the molecule is nontoxic because of the presence of other structures that correlate with nontoxicity.
The CEM introduces the notion of Pertinent Negative (PN) and Pertinent Positive (PP). A PN is a minimal subset of features whose absence is necessary for a classifier to predict a given class, while a PP is the minimal subset of features whose presence gives rise to its prediction. The CEM obtains the PN and PP via an optimization problem that looks for the required minimum perturbation to the model input, using a projected fast iterative shrinkage-thresholding algorithm (FISTA). Refer to Dhurandhar et al. (2018) for more details. Examples of PN and PP are given in Section 4.
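To make the optimization more concrete, the following is a minimal sketch of a projected FISTA search for a pertinent negative. It is not the authors' implementation: the classifier is a hypothetical linear model (toxic if w @ x + b > 0) instead of a DNN, the autoencoder term of the full CEM objective is omitted, and the function name, weights, and hyperparameters (c, beta, kappa, step) are illustrative assumptions. The soft-threshold uses the standard ISTA scaling by the step size.

```python
import numpy as np

def find_pertinent_negative(x0, w, b, c=1.0, beta=0.1, kappa=0.5,
                            step=0.01, n_iter=500):
    """Toy projected FISTA search for a pertinent negative (PN).

    Assumptions: a hypothetical linear classifier (toxic if w @ x + b > 0)
    and a binary fingerprint x0 that is currently predicted nontoxic.
    """
    free = 1.0 - x0                      # a PN may only switch on bits that are 0 in x0
    delta = np.zeros_like(x0)
    y = np.zeros_like(x0)                # FISTA extrapolation variable
    best, best_cost = None, np.inf

    def project(v):
        return np.clip(v, 0.0, free)     # enforce 0 <= delta_i <= 1 - x0_i

    for k in range(n_iter):
        score = w @ (x0 + y) + b         # toxic logit minus nontoxic logit (fixed at 0)
        # Smooth part: hinge c * max(-score, -kappa) plus squared L2 norm of y.
        grad_hinge = -c * w if score < kappa else np.zeros_like(w)
        z = y - step * (grad_hinge + 2.0 * y)
        # Soft-thresholding handles the L1 penalty; projection keeps delta feasible.
        z = np.sign(z) * np.maximum(np.abs(z) - step * beta, 0.0)
        new_delta = project(z)
        y = project(new_delta + (k / (k + 3.0)) * (new_delta - delta))
        delta = new_delta
        # Keep the cheapest perturbation seen so far that actually flips the label.
        if w @ (x0 + delta) + b > 0:
            cost = beta * np.abs(delta).sum() + (delta ** 2).sum()
            if cost < best_cost:
                best, best_cost = delta.copy(), cost
    return best

# Hypothetical toy fingerprint and classifier weights.
x0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
w = np.array([0.5, 3.0, 0.2, -1.0, 2.5, -0.5])
b = -2.0
pn = find_pertinent_negative(x0, w, b)
```

In this toy setting the returned vector is nonzero only on bits that are absent from x0, which is the sense in which a PN describes missing features.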
3. Toxicity Prediction Model
We apply the CEM to a deep neural network (DNN) model used for toxicity prediction. For simplicity, we focus on one specific task within the Tox21 challenge, i.e., ‘SR-MMP’. This particular task is chosen because it is the simplest toxicity prediction task, with 95% AUC-ROC achieved by the winner of the Tox21 challenge.
The Tox21 challenge provides the results of twelve in vitro assays that test seven different nuclear receptor signaling effects and five stress response effects of a large number of chemicals, assessing their ability to disrupt biological pathways through high-throughput screening (HTS) techniques (Krewski et al., 2010; Tice et al., 2013; Kavlock et al., 2009). The SR-MMP task specifically tests the ability of molecules to disrupt the mitochondrial membrane potential (MMP) in cells (Huang et al., 2016). To generate the features for the DNN model, we convert the SMILES representation of each molecule into a fixed-length bit vector of size 4096 using Morgan fingerprinting (Rogers & Hahn, 2010). These bits convey information about the substructures of the molecules; for example, the 3236th bit corresponds to a substructure containing the bromine atom (see Figure 3). Our Morgan fingerprints encapsulate all molecular substructures up to a radius of size 2.
Since our main focus is on the explanation method, we employ a simple DNN model with only three hidden layers and the ReLU activation function. The sizes of the hidden layers are 2048, 1024 and 512 respectively. The output layer has 2 nodes corresponding to the toxic/nontoxic labels and is activated by a softmax. We train the DNN model for 200 epochs with random batches of 256 molecules per training step. Binary cross entropy was chosen as the loss function and it was optimized using the ADADELTA optimizer. The simple DNN model achieves a test set ROC AUC of 0.8777, with a true positive rate of 0.5754 and a true negative rate of 0.9524; overall, the F1 score is 0.6280. Note that the results are comparable with those of Liu et al. (2019, Table S14).
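The architecture described above can be sketched as a plain NumPy forward pass. This is only an illustration of the layer sizes, the ReLU and softmax activations, and the cross-entropy loss; the initializer, the random stand-in fingerprints, and all function names are assumptions, and training with ADADELTA is not shown.

```python
import numpy as np

LAYER_SIZES = [4096, 2048, 1024, 512, 2]   # input bits, three hidden layers, two classes

def init_params(sizes, seed=0):
    """He-style random weights (an assumption; the initializer is not stated in the text)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)).astype(np.float32),
             np.zeros(n_out, dtype=np.float32))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return softmax class probabilities for a batch of fingerprints."""
    h = x
    for weight, bias in params[:-1]:
        h = np.maximum(h @ weight + bias, 0.0)            # hidden layer with ReLU
    weight, bias = params[-1]
    logits = h @ weight + bias
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class (labels are 0 or 1)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

params = init_params(LAYER_SIZES)
n_params = sum(weight.size + bias.size for weight, bias in params)
rng = np.random.default_rng(1)
batch = (rng.random((4, 4096)) < 0.02).astype(np.float32)  # random sparse stand-in fingerprints
probs = forward(params, batch)
loss = cross_entropy(probs, np.array([0, 1, 0, 0]))
```

With these layer sizes the network has roughly 11 million trainable parameters, almost all of them in the first two weight matrices.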
4. Results
Pertinent Negative Features.
In our toxicity prediction model, pertinent negatives (PN) are the minimum changes to the input features of a molecule that flip its predicted label.
Example 1: Triphenylphosphine oxide: For example, triphenylphosphine oxide was correctly predicted to be nontoxic for the SR-MMP endpoint, with a prediction probability of …
Tox21 challenge leaderboard: https://tripod.nih.gov/tox21/challenge/leaderboard.jsp
Figure 1. Molecular representation of triphenylphosphine oxide (left) and triphenylborane (right).
In order to make triphenylphosphine oxide toxic, the bits to be added are 690, 876, 164, 62, 276, etc. To give an illustration of what these bits entail, we refer to other molecules in the database that have these bits and borrow the toxic parts corresponding to these bits. A selection of the top 10 parts for the pertinent negatives is displayed below in Figure 2. Interestingly, we can see that the PN includes antimony trichloride (SbCl3), which is known to be very toxic (Cooper & Harrison, 2009).
Figure 2. Pertinent Negatives (PN) for triphenylphosphine oxide; adding these parts into the molecule will make it toxic. For these substructures, the blue circle highlights the central atom, the yellow circles are part of an aromatic ring, and the gray circles are part of an aliphatic ring.
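The lookup used to illustrate PN bits, i.e., finding other database molecules in which a given bit is set, could be sketched as a simple inverted index. The molecule names and bit sets below are hypothetical toy data, and the function name is an assumption; extracting and drawing the actual substructure behind each bit would additionally require a cheminformatics toolkit.

```python
def molecules_with_bits(pn_bits, fingerprints):
    """Map each PN bit to the database molecules whose fingerprint has it switched on."""
    return {
        bit: sorted(name for name, bits in fingerprints.items() if bit in bits)
        for bit in pn_bits
    }

# Hypothetical toy database: molecule name -> set of on-bits in its fingerprint.
fingerprints = {
    "mol_a": {690, 12},
    "mol_b": {690, 876},
    "mol_c": {876, 5},
    "mol_d": {164},
}
exemplars = molecules_with_bits([690, 876, 164, 62], fingerprints)
```

A bit with no match in the database (bit 62 in this toy data) maps to an empty list, so it cannot be illustrated this way.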
Example 2: Triphenylborane: We now look at another example where the DNN model misclassified the molecule triphenylborane as nontoxic, when it is in fact toxic for the SR-MMP endpoint. Here, we use the pertinent negatives to identify why the model makes such a false prediction.
For triphenylborane, the CEM algorithm provides only 5 pertinent negative bits, meaning that it takes fewer changes to flip the prediction to toxic (the classification probability for the toxic class changed from 0.0205 to 0.5062). Here, we can clearly see that adding a bromine atom and a hydroxyl (OH) group would flip the classifier’s prediction. One way to explain why the classifier makes the false prediction is that such features were not observed in triphenylborane. In fact, this molecule bears close resemblance to triphenylphosphine oxide (Example 1), where the only difference is in the central group connecting the aromatic rings. Thus it is not surprising to see such a false prediction.
In contrast to existing explanation methods that provide explanations based on what is present in a molecule, the pertinent negatives instead look at what is missing from the molecule, which is a useful and more complete explanation criterion. This is one of the main strengths of using the pertinent negatives for explanation.
Figure 3. Pertinent Negatives (PN) for triphenylborane; adding these parts into the molecule will make it toxic.
Pertinent Positive.
Pertinent Positives (PP) answer the question: what are the minimum required features for a molecule to be classified as it is?
Example 3: Ethalfluralin: Ethalfluralin is a known herbicide for weeds. Here, the DNN model correctly predicted that the molecule is toxic on the SR-MMP endpoint.
Figure 4. Molecular representation of ethalfluralin (left) and 2,6-dichlorophenol (right).
To understand why the model makes such a prediction, we can inspect the pertinent positives for the molecule. In Figure 5, we display five pertinent positives for the prediction.
Figure 5. Pertinent Positives (PP) for ethalfluralin, which are the contributing parts for the toxic prediction of the molecule.
Here, we can see that one of the contributing factors to its toxicity prediction is its trifluoromethyl functional group. However, we caution that this functional group may or may not be the causal factor of its toxicity, as the CEM can only extract correlated factors rather than causal ones.
Example 4: 2,6-Dichlorophenol: Next, we look at an example where the DNN model predicts a molecule incorrectly. 2,6-Dichlorophenol is corrosive and is an environmental hazard according to PubChem, but it is not toxic in the SR-MMP endpoint. However, the DNN model predicts it to be toxic with a probability of 0.9999. We employ the CEM to look for its pertinent positive bits.
The CEM identifies three main bits for the pertinent positive, which are illustrated in Figure 6. These three bits correspond to the hydroxyl (OH) functional group and the chlorine atom. The presence of these parts in the molecule led the DNN model to predict the molecule to be toxic, albeit incorrectly.
Figure 6. Pertinent Positives (PP) for 2,6-dichlorophenol, which are the contributing parts for the toxic prediction of the molecule.
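The idea behind a pertinent positive, the smallest set of present features that is by itself enough for the toxic prediction, can be illustrated with an exhaustive search on a toy model. This is a stand-in for intuition only: the CEM finds PPs with the FISTA-based optimization rather than by enumeration, enumeration is exponential in the number of on-bits, and the linear classifier, bit indices, and weights below are hypothetical.

```python
from itertools import combinations

def minimal_sufficient_bits(on_bits, weights, bias):
    """Smallest subset of on-bits that a toy linear model still labels toxic on its own."""
    for size in range(1, len(on_bits) + 1):
        scored = [(sum(weights[bit] for bit in subset) + bias, subset)
                  for subset in combinations(sorted(on_bits), size)]
        sufficient = [item for item in scored if item[0] > 0]   # toxic if score > 0
        if sufficient:
            return set(max(sufficient)[1])   # most confident subset of this size
    return None

# Hypothetical per-bit weights of a linear toxicity classifier.
weights = {11: 2.0, 42: 1.5, 77: 0.3, 93: -0.4}
bias = -2.5
pp = minimal_sufficient_bits({11, 42, 77, 93}, weights, bias)
```

Here no single bit is sufficient, while bits 11 and 42 together are, so the toy PP consists of those two bits.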
Verifying Identified Explanations.
Akita et al. (2018, Fig. 3) found that the phenolic OH group is responsible for the prediction of the toxicity of the molecule tyrphostin 9 in their classification model. This is consistent with the finding of CEM, which identified the OH functional group as an explanation (in both PP and PN) for our predictions. In addition, we also found that the phenolic OH functional group solely forms the pertinent positive of tyrphostin 23 (not displayed), which is closely related to tyrphostin 9.
Checking with Known Toxicophores.
Our model’s explanations, through identifying PP and PN substructures, can pinpoint structural alerts: substructures in a molecule that are required for a toxic label or substructures that can flip the predicted label to a toxic label.
The SR-MMP task measures the ability of molecules to disrupt the mitochondrial membrane potential in cells. Weakly acidic groups are known to uncouple this membrane potential, with the phenolic OH being the most common weakly acidic group (Terada, 1990). Our model identifies phenolic OH groups as PPs or PNs for multiple molecules, consistent with this work (Table 1).
Further, through past experiments and testing, various substructures are already known to act as structural alerts / toxicophores for different toxic endpoints. A commonly used experiment is the Ames test, which tests for the mutagenicity of chemicals in vitro in cells; general structural alerts from these tests have already been curated in past studies and are a common benchmark for identifying toxicophores in molecules (Kazius et al., 2005; 2006; Yang et al., 2017). Our model was able to identify several of these general toxicophores that are connected to mutagenicity, e.g., aliphatic halides, polycyclic aromatic systems, and aromatic nitros (Kazius et al., 2006), as both PP and PN to either explain the toxicity of a molecule or to turn a molecule into a toxic one (Table 1).
Table 1. Structural alerts identified by the CEM match known general toxicophores (see Kazius et al., 2005). SA: structural alert; Eg: example number.
General Toxicophore | Pertinent Negative: SA (Eg) | Pertinent Positive: SA (Eg)
Phenolic OH | -cc(O)c- (1); -cO (1, 2) | -cO (3)
Aliphatic halides (Cl, Br, I, F) | Br (1, 2); Br (2) | -cc(c(F)(F)F)cc- (3); -c(Cl) (4)
Polycyclic aromatic system | -c(ccc(c)c)c- (1) |
Aromatic nitro | -ccc(N+(O−)=O)c(N+)c- (1) | N, O−, N+=O (3)
5. Conclusion
Applying the recently proposed CEM (Dhurandhar et al., 2018) to the task of predicting toxicity of molecules, in particular the SR-MMP task, we were able to demonstrate the ability to explain toxic predictions both from structural alerts and by examining the missing substructures that can switch a molecule to be toxic. This is helpful in terms of both identifying toxicophores that might need to be avoided in designing a molecule, as well as giving an indication of how to convert a molecule to be toxic, and thus conversely nontoxic. The toxicophores extracted in such a manner were consistent with both known experimental toxicophores and past models for identifying substructures.
To note, even though a molecule can be classified as toxic or nontoxic by the SR-MMP endpoint, this only measures its toxicity with regard to this one endpoint: whether the molecule can disrupt the mitochondrial membrane potential. Thus this cannot necessarily predict if the molecule will be toxic for other endpoints, but can still give an indication of this. This can be the reason for seemingly contradictory known toxicities and toxicity for the SR-MMP endpoint, e.g., 2,6-dichlorophenol is corrosive and an environmental hazard, but not toxic in the SR-MMP endpoint. Thus, it is important to be cognisant of the definition of toxicity we are examining for our explanations.
Our PNs are general examples of the minimum required substructures that can flip a molecule’s label to toxic, that is, missing substructures whose absence causes the predicted label to be nontoxic. This is a good starting point to associate a particular substructure with toxicity; however, for a more complete explanation, the next step would be to pinpoint the location at which these substructures can be added to the molecule to create chemically meaningful molecules.
Overall, the CEM is able to help discern both the missing and contributing structural alerts for toxicity that are consistent with known toxicophores. Further work will extend to the identification of causal factors of molecular toxicity.
References
Abdelaziz, A., Spahn-Langguth, H., Schramm, K.-W., and Tetko, I. V. Consensus modeling for HTS assays using in silico descriptors calculates the best balanced accuracy in Tox21 challenge. Frontiers in Environmental Science, 4:2, 2016.
Ajmani, S., Jadhav, K., and Kulkarni, S. A. Three-dimensional QSAR using the k-nearest neighbor method and its interpretation. Journal of Chemical Information and Modeling, 46(1):24–31, 2006.
Akita, H., Nakago, K., Komatsu, T., Sugawara, Y., Maeda, S.-i., Baba, Y., and Kashima, H. BayesGrad: Explaining predictions of graph convolutional networks. In International Conference on Neural Information Processing, pp. 81–92. Springer, 2018.
Alves, V. M., Muratov, E. N., Capuzzi, S. J., Politi, R., Low, Y., Braga, R. C., Zakharov, A. V., Sedykh, A., Mokshyna, E., Farag, S., et al. Alarms about structural alerts. Green Chemistry, 18(16):4348–4360, 2016.
Cao, D.-S., Zhao, J.-C., Yang, Y.-N., Zhao, C.-X., Yan, J., Liu, S., Hu, Q.-N., Xu, Q.-S., and Liang, Y.-Z. In silico toxicity prediction by support vector machine and SMILES representation-based string kernel. SAR and QSAR in Environmental Research, 23(1-2):141–153, 2012.
Chavan, S., Friedman, R., and Nicholls, I. A. Acute toxicity-supported chronic toxicity prediction: a k-nearest neighbor coupled read-across strategy. International Journal of Molecular Sciences, 16(5):11659–11677, 2015.
Cooper, R. G. and Harrison, A. P. The exposure to and health effects of antimony. Indian Journal of Occupational and Environmental Medicine, 13(1):3, 2009.
Dhurandhar, A., Chen, P.-Y., Luss, R., Tu, C.-C., Ting, P., Shanmugam, K., and Das, P. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Advances in Neural Information Processing Systems, pp. 592–603, 2018.
Fernandez, M., Ban, F., Woo, G., Hsing, M., Yamazaki, T., LeBlanc, E., Rennie, P. S., Welch, W. J., and Cherkasov, A. Toxic colors: the use of deep learning for predicting toxicity of compounds merely from their graphic images. Journal of Chemical Information and Modeling, 58(8):1533–1543, 2018.
Hay, M., Thomas, D. W., Craighead, J. L., Economides, C., and Rosenthal, J. Clinical development success rates for investigational drugs. Nature Biotechnology, 32(1):40–51, 2014.
Huang, R., Xia, M., Nguyen, D.-T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S. A., Rossoshek, A., and Simeonov, A. Tox21 challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3:85, 2016.
Hwang, T. J., Carpenter, D., Lauffenburger, J. C., Wang, B., Franklin, J. M., and Kesselheim, A. S. Failure of investigational drugs in late-stage clinical development and publication of trial results. JAMA Internal Medicine, 176(12):1826–1833, 2016.
Jimenez-Carretero, D., Abrishami, V., Fernández-de Manuel, L., Palacios, I., Quílez-Álvarez, A., Díez-Sánchez, A., del Pozo, M. A., and Montoya, M. C. Tox_(R)CNN: Deep learning-based nuclei profiling tool for drug toxicity screening. PLoS Computational Biology, 14(11), 2018.
Kavlock, R. J., Austin, C. P., and Tice, R. R. Toxicity testing in the 21st century: Implications for human health risk assessment. Risk Analysis, 29(4):485–487, 2009.
Kazius, J., McGuire, R., and Bursi, R. Derivation and validation of toxicophores for mutagenicity prediction. Journal of Medicinal Chemistry, 48(1):312–320, 2005.
Kazius, J., Nijssen, S., Kok, J., Bäck, T., and IJzerman, A. P. Substructure mining using elaborate chemical representation. Journal of Chemical Information and Modeling, 46(2):597–605, 2006.
Krewski, D., Acosta Jr, D., Andersen, M., Anderson, H., Bailar III, J. C., Boekelheide, K., Brent, R., Charnley, G., Cheung, V. G., Green Jr, S., et al. Toxicity testing in the 21st century: A vision and a strategy. Journal of Toxicology and Environmental Health, Part B, 13(2-4):51–138, 2010.
Limban, C., Nuţă, D. C., Chiriţă, C., Negre, S., Arsene, A. L., Goumenou, M., Karakitsios, S. P., Tsatsakis, A. M., and Sarigiannis, D. A. The use of structural alerts to avoid the toxicity of pharmaceuticals. Toxicology Reports, 5:943–953, 2018.
Liu, S., Demirel, M. F., and Liang, Y. N-gram graph: simple unsupervised representation for graphs, with applications to molecules. In Advances in Neural Information Processing Systems, pp. 8464–8476, 2019.
Luco, J. M. and Ferretti, F. H. QSAR based on multiple linear regression and PLS methods for the anti-HIV activity of a large group of HEPT derivatives. Journal of Chemical Information and Computer Sciences, 37(2):392–401, 1997.
Matsuzaka, Y. and Uesawa, Y. Prediction model with high-performance constitutive androstane receptor (CAR) using DeepSnap-Deep Learning approach from the Tox21 10K compound library. International Journal of Molecular Sciences, 20(19):4855, 2019.
Mayr, A., Klambauer, G., Unterthiner, T., and Hochreiter, S. DeepTox: Toxicity prediction using deep learning. Frontiers in Environmental Science, 3:80, 2016.
OECD. Guidance document for the use of AOPs in developing IATA, 2016. Accessed online.
Polishchuk, P. G., Muratov, E. N., Artemenko, A. G., Kolumbin, O. G., Muratov, N. N., and Kuzmin, V. E. Application of random forest approach to QSAR prediction of aquatic toxicity. Journal of Chemical Information and Modeling, 49(11):2481–2488, 2009.
Rogers, D. and Hahn, M. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.
Tang, W., Chen, J., Wang, Z., Xie, H., and Hong, H. Deep learning for predicting toxicity of chemicals: A mini review. Journal of Environmental Science and Health, Part C, 36(4):252–271, 2018.
Terada, H. Uncouplers of oxidative phosphorylation. Environmental Health Perspectives, 87:213–218, 1990.
Tice, R. R., Austin, C. P., Kavlock, R. J., and Bucher, J. R. Improving the human hazard characterization of chemicals: a Tox21 update. Environmental Health Perspectives, 121(7):756–765, 2013.
Yang, H., Li, J., Wu, Z., Li, W., Liu, G., and Tang, Y. Evaluation of different methods for identification of structural alerts using chemical Ames mutagenicity data set as a benchmark.