Ana T. Winck
Pontifícia Universidade Católica do Rio Grande do Sul
Publications
Featured research published by Ana T. Winck.
BMC Genomics | 2010
Karina S. Machado; Ana T. Winck; Duncan D. Ruiz; Osmar Norberto de Souza
Background: Molecular docking simulation is the Rational Drug Design (RDD) step that investigates the affinity between protein receptors and ligands. Typically, molecular docking algorithms treat receptors as rigid bodies. Receptors are, however, intrinsically flexible in the cellular environment. Using a time series of receptor conformations is one approach to exploring this flexibility in molecular docking simulations, but it is extremely time-consuming. Hence, selecting the most promising conformations can accelerate docking experiments and, consequently, RDD efforts.
Results: We previously docked four ligands (NADH, TCL, PIF and ETH) to 3,100 conformations of the InhA receptor from M. tuberculosis. Based on the distances between the receptor residues and the ligand, we preprocessed all docking results to generate appropriate input for data mining. For each docking result, preprocessing consisted of calculating the shortest interatomic distance between the ligand and each of the receptor's residues. These distances were the predictive attributes. The target attribute was the estimated free energy of binding (FEB) calculated by the AutoDock 3.0.5 software. The mining inputs were submitted to the M5P model tree algorithm, which produced short and understandable trees. Correlation exceeded 95% for NADH, TCL and PIF, whereas for ETH it was only about 60%. By post-processing each linear model (LM) of the generated model trees, we calculated the average FEB of its associated instances. We considered an LM representative if its average FEB was smaller than or equal to the average FEB of the test set. The instances in the selected LMs were considered the most promising snapshots. This yielded 1,521, 1,780, 2,085 and 902 snapshots for NADH, TCL, PIF and ETH, respectively.
Conclusions: By post-processing the generated model trees, we were able to propose a criterion for selecting linear models. This criterion, in turn, selects a set of promising receptor conformations. As future work, we intend to use these results to develop a strategy for preprocessing the receptor's 3-D spatial conformation in order to predict FEB values. We also intend to select other compounds, among the millions catalogued, that may be promising new drug candidates for our particular protein receptor target.
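The preprocessing step and the LM selection criterion described above can be sketched in Python. This is a minimal illustration under stated assumptions. Atoms are plain (x, y, z) tuples grouped under hypothetical residue labels. Each linear model (LM) is represented only by the FEB values of the instances it covers. The sketch does not train M5P itself.

```python
import math
from statistics import mean


def shortest_distances(ligand_atoms, receptor_residues):
    """Predictive attributes for one docking result: for each receptor
    residue, the shortest interatomic distance to any ligand atom.

    ligand_atoms: list of (x, y, z) coordinates
    receptor_residues: dict mapping a residue label (hypothetical naming)
    to the (x, y, z) coordinates of that residue's atoms
    """
    return {
        residue: min(math.dist(ra, la) for ra in atoms for la in ligand_atoms)
        for residue, atoms in receptor_residues.items()
    }


def representative_lms(lm_febs, test_febs):
    """Selection criterion: an LM is representative if the average FEB of
    its instances is smaller than or equal to the average FEB of the test
    set (lower FEB means more favorable binding)."""
    threshold = mean(test_febs)
    return sorted(lm for lm, febs in lm_febs.items() if mean(febs) <= threshold)


# Toy inputs with made-up coordinates and FEB values (kcal/mol).
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
residues = {"RES_A": [(4.0, 0.0, 0.0)], "RES_B": [(0.0, 2.0, 0.0), (0.0, 6.0, 0.0)]}
print(shortest_distances(ligand, residues))  # → {'RES_A': 3.0, 'RES_B': 2.0}

lm_febs = {"LM1": [-12.0, -11.0], "LM2": [-8.0, -7.0], "LM3": [-10.0]}
test_febs = [-12.0, -11.0, -8.0, -7.0, -10.0]
print(representative_lms(lm_febs, test_febs))  # → ['LM1', 'LM3']
```

Here the test-set average is -9.6, so LM1 (-11.5) and LM3 (-10.0) are kept and LM2 (-7.5) is discarded.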
BMC Bioinformatics | 2012
Rodrigo C. Barros; Ana T. Winck; Karina S. Machado; Márcio P. Basgalupp; André Carlos Ponce Leon Ferreira de Carvalho; Duncan D. Ruiz; Osmar Norberto de Souza
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem arises within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design-related applications, especially because decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method in evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance.
Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm in both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application.
Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate to a flexible receptor.
Brazilian Symposium on Bioinformatics | 2009
Ana T. Winck; Karina S. Machado; Osmar Norberto-de-Souza; Duncan Dubugrás Ruiz
Among the different alternatives for considering receptor flexibility in molecular docking experiments, we opted to execute a series of dockings using receptor snapshots generated by molecular dynamics simulations. Our target is the InhA enzyme from Mycobacterium tuberculosis bound to the NADH, TCL, PIF and ETH ligands. After testing some mining strategies on these data, we concluded that an organized repository is especially useful for obtaining better outcomes. Thus, we built a comprehensive and robust database called FReDD to store the InhA-ligand docking results. Using this database, we concentrate our data mining efforts on exploring the docking results in order to accelerate the identification of promising ligands against the InhA target.
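A repository of docking results like this can be sketched as a small relational schema. The sketch uses Python's built-in sqlite3 module. The table and column names (snapshot, ligand, docking_result, feb) are hypothetical and are not FReDD's actual schema. The FEB values are made up. The query retrieves, for each ligand, the receptor snapshot with the most favorable (lowest) FEB.

```python
import sqlite3

# Hypothetical schema for illustration only; not FReDD's actual design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snapshot (
    snapshot_id INTEGER PRIMARY KEY,  -- position along the MD trajectory
    receptor    TEXT NOT NULL         -- e.g. 'InhA'
);
CREATE TABLE ligand (
    ligand_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE    -- e.g. 'NADH', 'TCL', 'PIF', 'ETH'
);
CREATE TABLE docking_result (
    snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id),
    ligand_id   INTEGER NOT NULL REFERENCES ligand(ligand_id),
    feb         REAL NOT NULL,        -- estimated free energy of binding
    PRIMARY KEY (snapshot_id, ligand_id)
);
""")

conn.executemany("INSERT INTO snapshot VALUES (?, 'InhA')", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO ligand VALUES (?, ?)", [(1, "NADH"), (2, "TCL")])
# Made-up FEB values (kcal/mol), for illustration only.
conn.executemany(
    "INSERT INTO docking_result VALUES (?, ?, ?)",
    [(1, 1, -10.2), (2, 1, -12.5), (3, 1, -9.8),
     (1, 2, -7.1), (2, 2, -6.4), (3, 2, -8.3)],
)

# For each ligand, the snapshot with the lowest (most favorable) FEB.
best = conn.execute("""
SELECT l.name, d.snapshot_id, d.feb
FROM docking_result AS d
JOIN ligand AS l USING (ligand_id)
WHERE d.feb = (SELECT MIN(feb) FROM docking_result WHERE ligand_id = d.ligand_id)
ORDER BY l.name
""").fetchall()
print(best)  # → [('NADH', 2, -12.5), ('TCL', 3, -8.3)]
```

The composite primary key on (snapshot_id, ligand_id) stores one result per snapshot-ligand pair. Whether the real database keeps several docking runs per pair is not stated here.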
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2011
Karina S. Machado; Ana T. Winck; Duncan D. Ruiz; Osmar Norberto de Souza
Knowledge discovery in databases has become an integral part of practically every aspect of bioinformatics research, which usually produces, and has to process, very large amounts of data. Rational drug design is one of the current scientific areas that has greatly benefited from bioinformatics, particularly the step that analyzes receptor-ligand interactions via molecular docking simulations. An important challenge is including receptor flexibility, because such simulations can become computationally very demanding. We have represented this explicit flexibility as a series of different conformations derived from a molecular dynamics simulation trajectory of the receptor. This model has been termed the fully flexible receptor (FFR) model. In our studies, the receptor is the enzyme InhA from Mycobacterium tuberculosis, which is the major drug target for the treatment of tuberculosis. The FFR model of InhA (named FFR_InhA) was docked to four ligands, namely nicotinamide adenine dinucleotide, pentacyano(isoniazid)ferrate II, triclosan, and ethionamide. This generated very large amounts of data that need to be mined to produce useful knowledge to help accelerate drug discovery and development. Very little work has been done in this area. In this article, we review our work on the application of classification decision trees, regression model trees, and association rules to properly preprocessed data from the FFR molecular docking results. We show how these techniques can provide an improved understanding of FFR_InhA-ligand behavior. Furthermore, we explain how data mining techniques can support the acceleration of molecular docking simulations of FFR models.
BMC Genomics | 2013
Ana T. Winck; Karina S. Machado; Osmar Norberto de Souza; Duncan D. Ruiz
Background: Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality so that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generating comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully flexible receptor (FFR) model of the Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.
Results: We generated an initial set of attributes and their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach, while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy with the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance using predictive measures (RMSE, MAE, correlation, and number of nodes) and context measures (precision, recall, and F-score).
Conclusions: Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing more interpretable models, which makes it well suited for practical applications in molecular docking simulations using FFR models.
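The evaluation measures named above have standard definitions, sketched here with the Python standard library. Two points are assumptions. First, the context measures are computed over a set of selected items compared against a reference set. The abstract does not say what they are computed over. Second, the node-count measure (model size) is omitted. All values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def precision_recall_f(selected, relevant):
    """Set-based precision, recall and F-score (harmonic mean, i.e. F1).
    Assumption: 'selected' are the items a method chose and 'relevant' are
    the items a reference (e.g. a domain expert) considers correct."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_score = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_score


# Made-up FEB values (actual vs. predicted) and made-up item sets.
actual = [-10.0, -8.0, -6.0]
predicted = [-9.0, -8.0, -7.0]
print(rmse(actual, predicted), mae(actual, predicted), pearson(actual, predicted))
print(precision_recall_f({"a", "b", "c"}, {"b", "c", "d", "e"}))
```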
Brazilian Symposium on Bioinformatics | 2010
Karina S. Machado; Ana T. Winck; Duncan Dubugrás Ruiz; O. Norberto de Souza
A careful analysis of flexible-receptor molecular docking results, particularly those related to details of receptor-ligand interactions, is essential to improve the docking process and the understanding of intermolecular recognition. Because flexible-receptor docking simulations generate large amounts of data, their manual analysis is impractical. We intend to apply classification decision-tree algorithms to better understand this type of docking result. Before doing so, however, we need to discretize the target attribute, which in this work is the estimated Free Energy of Binding (FEB) of the flexible receptor-ligand interactions. Here we compare three discretization methods: (1) equal frequency, (2) equal width, and (3) our proposed method, based on the mode and standard deviation of the FEB values.
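The two baseline discretization methods can be sketched in a few lines of Python. Equal width splits the FEB range into k intervals of the same length. Equal frequency places roughly the same number of values in each interval. The authors' mode and standard-deviation method is not reproduced, because the abstract does not specify its cut points. The value of k and the FEB values are illustrative assumptions.

```python
def equal_width(values, k):
    """Bin index (0..k-1) for each value, using k intervals of equal length
    between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical: a single bin
        return [0] * len(values)
    # The maximum would map to index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Bin index (0..k-1) for each value so that every bin receives about
    len(values) / k values. Tied values may land in different bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Made-up FEB values (kcal/mol), discretized into k = 3 classes.
feb = [-14.0, -12.0, -11.0, -10.0, -9.5, -9.0, -8.0, -5.0]
print(equal_width(feb, 3))      # → [0, 0, 1, 1, 1, 1, 2, 2]
print(equal_frequency(feb, 3))  # → [0, 0, 0, 1, 1, 1, 2, 2]
```

The two outputs differ even on this small sample. With skewed FEB distributions, equal width can leave some classes nearly empty. Equal frequency balances class sizes but may separate values that are very close to each other.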
Brazilian Symposium on Bioinformatics | 2012
Ana T. Winck; Christian Vahl Quevedo; Karina S. Machado; Osmar Norberto de Souza; Duncan D. Ruiz
A wide range of public ligand databases currently provide tens of millions of ligands to users. Consequently, exhaustive in silico virtual screening with such a high volume of data is particularly expensive. Because of this, there is a demand for new solutions that can reduce the number of ligands tested against their target receptors. However, there is still no method that effectively reduces this number to a manageable amount, which makes this issue a major challenge of rational drug design. This article presents a comparative analysis of the main public ligand databases, measuring the quality and variation of the molecular descriptor values available in each one. It aims to help the development of new methods based on criteria that reduce the set of promising ligands to be tested.
International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2010
Ana T. Winck; Karina S. Machado; Duncan D. Ruiz
One of the challenges in natural language processing (NLP) is the semantic treatment of documents. Such processing is tailored to specific domains, and bioinformatics is a promising area of interest. In this work we focus on the rational drug design (RDD) process. We aim to help identify new target proteins (receptors) and drug candidate compounds (ligands) in scientific documents. Our approach is to handle such structures as named entities (NE) in the text. We propose recognizing these NE by analyzing their context. Using an annotated corpus from the RDD domain, we present models generated by association rule mining. These models indicate which context-relevant terms signal the presence of a receptor or ligand in a sentence.
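Association rules of this kind are usually filtered by support and confidence. The Python sketch below computes both for single-term rules of the form "term -> label". The corpus is a toy assumption: each sentence is a set of context terms plus a set of hypothetical entity labels (RECEPTOR, LIGAND). It is not the authors' annotated corpus, and the thresholds are arbitrary.

```python
from collections import Counter


def term_rules(sentences, label, min_support=0.2, min_confidence=0.6):
    """Rules 'term -> label' that meet both thresholds.

    sentences: list of (set_of_terms, set_of_labels) pairs
    support    = fraction of sentences containing both the term and the label
    confidence = sentences with term and label / sentences with the term
    """
    n = len(sentences)
    term_count, joint_count = Counter(), Counter()
    for terms, labels in sentences:
        for t in terms:
            term_count[t] += 1
            if label in labels:
                joint_count[t] += 1
    rules = []
    for t, joint in joint_count.items():
        support, confidence = joint / n, joint / term_count[t]
        if support >= min_support and confidence >= min_confidence:
            rules.append((t, label, support, confidence))
    # Strongest rules first: by confidence, then support, then term.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


# Toy corpus with hypothetical labels, for illustration only.
corpus = [
    ({"inhibits", "enzyme", "binding"}, {"RECEPTOR", "LIGAND"}),
    ({"enzyme", "structure"}, {"RECEPTOR"}),
    ({"compound", "inhibits"}, {"LIGAND"}),
    ({"enzyme", "active", "site"}, {"RECEPTOR"}),
    ({"study", "results"}, set()),
]
rules = term_rules(corpus, "RECEPTOR")
print([r[0] for r in rules])  # → ['enzyme', 'active', 'binding', 'site', 'structure']
```

Here "inhibits" also occurs in a ligand-only sentence. Its confidence for RECEPTOR is therefore 0.5, below the threshold, and the rule is dropped.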
Brazilian Conference on Intelligent Systems | 2015
Henry E.L. Cagnin; Ana T. Winck; Rodrigo C. Barros
Support Vector Machines (SVMs) are among the most efficient methods for data classification in machine learning. Several efforts have been dedicated to improving their performance through source-code parallelization, particularly on Graphics Processing Units (GPUs). Those studies use the well-known CUDA framework, which NVIDIA provides for its graphics cards. Nevertheless, the main disadvantage of CUDA-based solutions is that they are specific to NVIDIA cards, which reduces their applicability in heterogeneous environments. In this work, we propose the parallelization of SVMs through the OpenCL framework, which makes the resulting solution portable to GPUs from a wide range of manufacturers. The proposed approach parallelizes the most costly steps performed when training SVMs. We show that the proposed solution achieves a significant speedup over the algorithm's original version. It also outperforms the state-of-the-art CUDA-based approach in computational performance in 11 of the 12 datasets tested in this work.
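The abstract does not list which training steps are parallelized. Many SVM solvers spend much of their training time evaluating kernel values. One example is the RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2). Treating kernel evaluation as the costly step is an assumption for illustration. Each entry of a kernel row depends only on two input vectors, so all entries can be computed independently, and that independence is what maps naturally onto GPU work-items. The Python sketch below only shows this data-parallel structure. It runs sequentially and is not OpenCL code. The gamma value and data are arbitrary.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kernel_row(X, i, gamma):
    """Row i of the kernel matrix. Every entry depends only on X[i] and
    X[j], so the len(X) evaluations are independent of one another; a GPU
    version could assign one work-item per j."""
    return [rbf(X[i], X[j], gamma) for j in range(len(X))]


def kernel_matrix(X, gamma):
    """Full kernel matrix built from independent rows (small data only)."""
    return [kernel_row(X, i, gamma) for i in range(len(X))]


# Arbitrary 2-D training points, for illustration only.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = kernel_matrix(X, gamma=0.5)
for row in K:
    print([round(v, 4) for v in row])
```

Solvers of this kind usually compute kernel rows on demand and cache them rather than building the full matrix. The full matrix is built here only to keep the example short.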
ACM Symposium on Applied Computing | 2013
Giovanni Xavier Perazzo; Ana T. Winck; Karina S. Machado
In Rational Drug Design (RDD), one important step is the evaluation of receptor-ligand interactions through molecular docking simulations. Because it is practically impossible to test all available compounds against a target receptor, the most promising ones need to be selected. One possible approach to such selection is to consider molecular properties known as molecular descriptors. To describe these characteristics, we introduce a Data Warehouse (DW) model that integrates molecular descriptors from different public compound databases and relates them to Virtual Screening (VS) experiment data. With the proposed DW, we are able to produce proper data sets for classification mining experiments. We performed a case study with a VS using the HIV-1 Protease as the receptor and 76 compounds. The data sets produced from our DW consist of 7 molecular descriptors as the predictive attributes. The target attribute is the discretized Free Energy of Binding (FEB) between the ligands and the target receptor. By running the C4.5 algorithm on the generated data sets, we obtained decision-tree models that indicate which molecular descriptors, and which of their values, are relevant to obtaining good FEB results.
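C4.5 ranks candidate splits by gain ratio. This is the information gain of splitting on an attribute divided by the split information (the entropy of the partition itself), which penalizes attributes with many distinct values. The minimal Python sketch below scores one already-discretized descriptor against a discretized FEB class. The descriptor, its bins and the class labels are hypothetical. C4.5's threshold search for continuous attributes, its missing-value handling and its pruning are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the
    split information (entropy of the attribute's own value distribution)."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical data: a discretized descriptor (e.g. logP bins) and the
# discretized FEB class of each compound.
logp_bin = ["low", "low", "high", "high"]
feb_class = ["good", "good", "bad", "bad"]
print(gain_ratio(logp_bin, feb_class))  # → 1.0

uninformative = ["a", "b", "a", "b"]
print(gain_ratio(uninformative, feb_class))  # → 0.0
```

A descriptor whose bins separate the FEB classes perfectly scores 1.0. One that is independent of the class scores 0.0, so C4.5 would prefer the former as a split.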