Featured Research

Quantitative Methods

Customised fragment libraries for ab initio protein structure prediction using a structural alphabet

Motivation: Computational protein structure prediction has taken over the structural community in the past few decades, mostly focusing on the development of Template-Free modelling (TFM), or ab initio modelling, protocols. Fragment-based assembly (FBA) falls under this category and is by far the most popular approach to solving the spatial arrangement of proteins. FBA approaches usually rely on sequence-based profile comparison to generate fragments from a representative structural database. Here we report the use of Protein Blocks (PBs), a structural alphabet (SA), to perform such sequence comparison and to build customised fragment libraries for TFM. Results: We demonstrate that predicted PB sequences for a query protein can be used to search for high-quality fragments that together cover more than 90% of the query. The fragments generated have a minimum length of 11 residues, and fragments that cover more than 30% of the query length were often obtained. Our work shows that PBs are a good way to extract structurally similar fragments from a database of representatives of non-homologous structures and of proteins that contain less ordered regions.
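
The coverage figure quoted in the abstract can be made concrete with a short sketch. It assumes each fragment hit is described by a 0-based start position and a length on the query sequence; the function name `query_coverage` and the example hits are hypothetical and are not taken from the paper.

```python
def query_coverage(query_length, fragments, min_length=11):
    """Fraction of query residues covered by at least one fragment.

    `fragments` is an iterable of (start, length) pairs, 0-based.
    Fragments shorter than `min_length` are ignored, mirroring the
    11-residue minimum mentioned in the abstract.
    """
    covered = set()
    for start, length in fragments:
        if length < min_length:
            continue
        # Clip each fragment to the bounds of the query before counting.
        covered.update(range(max(start, 0), min(start + length, query_length)))
    return len(covered) / query_length


# Hypothetical fragment hits on a 100-residue query; the last one is too short.
hits = [(0, 40), (35, 30), (70, 25), (90, 5)]
print(query_coverage(100, hits))
```

Overlapping fragments are counted once because residues are collected in a set, so the result is the fraction of distinct positions covered rather than the summed fragment length.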

Quantitative Methods

Data Mining Techniques in Predicting Breast Cancer

Background and Objective: Breast cancer, which accounts for 23% of all cancers, threatens communities in developing countries because of poor awareness and treatment. Early diagnosis greatly helps in treating the disease. The present study was conducted to improve the prediction process and to identify the main factors affecting breast cancer. Materials and Methods: Data on eight attributes were collected for 130 Libyan women at clinical stages of the disease. Six data mining algorithms were applied to predict the disease based on clinical stage. All the algorithms achieved high accuracy, but the decision tree provided the highest; the decision tree diagram was used to build rules from each leaf node. Variable ranking was applied to extract significant variables and to support the final rules for predicting the disease. Results: All applied algorithms achieved high predictive performance, with differing accuracies. Rules 1, 3, 4, 5 and 9 each produced a pure subset and were confirmed as significant rules. Only five input variables contributed to building the rules, and not all variables had a significant impact. Conclusion: Tumor size plays a vital role in constructing all rules with a significant impact. The variables inheritance, breast side and menopausal status had an insignificant impact in this analysis, but they may yield notable findings under a different data analysis strategy.
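
A "pure subset" in this context is usually read as a leaf whose records all share one class. The sketch below illustrates that idea using Gini impurity, which is an assumption: the abstract does not say which impurity measure the decision tree used, and the stage labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of the class labels reaching one leaf (0.0 means pure)."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def is_pure(labels):
    """True when every record in the leaf has the same class."""
    return len(set(labels)) == 1


# Hypothetical clinical-stage labels for the records reaching two leaves.
leaf_a = ["stage II", "stage II", "stage II"]
leaf_b = ["stage I", "stage II"]
print(is_pure(leaf_a), gini_impurity(leaf_a))
print(is_pure(leaf_b), gini_impurity(leaf_b))
```

A rule derived from a pure leaf classifies every training record it covers correctly, which is one reason such rules may be singled out as significant.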

Quantitative Methods

Data Mining and Analytical Models to Predict and Identify Adverse Drug-drug Interactions

The use of multiple drugs accounts for almost 30% of all hospital admissions and is the fifth leading cause of death in America. Since over 30% of all adverse drug events (ADEs) are thought to be caused by drug-drug interactions (DDIs), better identification and prediction of the administration of known DDIs in primary and secondary care could reduce the number of patients seeking urgent care in hospitals, resulting in substantial savings for health systems worldwide along with better public health. However, current DDI prediction models are prone to confounding biases, and inaccurate or inaccessible longitudinal data from Electronic Health Records (EHRs) and other drug information sources, such as the FDA Adverse Event Reporting System (FAERS), remain the main barriers to measuring the prevalence of DDIs and characterizing the phenomenon in medical care. In this review, analytical models including Label Propagation using drug side effect data and a Supervised Learning DDI Prediction model using Drug-Gene interaction (DGI) data are discussed. Improved identification of DDIs in both of these models compared with previous versions is highlighted, while limitations that include bias, inaccuracy, and insufficient data are also assessed. A case study of psoriasis DDI prediction from DGI data using a Random Forest classifier is presented. Transfer Matrix Recurrent Neural Networks (TM-RNNs), which address the above limitations, are discussed as future work.

Quantitative Methods

Deep ICE: A Deep learning approach for MRI Intracranial Cavity Extraction

Automatic methods for measuring normalized regional brain volumes from MRI data are a key tool to help in the objective diagnosis and follow-up of many neurological diseases. To estimate such regional brain volumes, the intracranial cavity volume is commonly used for normalization. In this paper, we present an accurate and efficient approach to automatically segment the intracranial cavity using a volumetric 3D convolutional neural network and a new 3D patch extraction strategy specially adapted to deal with the typically low number of training cases available in supervised segmentation and the memory limitations of modern GPUs. The proposed method is compared with recent state-of-the-art methods, and the results show excellent accuracy and a reduced computational burden.

Quantitative Methods

Deep Learning in Mining Biological Data

Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Broadly categorized into three types (i.e., sequences, images, and signals), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. Highlighting the role of DL in recognizing patterns in biological data, this article provides: applications of DL to biological sequence, image, and signal data; an overview of open access sources of these data; a description of open source DL tools applicable to these data; and a comparison of these tools from qualitative and quantitative perspectives. Finally, it outlines some open research challenges in mining biological data and puts forward a number of possible future perspectives.

Quantitative Methods

Deep Neural Network Based Differential Equation Solver for HIV Enzyme Kinetics

Purpose: We seek to use neural networks (NNs) to solve a well-known system of differential equations describing the balance between T cells and HIV viral burden. Materials and Methods: In this paper, we employ a 3-input parallel NN to approximate solutions for the system of first-order ordinary differential equations describing the above biochemical relationship. Results: The numerical results obtained by the NN are very similar to a host of numerical approximations from the literature. Conclusion: We have demonstrated the use of NN integration on a well-known and medically important system of coupled first-order ordinary differential equations. Our trial-and-error approach counteracts the system's inherent scale imbalance, but it also highlights the need to address scale imbalance more substantively in future work. Doing so will allow more automated solutions to larger systems of equations, which could describe increasingly complex and biologically interesting systems.

Quantitative Methods

Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated how broadly these representations enable patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with average scores of 2.61 for entropy and 0.31 for purity. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine.
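
Entropy and purity are standard external measures of clustering quality. The sketch below shows one common formulation, assuming a size-weighted average over clusters and base-2 logarithms; the abstract does not state the exact variant used, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against known labels.

    Purity sums, over clusters, the count of the majority label divided
    by the total number of items. Entropy is the size-weighted mean of
    the label entropy inside each cluster (log base 2 is an assumption).
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    n = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        purity += max(counts.values()) / n
        within = -sum((k / size) * math.log2(k / size) for k in counts.values())
        entropy += (size / n) * within
    return purity, entropy


# Toy example: two clusters that each mix two conditions evenly.
print(purity_and_entropy([0, 0, 1, 1], ["T2D", "PD", "T2D", "PD"]))
```

Under this formulation higher purity and lower entropy both indicate clusters that align more closely with the known conditions.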

Quantitative Methods

Deep learning can differentiate IDH-mutant from IDH-wild type GBM

Background: Distinguishing IDH-mutant from IDH-wildtype GBMs on MRI is challenging, since conventional imaging shows considerable overlap. While a few studies have employed deep learning in a mixed low/high grade glioma population, a GBM-specific model is still lacking in the literature. Our objective was to develop a deep-learning model for IDH prediction in GBM by using Convolutional Neural Networks (CNNs) on multiparametric MRI. Methods: We included 100 adult patients with pathologically proven GBM and IDH testing. MRI data included morphologic sequences, rCBV maps and ADC maps. The tumor area was obtained by a bounding box function on the axial slice with the widest tumor extension on T2 images and was projected onto every sequence. Data were split into training and test sets (80:20). A 4-block 2D CNN architecture was implemented for IDH prediction on every MRI sequence. IDH mutation probability was calculated with a softmax activation function from the last dense layer. The best performance was determined from model accuracy and categorical cross-entropy loss (CCEL) in the test cohort. Results: Our model achieved the following performance: T1 (accuracy 77%, CCEL 1.4), T2 (accuracy 67%, CCEL 2.41), FLAIR (accuracy 77%, CCEL 1.98), MPRAGE (accuracy 66%, CCEL 2.55), rCBV (accuracy 83%, CCEL 0.64). ADC achieved lower performance. Conclusion: We built a GBM-tailored deep-learning model for IDH mutation prediction, achieving an accuracy of 83% with rCBV maps. The high predictivity of perfusion images may reflect the known correlation between IDH, hypoxia inducible factor (HIF) and neoangiogenesis. This model may set a path for non-invasive evaluation of IDH mutation in GBM.
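
The two quantities named in the Methods, softmax probability and categorical cross-entropy, can be illustrated with a minimal sketch. It assumes the natural logarithm and a two-class output ordered as [wild type, mutant]; the logits are hypothetical and the code is not the authors' model.

```python
import math


def softmax(logits):
    """Convert raw scores from the last dense layer into probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probabilities, true_index):
    """Loss for one sample: negative log probability of the true class."""
    return -math.log(probabilities[true_index])


# Hypothetical logits for one patient, ordered [wild type, mutant].
probs = softmax([2.0, -1.0])
print(probs)
print(categorical_cross_entropy(probs, 0))  # true class: wild type
print(categorical_cross_entropy(probs, 1))  # true class: mutant
```

Because the loss grows without bound as the probability assigned to the true class approaches zero, a few confidently wrong predictions can push the average loss above 1 even when most cases are classified correctly.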

Quantitative Methods

Deep learning for peptide identification from metaproteomics datasets

Metaproteomics is becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. The identification of peptides and proteins from MS data involves the computational procedure of searching MS/MS spectra against a predefined protein sequence database and assigning top-scored peptides to spectra. Existing computational tools are still far from being able to extract all the information from large MS/MS datasets acquired from metaproteome samples. In this paper, we propose a deep-learning-based algorithm, called DeepFilter, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Compared with other post-processing tools, including Percolator, Q-ranker, PeptideProphet, and iProphet, DeepFilter identified 20% and 10% more peptide-spectrum matches and proteins, respectively, on marine microbial and soil microbial metaproteome samples at a false discovery rate of 1%.
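
Filtering at a 1% false discovery rate can be sketched as follows. The example assumes a target-decoy estimate, in which the FDR at a score cutoff is approximated by the number of decoy matches divided by the number of target matches above that cutoff; the abstract does not say how FDR was estimated, and the function name and scores are hypothetical.

```python
def accepted_targets(psms, max_fdr=0.01):
    """Count target PSMs accepted at the most permissive cutoff meeting max_fdr.

    `psms` is a list of (score, is_decoy) pairs where higher scores are
    better. FDR at each cutoff is estimated as decoys / targets.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest accepted set that still satisfies the threshold.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


# Hypothetical scores: 200 targets, then one decoy, then 50 more targets.
psms = [(1000 - i, False) for i in range(200)]
psms.append((800, True))
psms += [(799 - i, False) for i in range(50)]
print(accepted_targets(psms, max_fdr=0.01))
print(accepted_targets(psms, max_fdr=0.001))
```

A post-processing tool that rescoring PSMs so that true matches rank above decoys more consistently lets the cutoff move further down the list before the threshold is exceeded, which is how more identifications can be reported at the same FDR.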

Quantitative Methods

Deep learning-based i-EEG classification with convolutional neural networks for drug-target interaction prediction

Drug-target interaction (DTI) prediction has become a foundational task in drug repositioning, polypharmacology, drug discovery, as well as drug resistance and side-effect prediction. DTI identification using machine learning is gaining popularity in these research areas. Through the years, numerous deep learning methods have been proposed for DTI prediction. Nevertheless, prediction accuracy and efficiency remain key challenges. Pharmaco-electroencephalogram (pharmaco-EEG) is considered valuable in the development of central nervous system-active drugs. Quantitative EEG analysis demonstrates high reliability in studying the effects of drugs on the brain. Earlier preclinical pharmaco-EEG studies showed that different types of drugs can be classified according to their mechanism of action on neural activity. Here, we propose a convolutional neural network for EEG-mediated DTI prediction. This new approach can explain the mechanisms underlying complicated drug actions, as it allows the identification of similarities in the mechanisms of action and effects of psychotropic drugs.
