Jie Hou | Researchain

Publication

Featured research published by Jie Hou.

Scientific Reports | 2015

Improving Protein Fold Recognition by Deep Learning Networks.

Taeho Jo; Jie Hou; Jesse Eickholt; Jianlin Cheng

For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict whether a given query-template protein pair belongs to the same structural fold. The inputs were sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl’s benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed results comparable to ensemble DN-Fold at the family and superfamily levels. Finally, we extended the binary classification problem of fold recognition to a real-valued regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
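The Top 1 and Top 5 recognition rates quoted above can be illustrated with a minimal sketch. It assumes that a query counts as recognized when at least one of its k highest-ranked templates shares its fold label; the function name `top_k_recognition_rate`, the toy identifiers, and the fold labels are all hypothetical and are not taken from DN-Fold.

```python
from typing import Dict, List


def top_k_recognition_rate(ranked_templates: Dict[str, List[str]],
                           fold_of: Dict[str, str],
                           k: int) -> float:
    """Fraction of queries with at least one same-fold template in the top k.

    ranked_templates maps each query to its templates, best score first.
    fold_of maps every query and template identifier to a fold label.
    """
    hits = 0
    for query, ranking in ranked_templates.items():
        if any(fold_of[t] == fold_of[query] for t in ranking[:k]):
            hits += 1
    return hits / len(ranked_templates)


# Toy data (hypothetical identifiers and fold labels).
fold_of = {"q1": "a.1", "q2": "b.2", "t1": "a.1", "t2": "b.2", "t3": "c.3"}
ranked = {
    "q1": ["t1", "t3", "t2"],  # correct template ranked first
    "q2": ["t3", "t1", "t2"],  # correct template ranked third
}

top1 = top_k_recognition_rate(ranked, fold_of, 1)
top3 = top_k_recognition_rate(ranked, fold_of, 3)
```

With this toy ranking, only `q1` is recognized at k = 1, while both queries are recognized at k = 3.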

BMC Bioinformatics | 2016

DeepQA: improving the estimation of single protein model quality with deep belief networks

Renzhi Cao; Debswapna Bhattacharya; Jie Hou; Jianlin Cheng

Background: Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges in protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting mostly of low-quality models, is still a largely unsolved problem. Results: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. Conclusion: DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/.

Bioinformatics | 2016

QAcon: single model quality assessment using protein structural and contact information with machine learning techniques

Renzhi Cao; Badri Adhikari; Debswapna Bhattacharya; Miao Sun; Jie Hou; Jianlin Cheng

Motivation: Protein model quality assessment (QA) plays a very important role in protein structure prediction. QA methods can be divided into two groups: single-model and consensus methods. The consensus QA methods may fail when there is a large portion of low-quality models in the model pool. Results: In this paper, we develop a novel single-model quality assessment method, QAcon, utilizing structural features, physicochemical properties, and residue contact predictions. We apply residue-residue contact information predicted by two protein contact prediction methods, PSICOV and DNcon, to generate a new score as a feature for quality assessment. This novel feature and 11 other features are used as input to train a two-layer neural network on CASP9 datasets to predict the quality of a single protein model. We blindly benchmarked our method QAcon on the CASP11 dataset as the MULTICOM-CLUSTER server. Based on the evaluation, our method is ranked as one of the top single-model QA methods. The good performance of the features based on contact prediction illustrates the value of using contact information in protein quality assessment. Availability and Implementation: The web server and the source code of QAcon are freely available at: http://cactus.rnet.missouri.edu/QAcon Supplementary information: Supplementary data are available at Bioinformatics online.

BMC Bioinformatics | 2016

ConEVA: a toolbox for comprehensive assessment of protein contacts

Badri Adhikari; Jackson Nowotny; Debswapna Bhattacharya; Jie Hou; Jianlin Cheng

Background: In recent years, successful contact prediction methods and contact-guided ab initio protein structure prediction methods have highlighted the importance of incorporating contact information into protein structure prediction methods. It is also observed that for almost all globular proteins, the quality of contact prediction dictates the accuracy of structure prediction. Hence, like many existing evaluation measures for evaluating 3D protein models, various measures are currently used to evaluate predicted contacts, with the most popular ones being precision, coverage and distance distribution score (Xd). Results: We have built a web application and a downloadable tool, ConEVA, for comprehensive assessment and detailed comparison of predicted contacts. Besides implementing existing measures for contact evaluation, we have implemented new and useful methods of contact visualization using chord diagrams and comparison using Jaccard similarity computations. For a set (or sets) of predicted contacts, the web application runs even when a native structure is not available, visualizing the contact coverage and similarity between predicted contacts. We applied the tool on various contact prediction data sets and present our findings and insights we obtained from the evaluation of effective contact assessments. ConEVA is publicly available at http://cactus.rnet.missouri.edu/coneva/. Conclusion: ConEVA is useful for a range of contact related analysis and evaluations including predicted contact comparison, investigation of individual protein folding using predicted contacts, and analysis of contacts in a structure of interest.
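The Jaccard comparison mentioned above needs no native structure, because it only compares two sets of predicted contacts. The following is a minimal sketch of that idea, not ConEVA's own code. The helper names `normalize` and `jaccard`, the toy contact lists, and the convention that two empty sets have similarity 1.0 are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]  # a pair of residue indices


def normalize(contacts: Iterable[Contact]) -> Set[Contact]:
    """Store each contact as (smaller index, larger index) so that
    (i, j) and (j, i) are treated as the same contact."""
    return {(min(i, j), max(i, j)) for i, j in contacts}


def jaccard(a: Iterable[Contact], b: Iterable[Contact]) -> float:
    """Jaccard similarity |A & B| / |A | B| between two contact sets."""
    set_a, set_b = normalize(a), normalize(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty predictions are identical
    return len(set_a & set_b) / len(union)


# Toy predictions from two hypothetical methods.
pred_a = [(1, 10), (5, 30), (12, 40)]
pred_b = [(10, 1), (5, 30), (7, 22)]

similarity = jaccard(pred_a, pred_b)  # 2 shared contacts out of 4 distinct ones
```

The two toy predictions share two contacts out of four distinct ones, so the similarity is 0.5.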

Bioinformatics | 2018

DeepSF: deep convolutional neural network for mapping protein sequences to folds

Jie Hou; Badri Adhikari; Jianlin Cheng

Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences directly, and methodological limitations restrict them to a small number of folds, so they are not generally useful in practice. Results: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional methods based on sequence alignment (comparison), our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on the datasets curated from SCOP1.75, yielding an average classification accuracy of 75.3%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 73.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 12.63-26.32% higher than HHSearch on template-free modeling targets and 3.39-17.09% higher on hard template-based modeling targets for top 1, 5 and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking. Availability and implementation: The DeepSF server is publicly available at: http://iris.rnet.missouri.edu/DeepSF/. Supplementary information: Supplementary data are available at Bioinformatics online.
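One way a 1D convolutional network can turn a sequence of any length into a fixed-size vector is to pool each filter's responses over all positions. The sketch below illustrates this with a single convolution layer followed by global max pooling. It is an assumption-laden toy: the abstract does not specify DeepSF's pooling scheme, and the functions `conv1d_relu` and `embed`, the two-letter feature encoding, and the hand-picked filters are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows are positions (or filter offsets), columns are features


def conv1d_relu(seq_features: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one filter of width len(kernel) along the sequence ("valid"
    convolution, no padding) and apply ReLU to each response."""
    width = len(kernel)
    if len(seq_features) < width:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for start in range(len(seq_features) - width + 1):
        total = bias
        for offset in range(width):
            for x, w in zip(seq_features[start + offset], kernel[offset]):
                total += x * w
        responses.append(max(0.0, total))
    return responses


def embed(seq_features: Matrix, kernels: List[Matrix]) -> List[float]:
    """Global max pooling: keep the strongest response of each filter, so the
    output length equals the number of filters, not the sequence length."""
    return [max(conv1d_relu(seq_features, k)) for k in kernels]


# Toy one-hot features over a two-letter alphabet (hypothetical).
A, B = [1.0, 0.0], [0.0, 1.0]
short_seq = [A, B, A]
long_seq = short_seq * 4  # 12 positions

kernels = [
    [A, B],  # responds most strongly to "A followed by B"
    [B, B],  # responds most strongly to "B followed by B"
]

short_vec = embed(short_seq, kernels)
long_vec = embed(long_seq, kernels)
```

Both sequences map to a vector with one entry per filter, which is what allows a fixed-size classification layer to follow regardless of input length.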

BMC Bioinformatics | 2017

Deep learning methods for protein torsion angle prediction

Haiou Li; Jie Hou; Badri Adhikari; Qiang Lyu; Jianlin Cheng

Background: Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. Results: We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN) and a deep restricted Boltzmann machine (DRBM), as well as a deep recurrent neural network (DRNN) and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of the protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20–21° and 29–30°, respectively, on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Conclusions: Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
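Because torsion angles are periodic, an MAE over phi or psi is usually computed on the shortest angular difference, so that a prediction of 175° against a true value of -175° counts as a 10° error rather than 350°. The sketch below assumes that convention, which the abstract does not state explicitly. The function names and the toy angles are hypothetical.

```python
from typing import Sequence


def angular_error(pred_deg: float, true_deg: float) -> float:
    """Shortest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def torsion_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error over residues, using the periodic difference."""
    if len(pred) != len(true):
        raise ValueError("prediction and reference must have the same length")
    if not pred:
        raise ValueError("at least one angle is required")
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Toy phi angles in degrees (hypothetical values).
pred_phi = [-60.0, 175.0, -120.0]
true_phi = [-65.0, -175.0, -100.0]

mae = torsion_mae(pred_phi, true_phi)  # per-residue errors are 5, 10 and 20 degrees
```

Without the wraparound step, the second residue alone would contribute 350° and dominate the average.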

Scientific Reports | 2016

Effects of aged garlic extract and FruArg on gene expression and signaling pathways in lipopolysaccharide-activated microglial cells

Hailong Song; Yuan Lu; Zhe Qu; Valeri V. Mossine; Matthew B. Martin; Jie Hou; Jiankun Cui; Brenda A. Peculis; Thomas P. Mawhinney; Jianlin Cheng; C. Michael Greenlief; Kevin L. Fritsche; Francis J. Schmidt; Ronald B. Walter; Dennis B. Lubahn; Grace Y. Sun; Zezong Gu

Aged garlic extract (AGE) is widely used as a dietary supplement on account of its protective effects against oxidative stress and inflammation. However, less is known about specific molecular targets of AGE and its bioactive components, including N-α-(1-deoxy-D-fructos-1-yl)-L-arginine (FruArg). Our recent study showed that both AGE and FruArg significantly attenuate lipopolysaccharide (LPS)-induced neuroinflammatory responses in BV-2 microglial cells. This study aims to reveal the effects of AGE and FruArg on gene expression regulation in LPS-stimulated BV-2 cells. Results showed that LPS treatment significantly altered mRNA levels of 2563 genes. AGE reversed 67% of the transcriptome alteration induced by LPS, whereas FruArg accounted for the protective effect by reversing expression levels of 55% of genes altered by LPS. Key pro-inflammatory canonical pathways induced by the LPS stimulation included toll-like receptor signaling, IL-6 signaling, and the Nrf2-mediated oxidative stress pathway, along with elevated expression levels of genes such as Il6, Cd14, Casp3, Nfkb1, Hmox1, and Tnf. These effects could be modulated by treatment with both AGE and FruArg. These findings suggest that AGE and FruArg are capable of alleviating oxidative stress and neuroinflammatory responses stimulated by LPS in BV-2 cells.

Bioinformatics | 2018

DNCON2: improved protein contact prediction using two-level deep convolutional neural networks

Badri Adhikari; Jie Hou; Jianlin Cheng

Motivation: Significant improvements in the prediction of protein residue–residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. Results: In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts in training, the two-level approach to prediction, the use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. Availability and implementation: The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Supplementary information: Supplementary data are available at Bioinformatics online.
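The "top L/5 long-range" precision used above can be sketched as follows, where L is the sequence length. The sketch assumes the common convention that long-range contacts have a sequence separation of at least 24 residues, and it divides by the number of predictions actually selected; evaluation tools differ on such details, so this is not presented as DNCON2's own evaluation code. The function name and the toy data are hypothetical.

```python
from typing import Iterable, List, Tuple

ScoredContact = Tuple[int, int, float]  # (residue i, residue j, predicted probability)


def top_l5_long_range_precision(scored: Iterable[ScoredContact],
                                native: Iterable[Tuple[int, int]],
                                seq_len: int,
                                min_sep: int = 24) -> float:
    """Precision of the L/5 highest-scoring long-range predictions.

    min_sep=24 is an assumed long-range threshold on |i - j|.
    """
    native_set = {(min(i, j), max(i, j)) for i, j in native}
    long_range: List[ScoredContact] = [
        (min(i, j), max(i, j), p) for i, j, p in scored if abs(i - j) >= min_sep
    ]
    long_range.sort(key=lambda c: c[2], reverse=True)
    selected = long_range[:max(1, seq_len // 5)]
    if not selected:
        return 0.0
    correct = sum((i, j) in native_set for i, j, _ in selected)
    return correct / len(selected)


# Toy example for a hypothetical 30-residue protein, so L/5 = 6.
scored = [
    (10, 12, 0.99),  # short-range, ignored
    (1, 25, 0.9), (2, 28, 0.8), (3, 30, 0.7), (4, 29, 0.6),
    (5, 30, 0.5), (1, 30, 0.4), (2, 30, 0.3),
]
native = [(1, 25), (3, 30), (4, 29), (1, 30), (10, 12)]

precision = top_l5_long_range_precision(scored, native, seq_len=30)
```

In the toy data, four of the six selected long-range predictions are native contacts, giving a precision of about 66.7%.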

PLOS ONE | 2015

From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data.

Jilong Li; Jie Hou; Lin Sun; Jordan Wilkins; Yuan Lu; Chad E. Niederhuth; Benjamin Ryan Merideth; Thomas P. Mawhinney; Valeri V. Mossine; C. Michael Greenlief; John C. Walker; William R. Folk; Mark Hannink; Dennis B. Lubahn; James A. Birchler; Jianlin Cheng

RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes of gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from Human, Mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html.

Proteins | 2018

Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning

Badri Adhikari; Jie Hou; Jianlin Cheng

In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; these features are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66.
