Publication


Featured research published by Shiheng Tao.


Nucleic Acids Research | 2014

Deciphering the rules by which dynamics of mRNA secondary structure affect translation efficiency in Saccharomyces cerevisiae

Yuanhui Mao; Huiling Liu; Yanlin Liu; Shiheng Tao

Messenger RNA (mRNA) secondary structure decreases the elongation rate, as ribosomes must unwind every structure they encounter during translation. Therefore, the strength of mRNA secondary structure is assumed to be reduced in highly translated mRNAs. However, previous in vitro studies reported a positive correlation between mRNA folding strength and protein abundance. This counterintuitive finding suggests that mRNA secondary structure affects translation efficiency in an undetermined manner. Here, we analyzed the folding behavior of mRNA during translation and its effect on translation efficiency. We simulated the translation process with a novel computational model that takes into account the interactions among ribosomes, codon usage and mRNA secondary structures. We showed that mRNA secondary structure shortens ribosomal distance through the dynamics of folding strength. Notably, when adjacent ribosomes are close, mRNA secondary structures between them disappear, and codon usage determines the elongation rate. More importantly, our results showed that the combined effect of mRNA secondary structure and codon usage in highly translated mRNAs causes a short ribosomal distance in structural regions, which in turn eliminates the structures during translation, leading to a high elongation rate. Together, these findings reveal how the dynamics of mRNA secondary structure, coupled with codon usage, affect translation efficiency.


BMC Genomics | 2013

A new computational strategy for predicting essential genes

Jian Cheng; Wenwu Wu; Yinwen Zhang; Xiangchen Li; Xiaoqian Jiang; Gehong Wei; Shiheng Tao

Background: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms.

Results: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called the feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and a genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with that of other popular models, such as the support vector machine, the Naïve Bayes model, and the logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction.

Conclusions: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.


BioData Mining | 2012

A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Li Li; Yang Guo; Wenwu Wu; Youyi Shi; Jian Cheng; Shiheng Tao

Background: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can therefore lead to distinct conclusions. Testing and comparison of these algorithms are therefore strongly required.

Methods: In this study, five biclustering algorithms (BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two Arabidopsis thaliana (A. thaliana) expression datasets of different dimensions (GDS1620 and pathway). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, namely weighted enrichment (WE) scoring and PPI scoring, were proposed in our study. For each dataset, combining the scores of all biclusters into one unified ranking allowed us to better evaluate the performance and behavior of the five biclustering algorithms.

Results: Both the WE and PPI scoring methods proved effective for validating the biological significance of the biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparative study of the five algorithms revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent. The former does not work well on datasets with few genes, while the latter holds up well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study; they may be better suited to large datasets with more genes and more conditions. (4) SAMBA is data-independent, as it performs well on both given datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.


PLOS ONE | 2014

Training set selection for the prediction of essential genes

Jian Cheng; Zhao Xu; Wenwu Wu; Li Zhao; Xiangchen Li; Yanlin Liu; Shiheng Tao

Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as the training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited a remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets selected with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.


PLOS ONE | 2012

Coevolution in RNA Molecules Driven by Selective Constraints: Evidence from 5S rRNA

Nan Cheng; Yuanhui Mao; Youyi Shi; Shiheng Tao

Understanding intra-molecular coevolution helps to elucidate various structural and functional constraints acting on molecules and might have practical applications in predicting molecular structure and interactions. In this study, we used 5S rRNA as a template to investigate how selective constraints have shaped RNA evolution. We observed the nonrandom occurrence of paired differences along the phylogenetic trees, a high rate of compensatory evolution, and high TIR scores (the ratio of the numbers of terminal to intermediate states), all of which indicate that significant positive selection has driven the evolution of 5S rRNA. We found three mechanisms of compensatory evolution: Watson-Crick interaction (the primary one), complex interactions between multiple sites within a stem, and interplay of stems and loops. Coevolutionary interactions between sites were observed to be highly dependent on the structural and functional environment in which they occurred. Coevolution occurred mostly in those sites closest to loops or bulges within structurally or functionally important helices, which may be under weaker selective constraints than other stem positions. Breaking these pairs would directly increase the size of the adjoining loop or bulge, causing a partial or total structural rearrangement. In conclusion, our results indicate that sequence coevolution is a direct result of maintaining optimal structural and functional integrity.


Gene | 2013

Universally increased mRNA stability downstream of the translation initiation site in eukaryotes and prokaryotes

Yuanhui Mao; Wangtian Wang; Nan Cheng; Qian Li; Shiheng Tao

Local secondary structures in coding sequences have important functions across various translational processes. To date, however, the local structures and their functions in the early stage of translation elongation remain poorly understood. Here, we surveyed the structural stability of the first 180 nucleotides of the coding sequence in 27 species using a computational method. We found that the structural stability in the 30-80 nucleotide interval was significantly higher than that in other regions in eukaryotes and most prokaryotes. No significant correlation between local translation efficiency and structural stability was observed, suggesting that this structural region has undergone direct selection pressure to maintain high stability. Furthermore, ribosomes were blocked by this region, providing an opportunity for co-translational regulation. Remarkably, in eukaryotes, we found that mRNAs with higher structural stability in the 30-80 nucleotide interval tended to encode secreted proteins. Overall, our results revealed a previously unappreciated correlation between structural stability and protein localization.


PLOS ONE | 2011

The influence of deleterious mutations on adaptation in asexual populations

Xiaoqian Jiang; Zhao Xu; Jingjing Li; Youyi Shi; Wenwu Wu; Shiheng Tao

We study the dynamics of adaptation in asexual populations that undergo both beneficial and deleterious mutations. In particular, we investigate how deleterious mutations affect the fixation of beneficial mutations. Using extensive Monte Carlo simulations, we find that in the “strong-selection weak mutation (SSWM)” regime or in the “clonal interference (CI)” regime, deleterious mutations rarely influence the distribution of “selection coefficients of the fixed mutations (SCFM)”, while in the “multiple mutations” regime, the accumulation of deleterious mutations leads to a significant decrease in fitness. We conclude that the effects of deleterious mutations on adaptation depend largely on the supply of beneficial mutations. Interestingly, the lowest adaptation rate occurs for a moderate value of the selection coefficient of deleterious mutations.
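
The kind of Monte Carlo simulation mentioned in this abstract can be illustrated with a minimal Wright-Fisher style sketch. It is not the authors' model: the function name `simulate_adaptation`, the multiplicative fitness scheme, the fixed effect sizes `s_b` and `s_d`, and the rule of at most one mutation per individual per generation are all assumptions made for illustration.

```python
import random


def simulate_adaptation(pop_size, generations, u_b, u_d, s_b, s_d, seed=0):
    """Track mean fitness in an asexual population (illustrative sketch).

    u_b, u_d: per-individual, per-generation probabilities of a beneficial
              or deleterious mutation (assumed, at most one per generation).
    s_b, s_d: fixed selection coefficients (assumed; requires 0 <= s_d < 1).
    Returns the list of mean fitness values, one per generation.
    """
    rng = random.Random(seed)
    fitness = [1.0] * pop_size  # every individual starts at fitness 1
    history = []
    for _ in range(generations):
        # Selection: each offspring picks a parent with probability
        # proportional to the parent's fitness (clonal reproduction).
        parents = rng.choices(fitness, weights=fitness, k=pop_size)
        fitness = []
        for w in parents:
            r = rng.random()
            if r < u_b:
                w *= 1.0 + s_b      # beneficial mutation
            elif r < u_b + u_d:
                w *= 1.0 - s_d      # deleterious mutation
            fitness.append(w)
        history.append(sum(fitness) / pop_size)
    return history


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the sketch run.
    trajectory = simulate_adaptation(1000, 200, u_b=1e-3, u_d=1e-2,
                                     s_b=0.05, s_d=0.02)
    print(f"mean fitness after {len(trajectory)} generations: {trajectory[-1]:.4f}")
```

Varying `u_b` relative to `pop_size` in such a sketch is one way to move between a low-supply and a high-supply regime of beneficial mutations, which is the axis along which the abstract reports different effects of deleterious mutations.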


PLOS ONE | 2013

A meta-analysis of the association between the hOGG1 Ser326Cys polymorphism and the risk of esophageal squamous cell carcinoma

Junjie Zhang; Jingshi Zhou; Ping Zhang; Weiping Wang; Shiheng Tao; Minghua Wang

Background: The human 8-oxoguanine glycosylase 1 (hOGG1) Ser326Cys polymorphism (rs1052133) has been implicated in the risk of esophageal squamous cell carcinoma (ESCC). However, the published findings are inconsistent. We therefore performed a meta-analysis to derive a more precise estimate of the association between the hOGG1 Ser326Cys polymorphism and ESCC risk.

Methodology/Principal Findings: A comprehensive search was conducted to identify eligible studies of the hOGG1 Ser326Cys polymorphism and the risk of ESCC. Three English and two Chinese databases were used, and ten published case-control studies, including 1987 cases and 2926 controls, were identified. Odds ratios (ORs) and 95% confidence intervals (CIs) were used to assess the strength of the association in the dominant and recessive models. The Pearson correlation coefficient (PCC) and standard error (SE) were used to assess the relationship between the number of Cys alleles and ESCC risk in the additive model. Overall, significant associations between the hOGG1 Ser326Cys polymorphism and ESCC risk were found in the recessive model: OR = 1.37 (95% CI: 1.06–1.76, p = 0.02). We also observed significant associations in the Caucasian, Chinese language, population-based control and tissue subgroups. In the additive model, a positive correlation was found between the number of Cys alleles and the risk of ESCC in the overall studies (PCC = 0.109, SE = 0.046, p = 0.02), the Caucasian subgroup and the population subgroup. The funnel plot and Egger's test indicated no publication bias in this meta-analysis.

Conclusion: Based on the published data, the hOGG1 Ser326Cys polymorphism is associated with ESCC risk in the recessive and additive models. Compared with the Ser/Ser and Ser/Cys genotypes, the Cys/Cys genotype might contribute to an increased risk of ESCC, and the risk of ESCC is positively correlated with the number of Cys alleles. A better-matched case-control study should be designed to provide a more precise estimate.
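
As an illustration of the statistic reported above, the following sketch computes an odds ratio and its 95% confidence interval for a single 2x2 case-control table using the standard log-odds (Woolf) approximation. It does not reproduce the pooling across the ten studies, and the function name `odds_ratio_ci` and the counts in the usage example are hypothetical, not taken from the paper.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed,
                  control_exposed, control_unexposed, z=1.96):
    """Odds ratio with a Wald-type confidence interval for one 2x2 table.

    For a recessive model, "exposed" would mean the Cys/Cys genotype and
    "unexposed" the Ser/Ser plus Ser/Cys genotypes (an assumed mapping).
    z = 1.96 corresponds to a 95% interval.
    """
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for demonstration only.
    or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1, as in the reported 1.06–1.76, is what makes an association significant at the 5% level under this approximation.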


Genes & Genomics | 2017

Codon usage bias and evolutionary analyses of Zika virus genomes

Siddiq Ur Rahman; Yuanhui Mao; Shiheng Tao

Zika virus (ZIKV) is a member of the family Flaviviridae and contains a single-stranded, positive-polarity RNA genome. Like dengue virus, Zika virus uses the Aedes aegypti mosquito as a vector to infect humans, causing a wide range of clinical signs, from asymptomatic infection to an influenza-like syndrome. Despite significant progress in genomic analyses, how the virus's relationship with two different hosts affects its overall fitness, constancy, and evasion of the host immune system remains elusive. Here we analyzed Zika virus codon-based evolution using eleven strains from different geographical locations. The overall codon usage was similar and slightly biased among all strains. The occurrence of A-ending codons among the highly preferred codons, together with analyses by various approaches, strongly suggests that mutational bias is the main force shaping codon usage in this virus. However, the marginal influence of natural selection and geography on codon usage cannot be ignored. With respect to the tRNA pool used in translation, the viral genomes naturally favor Aedes aegypti over the human host. Such findings will assist researchers in understanding the elements that contribute to viral adaptation and the virus's evolutionary relationship with its hosts.
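
The abstract does not say which codon usage measures were used. One widely used measure is relative synonymous codon usage (RSCU), the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below computes it under the standard genetic code; the function name `rscu` and the toy sequence are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(coding_sequence):
    """Return {codon: RSCU} for amino acids present in the sequence.

    RSCU = count(codon) * (number of synonymous codons) / (family total).
    Stop codons and codons with ambiguous bases are ignored.
    """
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence for demonstration: Ala, Ala, Ala, Lys.
    for codon, value in sorted(rscu("GCTGCTGCAAAA").items()):
        print(codon, round(value, 3))
```

An RSCU of 1 means a codon is used as often as expected under uniform synonymous usage; values above 1 indicate a preferred codon, so a predominance of A-ending codons among the highest values is the kind of pattern the abstract describes.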


PLOS ONE | 2015

Codon Usage in Signal Sequences Affects Protein Expression and Secretion Using Baculovirus/Insect Cell Expression System

Yalan Wang; Yuanhui Mao; Xiaodong Xu; Shiheng Tao; Hongying Chen

By introducing synonymous mutations into the coding sequences of the GP64sp and FibHsp signal peptides, the influences of mRNA secondary structure and codon usage of signal sequences on protein expression and secretion were investigated using the baculovirus/insect cell expression system. The results showed that the mRNA structural stability of the signal sequences was not correlated with protein production and secretion levels, and that FibHsp was more tolerant of codon changes than GP64sp. Codon bias analyses revealed that the codons for GP64sp were well de-optimized and that GP64sp contained more non-optimal codons than FibHsp. Synonymous mutations in GP64sp sufficiently increased its average codon usage frequency and resulted in a dramatic reduction of the activity and secretion of luciferase. A protein degradation inhibition assay with MG-132 showed that higher codon usage frequency in the signal sequence increased the production as well as the degradation of luciferase protein, indicating that the synonymous codon substitutions in the signal sequence caused misfolding of luciferase instead of slowing down protein production. Meanwhile, we found that introducing more non-optimal codons into FibHsp could increase the production and secretion levels of luciferase, which suggests a new strategy to improve the production of secretory proteins in insect cells.
