Dongsheng Che
East Stroudsburg University of Pennsylvania
Publication
Featured research published by Dongsheng Che.
Genetic and Evolutionary Computation Conference | 2005
Dongsheng Che; Yinglei Song; Khaled Rasheed
Computationally identifying transcription factor binding sites in the promoter regions of genes is an important problem in computational biology and has been under intensive research for a decade. To predict the binding site locations efficiently, many algorithms that incorporate either approximate or heuristic techniques have been developed. However, the prediction accuracy is not satisfactory and binding site prediction thus remains a challenging problem. In this paper, we develop an approach that can be used to predict binding site motifs using a genetic algorithm. Based on the generic framework of a genetic algorithm, the approach explores the search space of all possible starting locations of the binding site motifs in different target sequences with a population that undergoes evolution. Individuals in the population compete to participate in the crossovers and mutations occur with a certain probability. Initial experiments demonstrated that our approach could achieve high prediction accuracy in a small amount of computation time. A promising advantage of our approach is the fact that the computation time does not explicitly depend on the length of target sequences and hence may not increase significantly when the target sequences become very long.
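The search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness function (a simple consensus score), tournament selection, one-point crossover, elitism, and all parameter values are assumptions chosen for illustration, and the names `consensus_score` and `evolve_motif_starts` are hypothetical. Each individual holds one candidate start position per target sequence.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, width):
    """Sum, over alignment columns, of the count of the most common base."""
    score = 0
    for col in range(width):
        column = Counter(seq[s + col] for seq, s in zip(seqs, starts))
        score += column.most_common(1)[0][1]
    return score


def evolve_motif_starts(seqs, width, pop_size=60, generations=80,
                        mutation_rate=0.1, seed=0):
    """Evolve one motif start position per sequence (illustrative GA only)."""
    rng = random.Random(seed)
    # Largest valid start index in each sequence.
    limits = [len(seq) - width for seq in seqs]

    def fitness(individual):
        return consensus_score(seqs, individual, width)

    def tournament(population):
        # Two random individuals compete; the fitter one becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, lim) for lim in limits]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism (an assumption): carry the best individual over unchanged.
        next_gen = [max(population, key=fitness)[:]]
        while len(next_gen) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, len(seqs) - 1) if len(seqs) > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for i, lim in enumerate(limits):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(0, lim)  # random-reset mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

In this sketch, evaluating one individual costs time proportional to the motif width times the number of sequences, not to the sequence lengths, which is consistent with the scaling remark in the abstract.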
Advances in Experimental Medicine and Biology | 2011
Dongsheng Che; Qi Liu; Khaled Rasheed; Xiuping Tao
Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
Bioinformation | 2012
Mohammad Shabbir Hasan; Qi Liu; Han Wang; John Fazekas; Bernard Chen; Dongsheng Che
Genomic Islands (GIs) are genomic regions that originate from other organisms and are acquired through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that ensembles the results of existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that facilitates collecting the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. Availability: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST
Nucleic Acids Research | 2006
Dongsheng Che; Guojun Li; Fenglou Mao; Hongwei Wu; Ying Xu
We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database, the first of its kind.
Bioinformation | 2011
Dongsheng Che; Mohammad Shabbir Hasan; Han Wang; John Fazekas; Jinling Huang; Qi Liu
Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus prediction results. Performance comparisons between our ensemble algorithm and existing programs have shown that our ensemble algorithm performs better than any of the other programs. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
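The abstract does not spell out how the consensus is formed, so the following is only a sketch of one plausible scheme, not EGID's actual procedure: position-level voting across tools, keeping regions covered by at least `min_votes` tools and discarding regions shorter than `min_length`. The function names, both parameters, and the inclusive `(start, end)` interval format are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def consensus_islands(predictions, min_votes=2, min_length=1):
    """Return regions predicted by at least `min_votes` tools.

    predictions: one list of inclusive (start, end) intervals per tool.
    """
    # Record +1 where a tool's interval begins and -1 just past its end.
    deltas = {}
    for tool_intervals in predictions:
        # Merge first so a single tool never votes twice for one position.
        for start, end in merge_intervals(tool_intervals):
            deltas[start] = deltas.get(start, 0) + 1
            deltas[end + 1] = deltas.get(end + 1, 0) - 1

    consensus, votes, region_start = [], 0, None
    for pos in sorted(deltas):
        votes += deltas[pos]
        if votes >= min_votes and region_start is None:
            region_start = pos
        elif votes < min_votes and region_start is not None:
            # Filter step: drop consensus regions that are too short.
            if pos - region_start >= min_length:
                consensus.append((region_start, pos - 1))
            region_start = None
    return consensus
```

Raising `min_votes` trades recall for precision, which is the tension between individual tools that the abstract describes.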
BMC Genomics | 2010
Dongsheng Che; Cory Hockenbury; Robert E. Marmelstein; Khaled Rasheed
Background: Genomic islands (GIs) are clusters of alien genes found in some bacterial genomes but not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory. Results: In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate the classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus and Streptococcus) and on a mixed dataset of all three genera. The experimental results have shown that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms, including AdaBoost, bagging, MultiBoost and random forest, on the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy. Conclusions: We conclude that decision tree-based ensemble algorithms could accurately classify GIs and non-GIs, and recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.
Journal of Bioinformatics and Computational Biology | 2009
Guojun Li; Dongsheng Che; Ying Xu
Identification of operons at the genome scale of prokaryotic organisms represents a key step in deciphering of their transcriptional regulation machinery, biological pathways, and networks. While numerous computational methods have been shown to be effective in predicting operons for well-studied organisms such as Escherichia coli K12 and Bacillus subtilis 168, these methods generally do not generalize well to genomes other than the ones used to train the methods, or closely related genomes because they rely on organism-specific information. Several methods have been explored to address this problem through utilizing only genomic structural information conserved across multiple organisms, but they all suffer from the issue of low prediction sensitivity. In this paper, we report a novel operon prediction method that is applicable to any prokaryotic genome with high prediction accuracy. The key idea of the method is to predict operons through identification of conserved gene clusters across multiple genomes and through deriving a key parameter relevant to the distribution of intergenic distances in genomes. We have implemented this method using a graph-theoretic approach, to calculate a set of maximum gene clusters in the target genome that are conserved across multiple reference genomes. Our computational results have shown that this method has higher prediction sensitivity as well as specificity than most of the published methods. We have carried out a preliminary study on operons unique to archaea and bacteria, respectively, and derived a number of interesting new insights about operons between these two kingdoms. The software and predicted operons of 365 prokaryotic genomes are available at http://csbl.bmb.uga.edu/~dongsheng/UNIPOP.
Pacific Symposium on Biocomputing | 2006
Jizhen Zhao; Dongsheng Che; Liming Cai
Template-based comparative analysis is a viable approach to the prediction and annotation of pathways in genomes. Methods based solely on sequence similarity may not be effective enough; functional and structural information such as protein-DNA interactions and operons can prove useful in improving the prediction accuracy. In this paper, we present a novel approach to predicting pathways by seeking high overall sequence similarity, functional and structural consistency between the predicted pathways and their templates. In particular, the prediction problem is formulated into finding the maximum independent set (MIS) in the graph constructed based on operon or interaction structures as well as homologous relationships of the involved genes. On such graphs, the MIS problem is solved efficiently via non-trivial tree decomposition of the graphs. The developed algorithm is evaluated based on the annotation of 40 pathways in Escherichia coli (E. coli) K12 using those in Bacillus subtilis (B. subtilis) 168 as templates. It demonstrates overall accuracy that outperforms those of the methods based solely on sequence similarity or using structural information of the genome with integer programming.
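The abstract's algorithm solves maximum independent set (MIS) by dynamic programming over a tree decomposition of a general graph. The sketch below shows only the simplest special case, where the graph is itself a tree, to illustrate the dynamic-programming idea; it is not the paper's algorithm, and the function name and adjacency-dictionary input format are assumptions.

```python
def max_independent_set_tree(adj, root):
    """Size of a maximum independent set in an undirected tree.

    adj: dict mapping each node to a list of its neighbours.
    """
    # Iterative depth-first traversal recording each node's parent.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for neighbour in adj[node]:
            if neighbour != parent[node]:
                parent[neighbour] = node
                stack.append(neighbour)

    take = {}  # best size in the subtree if the node is in the set
    skip = {}  # best size in the subtree if the node is left out
    # Reversed preorder visits every child before its parent.
    for node in reversed(order):
        children = [n for n in adj[node] if n != parent[node]]
        take[node] = 1 + sum(skip[c] for c in children)
        skip[node] = sum(max(take[c], skip[c]) for c in children)
    return max(take[root], skip[root])
```

On a general graph, the same take-or-skip bookkeeping is kept per bag of the tree decomposition rather than per node, so the cost grows with the width of the decomposition.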
International Symposium on Bioinformatics Research and Applications | 2011
Han Wang; John Fazekas; Matthew Booth; Qi Liu; Dongsheng Che
A genomic island (GI) is a segment of genomic sequence that is horizontally transferred from other genomes. The detection of genomic islands is extremely important to medical research. Most current computational approaches that use sequence composition to predict genomic islands suffer from low prediction accuracy. In this paper, we report, for the first time, that gene information and intergenic distance differ between genomic islands and non-genomic islands. Using these two sources and sequence information, we have trained on genomic island datasets from 113 genomes, and developed a decision tree-based bagging model for genomic island prediction. In order to test the performance of our approach, we have applied it to three genomes: Salmonella typhimurium LT2, Streptococcus pyogenes MGAS315, and Escherichia coli O157:H7 str. Sakai. The performance metrics have shown that our approach is better than other sequence composition-based approaches. We conclude that the incorporation of gene information and intergenic distance could improve genomic island prediction accuracy. Our prediction software, Genomic Island Hunter (GIHunter), is available at http://www.esu.edu/cpsc/che_lab/software/GIHunter.
Computational Intelligence in Bioinformatics and Computational Biology | 2007
Dongsheng Che; Jizhen Zhao; Liming Cai; Ying Xu
Identifying operons at the whole genome scale of microbial organisms can facilitate deciphering of transcriptional regulation, biological networks and pathways. A number of computational methods, such as naive Bayesian and neural network approaches, have been employed for operon prediction on whole genome sequences of a number of prokaryotic organisms, based on features known to be associated with operons, such as intergenic distance, microarray expression data, phylogenetic profiles, and clusters of orthologous groups (COG). In this paper, we introduce a decision tree approach to predict operon structures using three effective types of genomic data: intergenic distance, gene order conservation and COG. We calculated and analyzed frequency distributions of each attribute of known operons and non-operons of Escherichia coli (E. coli) K12 and Bacillus subtilis (B. subtilis) 168, and constructed decision trees based on training examples to predict operons. The overall prediction accuracy is 94.1% for E. coli K12 and 91.0% for B. subtilis 168. We also applied four other classifiers (logistic regression, naive Bayesian, neural network and support vector machines) to both organisms. The results indicate that the decision tree approach is the best classifier for operon prediction. The software package operonDT is freely available at http://www.cs.uga.edn/~che/OperonT
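To illustrate how a decision tree can use a numeric attribute such as intergenic distance, the sketch below finds the single threshold with the highest information gain, which is the kind of test placed at one node of a tree. It is an illustrative sketch, not the operonDT implementation; the function names are hypothetical, and any distances passed to it in an example are made-up toy values, not data from the paper.

```python
import math


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(values, labels):
    """Find the threshold on one numeric feature with the highest gain.

    values: feature values, e.g. intergenic distances in base pairs.
    labels: 1 if the adjacent gene pair is in the same operon, else 0.
    Returns (threshold, information_gain); threshold is None if no
    split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        if base - weighted > best_gain:
            best_gain, best_threshold = base - weighted, threshold
    return best_threshold, best_gain
```

A full tree would apply this search recursively to each resulting subset and to the other attributes the abstract names (gene order conservation and COG).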