Publication


Featured research published by Dongbo Bu.


Nucleic Acids Research | 2004

NONCODE: an integrated knowledge database of non-coding RNAs.

Changning Liu; Baoyan Bai; Geir Skogerbø; Lun Cai; Wei Deng; Yong Zhang; Dongbo Bu; Yi-Pei Zhao; Runsheng Chen

NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from the literature and GenBank, and were then manually curated. The distinctive features of NONCODE are as follows: (i) The ncRNAs in NONCODE include almost all types of ncRNAs except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information) in NONCODE have been confirmed manually by consulting the relevant literature; more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function in which a given ncRNA is involved, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue, or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequences, regulatory elements in the flanking sequences, secondary structures, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn.


Protein Science | 2008

Fragment-HMM: A new approach to protein structure prediction

Shuai Cheng Li; Dongbo Bu; Jinbo Xu; Ming Li

We designed a simple position‐specific hidden Markov model to predict protein structure. Our new framework naturally repeats itself to converge to a final target, conglomerating fragment assembly, clustering, target selection, refinement, and consensus, all in one process. Our initial implementation of this theory converges to within 6 Å of the native structures for 100% of decoys on all six standard benchmark proteins used in ROSETTA (discussed by Simons and colleagues in a recent paper), which achieved only 14%–94% for the same data. The qualities of the best decoys and the final decoys our theory converges to are also notably better.


Trends in Genetics | 2008

MicroRNA regulation of messenger-like noncoding RNAs: a network of mutual microRNA control

Yi Zhao; Shunmin He; Changning Liu; Songwei Ru; Haitao Zhao; Zhen Yang; Pengcheng Yang; Xiongyin Yuan; Shiwei Sun; Dongbo Bu; Jiefu Huang; Geir Skogerbø; Runsheng Chen

Metazoan microRNAs (miRNAs) are commonly encoded by primary transcripts with mRNA-like characteristics (mlRNAs). To investigate whether mlRNAs are subject to miRNA control, we compared the expression of mlRNAs to that of tissue-specific miRNAs. We show that, like mRNAs, the expression levels of predicted mlRNA targets are significantly reduced in tissues where a targeting miRNA is expressed. On the basis of these results, we describe a potential network for posttranscriptional miRNA-miRNA control.


BMC Structural Biology | 2009

Improving consensus contact prediction via server correlation reduction

Xin Gao; Dongbo Bu; Jinbo Xu; Ming Li

BACKGROUND: Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. RESULTS: In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method, which assumes that all the individual servers are equally important and independent, the newly developed method evaluates their correlation using maximum likelihood estimation and extracts independent latent servers from them using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server so as to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated, where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. CONCLUSION: Reducing server correlation and optimally combining independent latent servers yields a significant improvement over the traditional consensus methods. This approach may provide a powerful tool for protein structure refinement and prediction.
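
The top L/5 evaluation and the majority voting baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' integer linear programming method: the function names, the representation of a contact as an `(i, j)` residue-index pair, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(server_predictions):
    """Score each residue pair by the number of servers that predict it.

    server_predictions: list of lists of (i, j) pairs, one list per server
    (hypothetical input format). Every server counts equally, which is the
    independence assumption the abstract criticizes.
    """
    votes = Counter(pair for preds in server_predictions for pair in set(preds))
    return list(votes.items())


def top_l5_accuracy(scored_contacts, true_contacts, length):
    """Fraction of the top L/5 scored pairs that are true contacts."""
    k = max(1, length // 5)  # L/5, with at least one pair evaluated
    ranked = sorted(scored_contacts, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example: three servers, a protein of length 10 (so L/5 = 2).
servers = [[(1, 10), (2, 20)], [(1, 10), (3, 30)], [(1, 10), (2, 20)]]
native = {(1, 10), (3, 30)}
accuracy = top_l5_accuracy(majority_vote(servers), native, length=10)
```

In the toy data the two highest-voted pairs are `(1, 10)` and `(2, 20)`, only one of which is a native contact, so a correlated pair of servers is enough to push a false contact into the evaluated set.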


Bioinformatics | 2008

FlexStem: improving predictions of RNA secondary structures with pseudoknots by reducing the search space

Xiang Chen; Simin He; Dongbo Bu; Fa Zhang; Zhiyong Wang; Runsheng Chen; Wen Gao

MOTIVATION: RNA secondary structures with pseudoknots are often predicted by minimizing free energy, a problem that has been proved to be NP-hard. For kinetic reasons, the real RNA secondary structure often has a local rather than the global minimum free energy. This implies that we may improve the performance of RNA secondary structure prediction by taking kinetics into account and minimizing free energy in a local area. RESULTS: We propose a novel algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Still based on the MFE criterion, FlexStem adopts comprehensive energy models that allow complex pseudoknots. Unlike classical thermodynamic methods, our approach aims to simulate the RNA folding process by successive addition of maximal stems, reducing the search space while maintaining or even improving the prediction accuracy. This reduced space is constructed by our maximal stem strategy and a stem-adding rule induced from elaborate statistical experiments on real RNA secondary structures. The strategy and the rule also reflect the folding characteristics of RNA from a new angle and help compensate for the deficiency of relying on MFE alone in RNA structure prediction. We validate FlexStem by applying it to tRNAs, 5S rRNAs and a large number of pseudoknotted structures, and compare it with well-known algorithms such as RNAfold, PKNOTS, PknotsRG, HotKnots and ILM according to their overall sensitivities and specificities, as well as positive and negative controls on pseudoknots. The results show that FlexStem significantly increases the prediction accuracy through its local search strategy. AVAILABILITY: Software is available at http://pfind.ict.ac.cn/FlexStem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
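
To make the notion of a maximal stem concrete, the sketch below enumerates runs of stacked base pairs that cannot be extended outward or inward. It assumes one common definition (Watson-Crick and G-U wobble pairs, a minimum hairpin loop of three bases); the abstract does not give FlexStem's exact definition, energy model, or stem-adding rule, so none of those are reproduced here, and the function names are hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(seq, i, j, min_loop=3):
    """True if positions i < j can pair and enclose at least min_loop bases."""
    return (
        0 <= i < j < len(seq)
        and j - i - 1 >= min_loop
        and (seq[i], seq[j]) in PAIRS
    )


def maximal_stems(seq, min_len=3, min_loop=3):
    """Return (i, j, k) triples: a stem of k stacked pairs (i, j), (i+1, j-1), ...

    A stem is reported only from its outermost pair, and is extended inward
    as far as pairing allows, so it is maximal in both directions.
    """
    stems = []
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            if not can_pair(seq, i, j, min_loop):
                continue
            if can_pair(seq, i - 1, j + 1, min_loop):
                continue  # extendable outward, so (i, j) is not an outermost pair
            k = 1
            while can_pair(seq, i + k, j - k, min_loop):
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


stems = maximal_stems("GGGAAAACCC")
```

For the hairpin-like toy sequence `GGGAAAACCC`, the only stem of length three or more starts at the pair (0, 9) and stacks three G-C pairs. Restricting a search to such stems, rather than to individual base pairs, is one way to picture the reduced search space the abstract describes.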


FEBS Letters | 2006

Faster and more accurate global protein function assignment from protein interaction networks using the MFGO algorithm

Shiwei Sun; Yi Zhao; Yishan Jiao; Yifei Yin; Lun Cai; Yong Zhang; Hongchao Lu; Runsheng Chen; Dongbo Bu

MFGO was tested on four protein interaction datasets (the Vazquez, YP, DIP‐core, and SPK datasets) and compared with the popular MR (majority rule) and GOM methods. Experimental results confirm MFGO's improvement in both speed and accuracy. In particular, the MFGO method has a distinctive advantage in accurately predicting functions for proteins with few neighbors. Moreover, the robustness of the approach was validated both on a dataset containing a high percentage of unknown proteins and on a dataset disturbed by random insertion and deletion of interactions. The analysis shows that a moderate amount of misplaced interactions does not preclude reliable function assignment.
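
For orientation, the majority rule (MR) baseline named above can be sketched as follows: an unannotated protein receives the function that occurs most often among its direct interaction partners. This illustrates only the baseline, not MFGO, whose details the abstract does not give; the function name, input formats, and protein identifiers are hypothetical.

```python
from collections import Counter


def majority_rule(edges, annotations):
    """Predict functions for unannotated proteins from their neighbors.

    edges: iterable of (protein_a, protein_b) interactions.
    annotations: dict mapping an annotated protein to a set of functions.
    Returns a dict mapping each unannotated protein to the sorted list of
    functions tied for the highest count among its neighbors.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    predictions = {}
    for protein, nbrs in neighbors.items():
        if protein in annotations:
            continue  # already annotated, nothing to predict
        counts = Counter(f for n in nbrs for f in annotations.get(n, ()))
        if counts:
            best = max(counts.values())
            predictions[protein] = sorted(f for f, c in counts.items() if c == best)
    return predictions


# Toy network: P1 has three annotated neighbors, P5 has only one.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P5", "P4")]
annotations = {"P2": {"kinase"}, "P3": {"kinase"}, "P4": {"transport"}}
predicted = majority_rule(edges, annotations)
```

In the toy network the prediction for `P5` rests on a single neighbor, which illustrates why a purely local vote is fragile for proteins with few neighbors, the case where the abstract reports MFGO's advantage.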


BMC Infectious Diseases | 2004

Date of origin of the SARS coronavirus strains

Hongchao Lu; Yi Zhao; Jingfen Zhang; Yuelan Wang; Wei Li; Xiaopeng Zhu; Shiwei Sun; Jingyi Xu; Lunjiang Ling; Lun Cai; Dongbo Bu; Runsheng Chen

BACKGROUND: A new respiratory infectious epidemic, severe acute respiratory syndrome (SARS), broke out and spread throughout the world. The putative pathogen of SARS has been identified as a new coronavirus, a single positive-strand RNA virus. RNA viruses commonly have a high rate of genetic mutation. It is therefore important to know the mutation rate of the SARS coronavirus as it spreads through the population. Moreover, finding a date for the last common ancestor of SARS coronavirus strains would be useful for understanding the circumstances surrounding the emergence of the SARS pandemic and the rate at which the SARS coronavirus diverges. METHODS: We propose a mathematical model to estimate the evolution rate of the SARS coronavirus genome and the time of the last common ancestor of the sequenced SARS strains. Under some common assumptions and justifiable simplifications, a few simple equations incorporating the evolution rate (K) and the time of the last common ancestor of the strains (T0) can be deduced. We then used the least squares method to estimate K and T0 from the dataset of sequences and corresponding times. Monte Carlo simulation was employed to discuss the results. RESULTS: Based on 6 strains with accurate dates of host death, we estimated the time of the last common ancestor to be about August or September 2002, and the evolution rate to be about 0.16 base/day, that is, the SARS coronavirus would on average change a base every seven days. We validated our method by dividing the strains into two groups, which coincided with the results from comparative genomics. CONCLUSION: The applied method is simple to implement and avoids the difficulty and subjectivity of choosing the root of a phylogenetic tree. Based on 6 strains with accurate dates of host death, we estimated a time of the last common ancestor that coincides with epidemic investigations, and an evolution rate in the same range as that reported for the HIV-1 virus.
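
The least squares step can be illustrated with a deliberately simplified model. The abstract does not state the paper's equations, so the sketch below assumes a linear clock, d_i = K (t_i - T0), where d_i is the number of base changes a strain has accumulated by sampling day t_i; fitting a straight line then gives K as the slope and T0 as the point where the line crosses zero. The function name and the data are synthetic and hypothetical, not the paper's.

```python
def fit_rate_and_origin(times, distances):
    """Ordinary least squares fit of d = K * (t - T0).

    times: sampling days (any common day numbering).
    distances: accumulated base changes per strain (assumed given).
    Returns (K, T0): the slope in bases/day and the day the line crosses zero.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    k = sxy / sxx                   # slope: evolution rate K
    intercept = mean_d - k * mean_t
    t0 = -intercept / k             # d = 0 at t = T0
    return k, t0


# Synthetic, noise-free data generated with K = 0.16 and T0 = day 100.
days = [200, 230, 260, 290]
changes = [0.16 * (t - 100) for t in days]
rate, origin = fit_rate_and_origin(days, changes)
```

With a rate of 0.16 base/day, the mean waiting time per base change is 1 / 0.16 = 6.25 days, which is of the same order as the weekly figure quoted in the abstract.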


Combinatorial Pattern Matching | 2008

Finding Largest Well-Predicted Subset of Protein Structure Models

Shuai Cheng Li; Dongbo Bu; Jinbo Xu; Ming Li

How to evaluate the quality of models is a basic problem in the field of protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive requires us to find a largest well-predicted subset, that is, a maximum subset of the model that matches the native structure [12]. The problem is solvable in $O(n^7)$ time, which is too slow for practical use. We present a $(1 + \epsilon)d$ distance approximation algorithm that runs in time $O(n^3 \log n / \epsilon^5)$ for general protein structures. In the case of globular proteins, this result can be enhanced to a randomized $O(n \log^2 n)$ time algorithm with probability at least $1 - O(1/n)$. In addition, we propose a $(1 + \epsilon)$-approximation algorithm to compute the minimum distance to fit all the points of a model to its native structure in time $O(n(\log\log n + \log 1/\epsilon)/\epsilon^5)$. We have implemented our algorithms, and the results indicate that our program finds many more matched pairs in less running time than TMScore, which is one of the most popular tools for assessing the quality of predicted models.


Bioinformatics | 2016

FALCON@home: a high-throughput protein structure prediction server based on remote homologue recognition

Chao Wang; Haicang Zhang; Wei-Mou Zheng; Dong Xu; Jianwei Zhu; Bing Wang; Kang Ning; Shiwei Sun; Shuai Cheng Li; Dongbo Bu

SUMMARY: Protein structure prediction approaches can be categorized into template-based modeling (including homology modeling and threading) and free modeling. However, the existing threading tools perform poorly on remote homologous proteins. Thus, improving fold recognition for remote homologous proteins remains a challenge. In addition, proteome-wide structure prediction poses another challenge: increasing prediction throughput. In this study, we present FALCON@home, a protein structure prediction server focusing on remote homologue identification. The design of FALCON@home is based on the observation that a structural template, especially for remote homologous proteins, consists of conserved regions interwoven with highly variable regions. The highly variable regions lead to vague alignments in threading approaches. Thus, FALCON@home first extracts conserved regions from each template and then aligns a query protein with the conserved regions only, rather than with the full-length template directly. This helps avoid the vague alignments rooted in highly variable regions, improving remote homologue identification. We implemented FALCON@home using the Berkeley Open Infrastructure for Network Computing (BOINC) volunteer computing protocol. With computation power donated from over 20,000 volunteer CPUs, FALCON@home achieves a throughput of over 1000 proteins per day. In the Critical Assessment of protein Structure Prediction (CASP11), the FALCON@home-based prediction was ranked 12th in the template-based modeling category. As an application, the structures of 880 mouse mitochondrial proteins were predicted, which revealed a significant correlation between protein half-lives and protein structural factors. AVAILABILITY AND IMPLEMENTATION: FALCON@home is freely available at http://protein.ict.ac.cn/FALCON/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Research in Computational Molecular Biology | 2008

A fragmentation event model for peptide identification by mass spectrometry

Yu Lin; Yantao Qiao; Shiwei Sun; Chungong Yu; Gongjin Dong; Dongbo Bu

We present a novel fragmentation event model for peptide identification by tandem mass spectrometry. Most current peptide identification techniques suffer from inaccuracies in the predicted theoretical spectrum, which are due to insufficient understanding of the ion generation process, especially the b/y ratio puzzle. To overcome this difficulty, we propose a model based on the abundance of fragmentation events rather than ion intensities. Experimental results demonstrate that this model helps improve database searching methods. On the LTQ data set, when we control the false-positive rate at 5%, our fragmentation event model has a significantly higher true positive rate (0.83) than SEQUEST (0.73). Comparison with Mascot yields similar results, which means that our model can effectively identify the false positive peptide-spectrum pairs reported by SEQUEST and Mascot. The fragmentation event model can also be used to address the missing-peak problem encountered by de novo methods. To our knowledge, this is the first time the fragmentation preference for peptide bonds has been used to overcome the missing-peak difficulty.
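
The comparison "true positive rate at a controlled 5% false-positive rate" can be computed from scored peptide-spectrum matches as sketched below. Reading "false-positive rate" in the ROC sense (false positives divided by all negatives) is an assumption, as are the function name, the input format, and the toy scores; nothing here reproduces the scoring of the fragmentation event model, SEQUEST, or Mascot.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.05):
    """Highest true positive rate reachable with FPR <= max_fpr.

    scores: match scores, higher meaning more confident.
    labels: 1 (or True) for correct matches, 0 (or False) for incorrect ones.
    A match is accepted when its score is >= the chosen threshold.
    """
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative match")

    best = 0.0
    # Lowering the threshold can only add accepted matches, so stop at the
    # first threshold whose false-positive rate exceeds the limit.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        if fp / negatives > max_fpr:
            break
        best = tp / positives
    return best


# Toy scores: three correct and three incorrect matches.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
strict = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
```

Fixing the false-positive rate first and then comparing true positive rates puts two scoring functions on the same footing, which is how the 0.83 versus 0.73 figures above are framed.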

Collaboration


Top Co-Authors

Shiwei Sun
Chinese Academy of Sciences

Shuai Cheng Li
City University of Hong Kong

Runsheng Chen
Peking Union Medical College Hospital

Yi Zhao
Chinese Academy of Sciences

Lun Cai
Chinese Academy of Sciences

Jingfen Zhang
Chinese Academy of Sciences

Wei-Mou Zheng
Chinese Academy of Sciences

Xiaopeng Zhu
Chinese Academy of Sciences

Jinbo Xu
Toyota Technological Institute at Chicago