Guohui Lin | Researchain

Publication

Featured research published by Guohui Lin.

Information Processing Letters | 1999

Steiner tree problem with minimum number of Steiner points and bounded edge-length

Guohui Lin; Guoliang Xue

In this paper, we study the Steiner tree problem with minimum number of Steiner points and bounded edge-length (STPMSPBEL), which asks for a tree interconnecting a given set of n terminal points and a minimum number of Steiner points such that the Euclidean length of each edge is no more than a given positive constant. This problem has applications in VLSI design, WDM optical networks and wireless communications. We prove that this problem is NP-complete and present a polynomial-time approximation algorithm whose worst-case performance ratio is 5.

Nucleic Acids Research | 2008

CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data

David S. Wishart; David Arndt; Mark V. Berjanskii; Peter Tang; Jianjun Zhou; Guohui Lin

CS23D (chemical shift to 3D structure) is a web server for rapidly generating accurate 3D protein structures using only assigned nuclear magnetic resonance (NMR) chemical shifts and sequence data as input. Unlike conventional NMR methods, CS23D requires no NOE and/or J-coupling data to perform its calculations. CS23D accepts chemical shift files in either SHIFTY or BMRB format and produces a set of PDB coordinates for the protein in about 10–15 min. CS23D uses a pipeline of several preexisting programs or servers to calculate the actual protein structure. Depending on the sequence similarity (or lack thereof), CS23D uses either (i) maximal subfragment assembly (a form of homology modeling), (ii) chemical shift threading or (iii) shift-aided de novo structure prediction (via Rosetta) followed by chemical shift refinement to generate and/or refine protein coordinates. Tests conducted on more than 100 proteins from the BioMagResBank indicate that CS23D converges (i.e. finds a solution) for >95% of protein queries. These chemical-shift-generated structures were found to be within 0.2–2.8 Å RMSD of the structures determined by conventional NOE-based NMR methods or by X-ray methods. The performance of CS23D depends on the completeness of the chemical shift assignments and on the similarity of the query protein to known 3D folds. CS23D is accessible at http://www.cs23d.ca.

Journal of Computational Biology | 2002

A General Edit Distance between RNA Structures

Tao Jiang; Guohui Lin; Bin Ma; Kaizhong Zhang

Arc-annotated sequences are useful in representing the structural information of RNA sequences. In general, RNA secondary and tertiary structures can be represented as a set of nested arcs and a set of crossing arcs, respectively. Since RNA functions are largely determined by molecular conformation, and therefore by secondary and tertiary structures, the comparison between RNA secondary and tertiary structures has received much attention recently. In this paper, we propose the notion of edit distance to measure the similarity between two RNA secondary and tertiary structures, by incorporating various edit operations performed on both bases and arcs (i.e., base pairs). Several algorithms are presented to compute the edit distance between two RNA sequences with various arc structures and under various score schemes, either exactly or approximately, with provably good performance. Preliminary experimental tests confirm that our definition of edit distance and the computation model are among the most reasonable ones studied in the literature.
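To make the nested/crossing distinction concrete, here is a minimal sketch, assuming arcs are stored as 0-based position pairs (i, j) with i < j and that each base takes part in at most one arc. The helper names (`dot_bracket_to_arcs`, `arcs_cross`, `classify_arcs`) are hypothetical illustrations, not code from the paper, and the edit-distance algorithms themselves are not reproduced here.

```python
from itertools import combinations

Arc = tuple[int, int]  # (i, j), i < j: bases at positions i and j are paired


def dot_bracket_to_arcs(structure: str) -> list[Arc]:
    """Parse dot-bracket notation, which can only express nested arcs.
    Unmatched '(' are ignored; an unmatched ')' raises IndexError."""
    stack, arcs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            arcs.append((stack.pop(), pos))
    return sorted(arcs)


def arcs_cross(a: Arc, b: Arc) -> bool:
    """True if the arcs interleave: i1 < i2 < j1 < j2 after ordering by left end."""
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 < j1 < j2


def classify_arcs(arcs: list[Arc]) -> str:
    """'plain' (no arcs), 'nested' (no two arcs cross) or 'crossing'."""
    if not arcs:
        return "plain"
    if any(arcs_cross(a, b) for a, b in combinations(arcs, 2)):
        return "crossing"
    return "nested"


print(classify_arcs(dot_bracket_to_arcs("((..))")))  # a hairpin: nested
print(classify_arcs([(0, 4), (2, 6)]))               # a pseudoknot: crossing
```

The pairwise check is quadratic in the number of arcs, which is enough for illustration; a secondary structure (dot-bracket) always classifies as nested, while tertiary interactions such as pseudoknots introduce crossing pairs.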

Journal of Global Optimization | 2000

Approximations for Steiner Trees with Minimum Number of Steiner Points

Donghui Chen; Ding-Zhu Du; Xiao Dong Hu; Guohui Lin; Lusheng Wang; Guoliang Xue

Given n terminals in the Euclidean plane and a positive constant, find a Steiner tree interconnecting all terminals with the minimum number of Steiner points such that the Euclidean length of each edge is no more than the given positive constant. This problem is NP-hard and has applications in VLSI design, WDM optical networks and wireless communications. In this paper, we show that (a) the Steiner ratio is 1/4, that is, the minimum spanning tree yields a polynomial-time approximation with performance ratio exactly 4, (b) there exists a polynomial-time approximation with performance ratio 3, and (c) there exists a polynomial-time approximation scheme under certain conditions.
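Part (a) refers to a minimum-spanning-tree approach. A natural reading, assumed here rather than taken from the paper, is: build a Euclidean MST on the terminals, then subdivide every MST edge of length L with ⌈L/R⌉ − 1 evenly spaced Steiner points so that no piece exceeds the bound R. The sketch below implements that heuristic with Prim's algorithm; the function names are hypothetical, and the ratio-4 guarantee is the paper's claim, not something the code checks.

```python
import math

Point = tuple[float, float]


def euclidean_mst(points: list[Point]) -> list[tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_steiner_points(points: list[Point], max_len: float) -> list[Point]:
    """Steiner points that split each MST edge into pieces of length <= max_len."""
    steiner = []
    for a, b in euclidean_mst(points):
        (x1, y1), (x2, y2) = points[a], points[b]
        # k points cut the edge into k + 1 = ceil(L / R) equal pieces, each <= R.
        k = math.ceil(math.dist(points[a], points[b]) / max_len) - 1
        for t in range(1, k + 1):
            f = t / (k + 1)
            steiner.append((x1 + f * (x2 - x1), y1 + f * (y2 - y1)))
    return steiner


print(len(mst_steiner_points([(0.0, 0.0), (2.5, 0.0), (2.5, 1.0)], 1.0)))
```

Here the MST edges have lengths 2.5 and 1.0, so the first needs two Steiner points and the second none. The number of Steiner points returned is the quantity the problem asks to minimize.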

BMC Bioinformatics | 2006

A stable gene selection in microarray data analysis

Kun Juh Yang; Zhipeng Cai; Guohui Lin

Background: Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection aims to detect the genes that are most significantly differentially expressed under different conditions, and it has been a central research focus. In general, a better gene selection method can significantly improve classification performance. One of the difficulties in gene selection is that the numbers of samples under different conditions can vary considerably.

Results: Two novel gene selection methods are proposed in this paper; they are not affected by unbalanced sample class sizes and do not assume any explicit statistical model of the gene expression values. They were evaluated on eight publicly available microarray datasets using leave-one-out cross-validation and 5-fold cross-validation. Performance is measured by the classification accuracy obtained with the top-ranked genes, where the ranking is computed on the training datasets.

Conclusion: The experimental results show that the proposed gene selection methods are efficient, effective, and robust in identifying differentially expressed genes. With existing SVM-based and KNN-based classifiers, the genes selected by the proposed methods generally give more accurate classification results, particularly when the sample class sizes in the training dataset are unbalanced.
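The evaluation protocol, in which genes are ranked on the training data only and accuracy is measured with the top-ranked genes, can be sketched as follows. This is a minimal illustration under stated assumptions: the ranking score is a hypothetical placeholder (absolute difference of class means), not either of the paper's methods, and a 1-nearest-neighbour rule stands in for the KNN-based classifier.

```python
import math
from statistics import mean


def rank_genes(X, y):
    """Rank gene indices by a HYPOTHETICAL placeholder score,
    |mean(class 0) - mean(class 1)|; not either of the paper's methods."""
    def score(g):
        a = [x[g] for x, label in zip(X, y) if label == 0]
        b = [x[g] for x, label in zip(X, y) if label == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)


def loocv_accuracy(X, y, top_k):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.
    Genes are re-ranked inside every fold using the training samples only,
    so the held-out sample never influences which genes are selected.
    Assumes each class keeps at least one training sample in every fold."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_genes(train_X, train_y)[:top_k]

        def project(sample):
            return [sample[g] for g in genes]

        nearest = min(range(len(train_X)),
                      key=lambda j: math.dist(project(X[i]), project(train_X[j])))
        correct += train_y[nearest] == y[i]
    return correct / len(X)


# Toy data: gene 0 separates the classes, gene 1 is noise.
X = [[0.0, 0.5], [0.1, 0.3], [0.2, 0.4], [1.0, 0.4], [1.1, 0.5], [0.9, 0.3]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, top_k=1))
```

Re-ranking inside each fold is the design point: ranking genes on the full dataset before cross-validation would leak information from the held-out sample and inflate the reported accuracy.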

Analytical Chemistry | 2013

MyCompoundID: using an evidence-based metabolome library for metabolite identification

Liang Li; Ronghong Li; Jianjun Zhou; Azeret Zuniga; Avalyn Stanislaus; Yiman Wu; Tao Huan; Jiamin Zheng; Yi Shi; David S. Wishart; Guohui Lin

Identification of unknown metabolites is a major challenge in metabolomics. Without the identities of the metabolites, the metabolome data generated from a biological sample cannot be readily linked with proteomic and genomic information for studies in systems biology and medicine. We have developed a web-based metabolite identification tool ( http://www.mycompoundid.org ) that allows users to search and interpret mass spectrometry (MS) data against a newly constructed metabolome library composed of 8,021 known human endogenous metabolites and their predicted metabolic products (375,809 compounds from one metabolic reaction and 10,583,901 from two reactions). As an example, in the analysis of a simple extract of human urine or plasma and of whole human urine by liquid chromatography-mass spectrometry and MS/MS, we are able to identify at least twice as many metabolites in these samples as with a standard human metabolome library. In addition, the evidence-based metabolome library (EML) is shown to perform much better in identifying putative metabolites from a human urine sample than the ChemPub and KEGG libraries.
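The idea of a library that contains predicted products "from one metabolic reaction" can be illustrated with a small mass-search sketch. Everything here is an assumption for illustration: the metabolite names and base masses are hypothetical, the reaction shifts are approximate monoisotopic mass changes (+O for hydroxylation, +CH2 for methylation), and the 10 ppm tolerance is an arbitrary default. It is not MyCompoundID's actual search procedure or API.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def expand_one_reaction(base: dict[str, float],
                        reactions: dict[str, float]) -> dict[str, float]:
    """Add one predicted product per (metabolite, reaction) pair by applying the
    reaction's mass shift; the original base entries are kept."""
    library = dict(base)
    for name, mass in base.items():
        for rxn, shift in reactions.items():
            library[f"{name}+{rxn}"] = mass + shift
    return library


def search(observed: float, library: dict[str, float],
           tol_ppm: float = 10.0) -> list[str]:
    """Names of library entries within tol_ppm of the observed neutral mass."""
    return sorted(name for name, mass in library.items()
                  if abs(ppm_error(observed, mass)) <= tol_ppm)


base = {"metabolite_A": 200.0, "metabolite_B": 300.0}        # hypothetical masses
reactions = {"hydroxylation": 15.9949, "methylation": 14.0157}
library = expand_one_reaction(base, reactions)
print(search(215.995, library))
```

Calling `expand_one_reaction` again on its own output would add a second layer of products, mirroring the abstract's distinction between one-reaction and two-reaction compounds and showing why the library grows so quickly.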

Theoretical Computer Science | 2001

Approximations for Steiner trees with minimum number of Steiner points

Donghui Chen; Ding-Zhu Du; Xiao Dong Hu; Guohui Lin; Lusheng Wang; Guoliang Xue

Given n terminals in the Euclidean plane and a positive constant, find a Steiner tree interconnecting all terminals with the minimum number of Steiner points such that the Euclidean length of each edge is no more than the given positive constant. This problem is NP-hard and has applications in VLSI design, WDM optical networks and wireless communications. In this paper, we show that (a) the Steiner ratio is 1/4, that is, the minimum spanning tree yields a polynomial-time approximation with performance ratio exactly 4, (b) there exists a polynomial-time approximation with performance ratio 3, and (c) there exists a polynomial-time approximation scheme under certain conditions.

International Symposium on Algorithms and Computation | 2000

Phylogenetic k-Root and Steiner k-Root

Guohui Lin; Tao Jiang; Paul E. Kearney

Given a graph G = (V, E) and a positive integer k, the PHYLOGENETIC k-ROOT PROBLEM asks for an (unrooted) tree T without degree-2 nodes such that its leaves are labeled by V and (u, v) ∈ E if and only if d_T(u, v) ≤ k, where d_T(u, v) is the distance between u and v in T. If the vertices in V are also allowed to be internal nodes in T, then we have the STEINER k-ROOT PROBLEM. Moreover, if a particular subset S of V is required to be internal nodes in T, then we have the RESTRICTED STEINER k-ROOT PROBLEM. Phylogenetic k-roots and Steiner k-roots extend the standard notion of GRAPH ROOTS and are motivated by applications in computational biology. In this paper, we first present O(n + e)-time algorithms to determine whether a (not necessarily connected) graph G = (V, E) has an S-restricted 1-root Steiner tree for a given subset S ⊂ V, and whether a connected graph G = (V, E) has an S-restricted 2-root Steiner tree for a given subset S ⊂ V, where n = |V| and e = |E|. We then use these two algorithms as subroutines to design O(n + e)-time algorithms to determine whether a given (not necessarily connected) graph G = (V, E) has a 3-root phylogeny and whether a given connected graph G = (V, E) has a 4-root phylogeny.
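The definition of a phylogenetic k-root can be made concrete with a brute-force checker: given a candidate tree T, it verifies the three conditions directly (leaves are exactly V, no degree-2 nodes, and (u, v) ∈ E iff d_T(u, v) ≤ k). This is a sketch assuming T is given as an adjacency dict that really is a tree; the names are hypothetical, and it only *verifies* a candidate in O(n · |T|) time, whereas the paper's O(n + e) algorithms *decide* whether such a tree exists.

```python
from collections import deque
from itertools import combinations


def bfs_distances(tree, source):
    """Hop distances from source to every node of the (unweighted) tree."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in tree[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_phylogenetic_k_root(tree, vertices, edges, k):
    """Check the definition directly: the leaves of `tree` are exactly `vertices`,
    no node has degree 2, and {u, v} is an edge iff d_T(u, v) <= k."""
    leaves = {x for x, nbrs in tree.items() if len(nbrs) == 1}
    if leaves != set(vertices):
        return False
    if any(len(nbrs) == 2 for nbrs in tree.values()):
        return False
    edge_set = {frozenset(e) for e in edges}
    dist = {u: bfs_distances(tree, u) for u in vertices}
    for u, v in combinations(vertices, 2):
        close = dist[u].get(v, float("inf")) <= k
        if close != (frozenset((u, v)) in edge_set):
            return False
    return True


# Internal nodes "x" and "y" each carry two leaves: leaves on the same side are at
# distance 2, leaves on opposite sides at distance 3.
tree = {"x": [1, 2, "y"], "y": [3, 4, "x"], 1: ["x"], 2: ["x"], 3: ["y"], 4: ["y"]}
print(is_phylogenetic_k_root(tree, [1, 2, 3, 4], [(1, 2), (3, 4)], 2))
```

With k = 2 the tree above is a 2-root phylogeny of the graph with edges {1, 2} and {3, 4}; raising k to 3 would require all six pairs to be adjacent.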

BMC Genomics | 2008

A first generation whole genome RH map of the river buffalo with comparison to domestic cattle

M. Elisabete J. Amaral; Jason R. Grant; Penny K. Riggs; N. B. Stafuzza; Edson Almeida Filho; Tom Goldammer; Rosemarie Weikard; Ronald M. Brunner; Kelli J. Kochan; Anthony J Greco; Jooha Jeong; Zhipeng Cai; Guohui Lin; Aparna Prasad; Satish Kumar; G Pardha Saradhi; Boby Mathew; M Aravind Kumar; Melissa N Miziara; Paola Mariani; Alexandre R Caetano; Stephan R Galvão; M. S. Tantia; R. K. Vijh; Bina Mishra; S T Bharani Kumar; Vanderlei A Pelai; André M. Santana; Larissa Fornitano; Brittany C Jones

Background: The recently constructed river buffalo whole-genome radiation hybrid panel (BBURH5000) has already been used to generate preliminary radiation hybrid (RH) maps for several chromosomes, and buffalo-bovine comparative chromosome maps have been constructed. Here, we present the first-generation whole-genome RH map (WG-RH) of the river buffalo generated from cattle-derived markers. The RH maps were aligned to the bovine genome sequence assembly Btau_4.0, providing valuable comparative mapping information for both species.

Results: A total of 3990 markers were typed on the BBURH5000 panel, of which 3072 were cattle-derived SNPs. The remaining 918 were classified as cattle sequence-tagged sites (STS), including coding genes, ESTs, and microsatellites. The average retention frequency per chromosome was 27.3%, calculated with 3093 scorable markers distributed in 43 linkage groups covering all autosomes (24) and the X chromosome at LOD ≥ 8. The estimated total length of the WG-RH map is 36,933 cR5000. Fewer than 15% of the markers (472) could not be placed within any linkage group at LOD ≥ 8. The order of linkage groups for each chromosome was determined by incorporating markers previously assigned by FISH and by alignment with the bovine genome sequence assembly (Btau_4.0).

Conclusion: We obtained radiation hybrid chromosome maps for the entire river buffalo genome based on cattle-derived markers. The alignments of our RH maps to the current bovine genome sequence assembly (Btau_4.0) indicate regions of possible rearrangements between the chromosomes of the two species. The river buffalo is an important agricultural species whose genetic improvement has lagged behind that of other species due to limited prior genomic characterization. The first-generation RH map presented here provides a more extensive resource for positional candidate cloning of genes associated with complex traits and for large-scale physical mapping of the river buffalo genome.

Combinatorial Pattern Matching | 2000

The Longest Common Subsequence Problem for Arc-Annotated Sequences

Tao Jiang; Guohui Lin; Bin Ma; Kaizhong Zhang

Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. Recently, the longest arc-preserving common subsequence problem has been introduced as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures and present some new algorithmic and complexity results on the longest arc-preserving common subsequence problem. Some of our results answer an open question in [6,7], and others improve the hardness results in [6,7].
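The arc-preserving condition can be stated precisely with a small exhaustive search. A common subsequence is a list of matched position pairs (i, j) with increasing i and j and equal characters; it is arc-preserving if, for any two matched pairs (p, q) and (i, j), positions p and i are joined by an arc in the first sequence exactly when q and j are joined by an arc in the second. The sketch below assumes 0-based arcs and a hypothetical function name; it runs in exponential time and is meant only for tiny inputs, in contrast to the algorithms studied in the paper.

```python
def lapcs_bruteforce(s1, arcs1, s2, arcs2):
    """Longest arc-preserving common subsequence by exhaustive search.
    Returns the matched (i, j) position pairs. Exponential time: tiny inputs only."""
    a1 = {frozenset(a) for a in arcs1}
    a2 = {frozenset(a) for a in arcs2}
    best = []

    def compatible(match, i, j):
        # Adding (i, j) must preserve arcs with respect to every pair already matched.
        return all((frozenset((p, i)) in a1) == (frozenset((q, j)) in a2)
                   for p, q in match)

    def extend(match, last_i, last_j):
        nonlocal best
        if len(match) > len(best):
            best = list(match)
        for i in range(last_i + 1, len(s1)):
            for j in range(last_j + 1, len(s2)):
                if s1[i] == s2[j] and compatible(match, i, j):
                    match.append((i, j))
                    extend(match, i, j)
                    match.pop()

    extend([], -1, -1)
    return best


# With no arcs this is the classical LCS; an arc present in only one sequence
# forbids matching both of its endpoints.
print(len(lapcs_bruteforce("GAUC", [(0, 3)], "GAUC", [])))
```

In the example, the plain LCS of "GAUC" with itself has length 4, but the arc (0, 3) in the first sequence has no counterpart in the second, so at most one of its endpoints can be matched and the arc-preserving answer drops to 3.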