Jingwu He | Researchain

Publication

Featured research published by Jingwu He.

Bioinformatics | 2006

MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression

Jingwu He; Alexander Zelikovsky

The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or (2) necessary reduction of the huge SNP sets (obtained, e.g., from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data based on multiple linear regression (MLR) is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects positions of the given number of informative SNPs based on a genotype sample population. The MLR SNP prediction algorithm predicts a complete genotype based on the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets including 10 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005). Availability: The MLR-tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html
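The prediction step described in this abstract can be illustrated with a minimal sketch. It is not the MLR-tagging package's code: it assumes genotypes are coded 0/1/2, fits an ordinary least-squares model of one non-tag SNP on the tag SNPs, and rounds the fitted value to the nearest valid genotype. The function names and the small `ridge` term (added only to keep the system solvable) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_mlr(tag_rows, target, ridge=1e-6):
    """Least-squares weights (intercept first) for target ~ tag SNPs.

    ridge is an assumption of this sketch: a tiny diagonal term that keeps
    the normal equations non-singular when tags are collinear.
    """
    x = [[1.0] + [float(v) for v in row] for row in tag_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_snp(weights, tag_values):
    """Round the regression output to the nearest genotype in {0, 1, 2}."""
    raw = weights[0] + sum(w * v for w, v in zip(weights[1:], tag_values))
    return min(2, max(0, round(raw)))


# Toy sample: two tag SNPs per individual; the non-tag SNP copies tag 0.
tags = [[0, 2], [1, 0], [2, 1], [0, 1], [1, 2], [2, 0]]
non_tag = [0, 1, 2, 0, 1, 2]
weights = fit_mlr(tags, non_tag)
predicted = [predict_snp(weights, t) for t in ([2, 1], [0, 2], [1, 1])]
```

In this toy sample the non-tag SNP is fully determined by the first tag, so the rounded predictions reproduce it exactly; real data would only be recovered approximately.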

International Conference of the IEEE Engineering in Medicine and Biology Society | 2006

Combinatorial Search Methods for Multi-SNP Disease Association

Dumitru Brinza; Jingwu He; Alexander Zelikovsky

Recent improvements in the accessibility of high-throughput genotyping have brought a great deal of attention to genome-wide association studies for common complex diseases. Although such diseases can be caused by multi-loci interactions, locus-by-locus studies prevail. Recently, two-loci analysis has shown promise (Marchini et al., 2005), and multi-loci analysis is expected to find even deeper disease-associated interactions. Unfortunately, an exhaustive search among all possible corresponding multi-markers can be infeasible even for a small number of SNPs, let alone the complete genome. In this paper we first propose to extract informative (indexing) SNPs that can be used for reconstructing all SNPs almost without loss (He and Zelikovsky, 2006). In the reduced set of SNPs, we then propose to apply a novel combinatorial method for finding disease-associated multi-SNP combinations (MSCs). Our experimental study shows that the proposed methods are able to find MSCs whose disease association is statistically significant even after multiple testing adjustment. For the (Daly et al., 2001) data we found a few unphased MSCs associated with Crohn's disease with a multiple-testing-adjusted p-value below 0.05, while no single SNP or pair of SNPs shows any significant association. For the (Ueda et al., 2003) data we found a few new unphased and phased MSCs associated with autoimmune disorder.

IEEE Transactions on Nanobioscience | 2007

Informative SNP Selection Methods Based on SNP Prediction

Jingwu He; Alexander Zelikovsky

The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs, i.e., tag SNPs, accurately representing the rest of the SNPs. Tag SNP selection can achieve: 1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or 2) necessary reduction of the huge SNP sets (obtained, e.g., from Affymetrix) for further fine haplotype analysis. In this paper, we show that tag SNP selection strongly depends on how the chosen tags will be used: the advantage of one tag set over another can only be considered with respect to a certain prediction method. We show how to separate tag selection from SNP prediction and propose greedy and local-minimization algorithms for tag SNP selection. We give two novel approaches to SNP prediction based on multiple linear regression (MLR) and support vector machines (SVMs). An extensive experimental study on various datasets including ten regions from the HapMap project shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. The MLR-based method also uses on average 30% fewer tags than IdSelect for statistically covering all SNPs. The tag selection based on SVM SNP prediction uses fewer tags to achieve the same prediction accuracy as the methods of Halldorsson et al.
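The separation of tag selection from SNP prediction can be sketched as a greedy selector that takes the prediction-quality score as a parameter. This is an illustration under stated assumptions, not the paper's algorithm: the scoring function `lookup_accuracy` is a hypothetical stand-in (a majority-vote lookup evaluated on the sample itself) where the paper uses MLR or SVM prediction, and all names are invented for the example.

```python
from collections import Counter


def lookup_accuracy(samples, tags):
    """Fraction of non-tag SNP values recovered by a majority vote
    among sample rows that share the same tag values (stand-in predictor)."""
    n_snps = len(samples[0])
    non_tags = [j for j in range(n_snps) if j not in tags]
    if not non_tags:
        return 1.0
    correct = 0
    for j in non_tags:
        votes = {}
        for row in samples:
            key = tuple(row[t] for t in tags)
            votes.setdefault(key, Counter())[row[j]] += 1
        for row in samples:
            key = tuple(row[t] for t in tags)
            if votes[key].most_common(1)[0][0] == row[j]:
                correct += 1
    return correct / (len(samples) * len(non_tags))


def stepwise_tags(samples, k, score=lookup_accuracy):
    """Greedily add, k times, the SNP whose inclusion maximizes the score."""
    tags = []
    for _ in range(k):
        candidates = [j for j in range(len(samples[0])) if j not in tags]
        best = max(candidates, key=lambda j: score(samples, tags + [j]))
        tags.append(best)
    return tags


# Toy sample: SNPs 0 and 1 are identical, as are SNPs 2 and 3.
samples = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1]]
chosen = stepwise_tags(samples, 2)
chosen_accuracy = lookup_accuracy(samples, chosen)
```

Because the selector only calls `score`, swapping in a different predictor changes which tag set looks best, which is the dependence the abstract points out.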

International Conference on Computational Science | 2006

Tag SNP selection based on multivariate linear regression

Jingwu He; Alexander Zelikovsky

The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs (tag SNPs) accurately representing the rest of the SNPs. Tagging can achieve budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs, as well as compaction of extremely long SNP sequences (obtained, e.g., from Affymetrix Map Array) for further fine genotype analysis. Tagging should first choose tags from the SNPs under consideration and then, knowing the values of the chosen tag SNPs, predict (or statistically cover) the non-tag SNPs. In this paper we propose a new SNP prediction method based on rounding of multivariate linear regression (MLR) analysis in sigma-restricted coding. When predicting a non-tag SNP, the MLR method accumulates information about all tag SNPs, resulting in significantly higher prediction accuracy with the same number of tags than the previously known tagging methods. We also show that tag selection strongly depends on how the chosen tags will be used: the advantage of one tag set over another can only be considered with respect to a certain prediction method. Two simple universal tag selection methods have been applied: a (faster) stepwise and a (slower) local-minimization tag selection algorithm. An extensive experimental study on various datasets including 6 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses significantly fewer tags (e.g., up to two times fewer tags to reach 90% prediction accuracy) than the state-of-the-art methods of Halperin et al. [8] for genotypes and Halldorsson et al. [7] for haplotypes, respectively. Our stepwise tagging matches the quality of STAMPA [8] while being faster. The code is publicly available at http://alla.cs.gsu.edu/~software.

International Journal of Bioinformatics Research and Applications | 2005

Linear reduction method for predictive and informative tag SNP selection

Jingwu He; Kelly Westbrooks; Alexander Zelikovsky

Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs, the proposed linear reduction method predicts an unknown haplotype with an error rate below 2% based on 10% of the population.
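The idea of choosing linearly independent SNPs as tags can be sketched with plain Gaussian elimination. This is only an illustration of the linear-algebra step, assuming haplotypes are rows of a 0/1 matrix and SNPs are its columns; the function name is hypothetical, and the paper's actual selection and prediction procedure is not reproduced here.

```python
from fractions import Fraction


def independent_columns(matrix):
    """Return indices of a maximal set of linearly independent columns
    (the pivot columns found by Gauss-Jordan elimination over the rationals)."""
    rows = [[Fraction(v) for v in r] for r in matrix]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on earlier pivot columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return pivots


# Toy haplotype sample: the third SNP (column 2) duplicates the first.
haplotypes = [[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]]
tag_columns = independent_columns(haplotypes)
```

Every non-pivot column is a linear combination of the pivot columns within the sample, which is what makes it predictable from the tags there; how well that carries over to unseen haplotypes is the empirical question the abstract reports on.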

Workshop on Algorithms in Bioinformatics | 2004

Linear reduction for haplotype inference

Jingwu He; Alexander Zelikovsky

The haplotype inference problem asks for a set of haplotypes explaining a given set of genotypes. Popular software tools for haplotype inference (e.g., PHASE, HAPLOTYPER) as well as new algorithms recently proposed for perfect phylogeny inference (DPPH) often do not scale well. When the number of sites (SNPs) reaches the thousands, these tools often cannot deliver an answer in reasonable time even if the number of haplotypes is small. In this paper we propose a new linear algebra-based method which drastically reduces the number of sites in the original data. After solving a reduced instance, linear decoding allows recovery of full-length haplotypes for the given genotypes. Experiments show that our method significantly speeds up popular haplotype inference tools while finding almost the same solution in practically all cases, thus not compromising the quality of the known haplotype inference methods.

International Conference of the IEEE Engineering in Medicine and Biology Society | 2005

A Combinatorial Method for Predicting Genetic Susceptibility to Complex Diseases

Weidong Mao; Jingwu He; Dumitru Brinza; Alexander Zelikovsky

Recent improvements in the accessibility of high-throughput genotyping have brought a great deal of attention to disease association and susceptibility studies. This paper explores the possibility of applying combinatorial methods to disease susceptibility prediction. The proposed combinatorial methods as well as standard statistical methods are applied to publicly available genotype data on Crohn's disease and autoimmune disorders for predicting susceptibility to these diseases. The quality of the susceptibility prediction algorithms is assessed using leave-one-out and leave-many-out tests: the disease status of one or several individuals is predicted and compared to their actual disease status, which is initially made unknown to the algorithm. The best prediction rate achieved by the proposed algorithms is 77.78% for Crohn's disease and 64.99% for autoimmune disorders, respectively.
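The leave-one-out test described above can be sketched as follows. The evaluation loop is generic; the predictor is a hypothetical stand-in (nearest neighbor by Hamming distance over 0/1/2 genotypes), not one of the paper's combinatorial methods, and the toy data are invented for illustration.

```python
def hamming(a, b):
    """Number of SNP positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_status(train, genotype):
    """Stand-in predictor: copy the status of the closest training genotype."""
    return min(train, key=lambda item: hamming(item[0], genotype))[1]


def leave_one_out_accuracy(data, predict=nearest_neighbor_status):
    """Hide each individual's status in turn, predict it from the rest,
    and return the fraction of correct predictions."""
    hits = 0
    for i, (genotype, status) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, genotype) == status
    return hits / len(data)


# Toy data: (genotype, disease status) pairs, status 1 = case, 0 = control.
data = [((0, 0, 1), 0), ((0, 0, 2), 0), ((2, 1, 0), 1), ((2, 2, 0), 1)]
accuracy = leave_one_out_accuracy(data)
```

A leave-many-out variant would hold out several individuals per round instead of one; the comparison against the hidden status stays the same.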

International Conference of the IEEE Engineering in Medicine and Biology Society | 2004

Linear reduction methods for tag SNP selection

Jingwu He; Alexander Zelikovsky

It is widely hoped that constructing a complete human haplotype map will help to associate complex diseases with certain SNPs. Unfortunately, the number of SNPs is huge and it is very costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a considerably smaller number of informative representatives, so-called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. Our method is purely combinatorial and can be combined with linkage disequilibrium (LD) and block-based methods. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs linearly predicted from linearly chosen tag SNPs. We obtain extremely good compression and prediction rates. For example, for long haplotypes (>25000 SNPs), knowing only 0.4% of all SNPs, we predict the entire unknown haplotype with an error rate of 2%, with the prediction based on a 10% sample of the population.

Granular Computing | 2006

Haplotype tagging using support vector machines

Jingwu He; Jun Zhang; Gulsah Altun; Alexander Zelikovsky; Yan-Qing Zhang

Constructing a complete human haplotype map can help in associating complex diseases with SNPs (single nucleotide polymorphisms). Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. Depending on the application, tagging can achieve either budget savings by inferring non-tag SNPs from tag SNPs or shortening of lengthy and difficult-to-handle SNP sequences obtained from Affymetrix Map Array. Tagging should first choose which SNPs to use as tags and then predict the unknown non-tag SNPs from the known tags. In this paper we propose a new SNP prediction method using a robust tool for classification, the support vector machine (SVM). For tag selection we use a fast stepwise tag selection algorithm. An extensive experimental study on various datasets including 3 regions from HapMap shows that the tag selection based on SVM SNP prediction can reach the same prediction accuracy as the methods of Halldorsson et al. (7) on the LPL dataset using significantly fewer tags. For example, our method reaches 90% SNP prediction accuracy using only 3 tags for the Daly et al. (6) dataset with 103 SNPs. The proposed tagging method is also more accurate (but considerably slower) than the multivariate linear regression method of He et al. (12).

International Conference of the IEEE Engineering in Medicine and Biology Society | 2006

Multiple linear regression for index SNP selection on unphased genotypes

Jingwu He; Alexander Zelikovsky

The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences. This raises the importance of using a small subset of informative SNPs, called index SNPs, accurately representing the rest of the SNPs (i.e., the rest of the SNPs can be predicted with high accuracy from the index SNPs). Index SNP selection achieves the compaction of huge unphased genotype data (obtained, e.g., from Affymetrix Map Array) to make fine genotype analysis feasible. In this paper we propose a novel index SNP selection method on unphased genotypes based on multiple linear regression (MLR) SNP prediction. We measure the quality of our index SNP selection algorithm by comparing actual SNPs with the SNPs computationally predicted from the chosen index SNPs. We obtain extremely good prediction rates and compression. For example, for region ENm010 (123 SNPs), we can use 2% of SNPs for representing all SNPs with 93.5% accuracy. An experimental study on 4 ENCODE regions from HapMap shows that our method uses significantly fewer index SNPs (e.g., up to two times fewer index SNPs to reach 90% prediction accuracy) than the state-of-the-art method of Halperin et al. for genotypes.
