Shuanhu Wu
City University of Hong Kong
Publication
Featured research published by Shuanhu Wu.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2004
Shuanhu Wu; Alan Wee-Chung Liew; Hong Yan; Mengsu Yang
Cluster analysis of gene expression data from a cDNA microarray is useful for identifying biologically relevant groups of genes. However, finding the natural clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, we propose a cluster splitting and merging strategy. We have applied the new algorithm to simulated gene expression data for which the correct distribution of genes over clusters is known a priori. The results show that the proposed algorithm can find natural clusters and give the correct number of clusters. The algorithm has also been tested on real gene expression changes during yeast cell cycle, for which the fundamental patterns of gene expression and assignment of genes to clusters are well understood from numerous previous studies. Comparative studies with several clustering algorithms illustrate the effectiveness of our method.
BMC Bioinformatics | 2007
Alan Wee-Chung Liew; Jun Xian; Shuanhu Wu; David K. Smith; Hong Yan
Background: Periodogram analysis of time series is widespread in biology. A new challenge in analyzing microarray time-series data is to identify genes that are periodically expressed. The challenge arises because the observed time series usually exhibit non-idealities such as noise, short length, and unevenly sampled time points. Most methods used in the literature operate on evenly sampled time series and are not suitable for unevenly sampled ones. Results: For evenly sampled data, methods based on the classical Fourier periodogram are often used to detect periodically expressed genes. Recently, the Lomb-Scargle algorithm has been applied to unevenly sampled gene expression data for spectral estimation. However, since the Lomb-Scargle method assumes that there is a single stationary sinusoid wave with infinite support, it introduces spurious periodic components in the periodogram for data with a finite length. In this paper, we propose a new spectral estimation algorithm for unevenly sampled gene expression data. The new method is based on signal reconstruction in a shift-invariant signal space, where a direct spectral estimation procedure is developed using the B-spline basis. Experiments on simulated noisy gene expression profiles show that our algorithm is superior to the Lomb-Scargle algorithm and the classical Fourier periodogram based method in detecting periodically expressed genes. We have applied our algorithm to the Plasmodium falciparum and yeast gene expression data, and the results show that the algorithm is able to detect biologically meaningful periodically expressed genes. Conclusion: We have proposed an effective method for identifying periodic genes in unevenly sampled microarray time-series gene expression data. The method can also be used as an effective tool for gene expression time series interpolation or resampling.
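For orientation, the Lomb-Scargle baseline that this abstract compares against can be sketched in a few lines. This is a minimal pure-Python version of the classical Lomb-Scargle periodogram, not the B-spline method proposed in the paper. The function name `lomb_scargle`, the sampling times, and the frequency grid are illustrative assumptions.

```python
import math

def lomb_scargle(times, values, freqs):
    """Classical Lomb-Scargle power at each frequency (cycles per time unit).

    Works on unevenly sampled data. Power is normalized by the sample variance.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / (n - 1)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(y * c for y, c in zip(centered, cos_terms))
        ys = sum(y * s for y, s in zip(centered, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        p = 0.0
        if cc > 1e-12:  # guard against a degenerate denominator
            p += yc * yc / cc
        if ss > 1e-12:
            p += ys * ys / ss
        power.append(p / (2.0 * var))
    return power

if __name__ == "__main__":
    # Hypothetical unevenly spaced sampling times and a noise-free 0.1-cycle signal.
    times = [0, 1, 2.5, 4, 4.5, 6, 8, 9, 11.5, 13,
             14, 16.5, 18, 19, 21, 22.5, 24, 26, 27.5, 29]
    values = [math.sin(2.0 * math.pi * 0.1 * t) for t in times]
    freqs = [0.02 * k for k in range(1, 16)]
    power = lomb_scargle(times, values, freqs)
    print(freqs[power.index(max(power))])
```

A gene would typically be flagged as periodic when its peak power is large relative to a noise-only reference; the choice of that reference is outside this sketch.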
International Journal of Bioinformatics Research and Applications | 2008
Liping Du; Shuanhu Wu; Alan Wee-Chung Liew; David K. Smith; Hong Yan
We propose a new strategy to analyse the periodicity of gene expression profiles using Singular Spectrum Analysis (SSA) and Autoregressive (AR) model based spectral estimation. By combining the advantages of SSA and AR modelling, more periodic genes are extracted in the Plasmodium falciparum data set, compared with the classical Fourier analysis technique. We are able to identify more gene targets for new drug discovery, and by checking against the seven well-known malaria vaccine candidates, we have found five additional genes that warrant further biological verification.
Pacific Rim Conference on Multimedia | 2003
Alan Wee-Chung Liew; Hong Yan; Shuanhu Wu
Cluster analysis of gene expression data is useful for identifying biologically relevant groups of genes. However, finding the correct clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, an over-clustering and merging strategy is proposed. For validation, we applied the new algorithm to both simulated gene expression data and real gene expression data (expression changes during yeast cell cycle). The results clearly indicate the effectiveness of our method.
International Workshop on Advanced Computational Intelligence | 2011
Rongxin Fang; Shuanhu Wu; Wenyan Zhang; Qicheng Liu; Yibin Song
In this paper, an effective promoter identification algorithm is proposed. The new algorithm is based on the following features of promoters: (I) Promoter regions include binding sites where RNA polymerase II binds and where transcription starts. These binding sites include core-promoter elements such as the TATA-box and GC-box. However, the spacing structure of binding sites is not always consistent, and the same kind of binding site often differs in structure across promoter regions because of nucleotide variation. (II) The positions of binding sites in the gene are not fixed; instead, they tend to fluctuate within an approximate region. Based on these two features, we first disregard differences in binding-site structure caused by nucleotide variation. In other words, binding motifs that are similar in structure but appear in different forms because of nucleotide variation are treated as one binding motif. Second, we divide promoter regions into equal-length intervals and calculate the occurrence probability of binding sites in each interval. We introduce a new concept, the “Interval Weight Matrix (IWM)”, to reflect the relationship between intervals and the occurrence probability of binding sites. A new promoter identification system is then proposed. Tests on large sequences and comparisons with other well-known systems show that the new algorithm performs much better at reducing false positives (FP).
Fuzzy Systems and Knowledge Discovery | 2011
Shuanhu Wu; Wenyan Zhang; Qicheng Liu; Yibin Song; Chuangcun Wang
In this paper, we present a new modeling strategy for the recognition and prediction of promoter regions. Our model is based on the following considerations: (1) a promoter region comprises a number of binding sites (consensus sequences) to which RNA polymerase II can bind to start transcription of the gene, and different promoters can be determined by different combinations of binding sites; (2) the spacing of these binding sites is not always consistent, and there is some nucleotide variation at some positions across genes and species. Based on these considerations, we first split the promoter region into equal intervals and, using training sets, calculate the occurrence probability in each interval of each word that is assumed to be a binding-site sequence. We combine these interval probabilities into one matrix, which we refer to as the Interval Position Weight Matrix (IPWM); a new promoter modeling strategy and feature extraction method are then introduced based on a maximal probability model and the IPWM. Tests on large genomic sequences and comparisons with several well-known algorithms show that our algorithm is efficient, with higher sensitivity and specificity.
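The interval-probability idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' IPWM construction: the function name `interval_word_probabilities`, the word length, the number of intervals, and the rule that a word must lie entirely inside one interval are all hypothetical choices, and the toy sequences are made up.

```python
from collections import Counter

def interval_word_probabilities(sequences, n_intervals, word_len):
    """Estimate, per equal-length interval, the relative frequency of each word.

    Returns a list with one dict per interval: word -> probability.
    Assumes all training sequences are aligned and of equal length.
    """
    seq_len = len(sequences[0])
    if any(len(s) != seq_len for s in sequences):
        raise ValueError("all sequences must have the same length")
    width = seq_len // n_intervals
    if width < word_len:
        raise ValueError("intervals are shorter than the word length")
    matrix = []
    for i in range(n_intervals):
        start = i * width
        end = start + width
        counts = Counter()
        for seq in sequences:
            # Count only words that fit completely inside this interval.
            for pos in range(start, end - word_len + 1):
                counts[seq[pos:pos + word_len]] += 1
        total = sum(counts.values())
        matrix.append({word: c / total for word, c in counts.items()})
    return matrix

if __name__ == "__main__":
    # Toy, made-up training sequences.
    training = ["TATAAAGGCGCC", "TATAATGGCGCG"]
    for row in interval_word_probabilities(training, n_intervals=2, word_len=4):
        print(row)
```

Keeping one probability table per interval lets the same word carry different weight depending on where it occurs, which is the positional tolerance the abstract argues for.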
Asia-Pacific Bioinformatics Conference | 2007
Xudong Xie; Shuanhu Wu; Kin-Man Lam; Hong Yan
In this paper, an effective promoter detection algorithm, called PromoterExplorer, is proposed. In our approach, various features, i.e. local distribution of pentamers, positional CpG island features and digitized DNA sequence, are combined to build a high-dimensional input vector. A cascade AdaBoost based learning procedure is adopted to select the most “informative” or “discriminating” features to build a sequence of weak classifiers. A number of weak classifiers are combined into a strong classifier, which achieves better performance. To reduce false positives, a cascade structure is used for detection. PromoterExplorer is tested on large-scale DNA sequences from different databases, including EPD, Genbank and human chromosome 22. The proposed method consistently outperforms PromoterInspector and Dragon Promoter Finder.
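How weak classifiers combine into a strong one can be sketched with plain AdaBoost over decision stumps. This is a generic illustration, not PromoterExplorer: it omits the cascade structure and the pentamer, CpG, and digitized-sequence features, and the names `train_stump`, `adaboost`, and `predict` as well as the toy data are assumptions.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """A decision stump: one feature compared against one threshold."""
    return polarity if row[feature] >= threshold else -polarity

def train_stump(X, y, weights):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(row, feature, threshold, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds):
    """Train an ensemble of (alpha, feature, threshold, polarity) stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(row, feature, threshold, polarity))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
    y = [-1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

Because each stump looks at a single feature, the features chosen across rounds double as a feature selection, which matches the role the abstract gives to boosting.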
Lecture Notes in Computer Science | 2006
Liping Du; Shuanhu Wu; Alan Wee-Chung Liew; David K. Smith; Hong Yan
Spectral analysis of DNA microarray gene expression time-series data is important for understanding the regulation of gene expression and gene function of Plasmodium falciparum in the intraerythrocytic developmental cycle. In this paper, we propose a new strategy to analyze the cell cycle regulation of gene expression profiles based on the combination of singular spectrum analysis (SSA) and autoregressive (AR) spectral estimation. Using SSA, we extract the dominant trend of the data and reduce the effect of noise. Based on the AR analysis, high-resolution spectra can be produced. Experimental results show that our method can extract more genes and that the information can be useful for new drug design.
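The AR half of this pipeline can be sketched as below. It is a minimal example that fits an AR model by solving the Yule-Walker equations with the Levinson-Durbin recursion and then evaluates the AR power spectrum; the SSA denoising step is not included. The model order, the test signal, the frequency grid, and all function names are assumptions for illustration, not details taken from the paper.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Biased sample autocovariance r[0..max_lag] of a mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[i] * c[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (coeffs, noise_var) for x[n] = sum_k coeffs[k-1] * x[n-k] + e[n].
    """
    coeffs = []
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err
        coeffs = [coeffs[j] - refl * coeffs[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return coeffs, err

def ar_spectrum(coeffs, noise_var, freqs):
    """AR power spectrum at the given frequencies (cycles per sample)."""
    spectrum = []
    for f in freqs:
        denom = 1.0 - sum(a * cmath.exp(-2j * math.pi * f * (k + 1))
                          for k, a in enumerate(coeffs))
        spectrum.append(noise_var / abs(denom) ** 2)
    return spectrum

if __name__ == "__main__":
    # Hypothetical profile: 48 samples of a signal with a period of 8 samples.
    series = [math.sin(2.0 * math.pi * n / 8.0) for n in range(48)]
    r = autocorrelation(series, 2)
    coeffs, noise_var = levinson_durbin(r, 2)
    freqs = [k / 64.0 for k in range(1, 33)]
    spectrum = ar_spectrum(coeffs, noise_var, freqs)
    print(freqs[spectrum.index(max(spectrum))])
```

An AR spectrum is a smooth function of frequency rather than a set of Fourier bins, which is why it can resolve a peak from a short series; in practice the model order would need to be selected, for example with an information criterion.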
International Symposium on Neural Networks | 2005
Shuanhu Wu; Alan Wee-Chung Liew; Hong Yan
In this paper, a new feature extraction method and clustering scheme in spectral space for gene expression data is proposed. We model each member of the same cluster as the sum of a cluster-representative term and an experimental-artifact term. More compact clusters, and hence better clustering results, can be obtained by extracting essential features or reducing experimental artifacts. Given the periodicity of gene expression profile data, feature extraction is performed in the DCT domain by a soft-thresholding de-noising method. The clustering process is based on the OPTOC competitive learning strategy. Results from clustering real gene expression profiles show that our method performs better than clustering directly in the original space.
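The DCT-domain de-noising step can be sketched as follows: transform a profile with an orthonormal DCT-II, shrink each coefficient toward zero by a threshold, and transform back. The threshold value, the function names, and the sample profile are assumptions; the abstract does not say how the threshold is chosen, and the OPTOC clustering stage is not shown.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`, keeping its sign."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in coeffs]

def denoise(profile, threshold):
    """Soft-threshold a profile in the DCT domain and transform back."""
    return idct(soft_threshold(dct(profile), threshold))

if __name__ == "__main__":
    # Made-up expression profile; the threshold 0.3 is an arbitrary choice.
    profile = [1.0, 2.0, 0.5, -1.0, 3.0, 0.2, -0.7, 1.4]
    print(denoise(profile, 0.3))
```

Small coefficients, which are more likely to reflect experimental artifacts than the cluster-representative pattern, are set to zero, so profiles from the same cluster end up closer together before clustering.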
International Symposium on Neural Networks | 2008
Shuanhu Wu; Qingshang Zeng; Yinbin Song; Lihong Wang; Yanjie Zhang
Computational prediction of eukaryotic promoters is one of the most elusive problems in DNA sequence analysis. Although considerable effort has been devoted to this problem and a number of algorithms have been developed in the last few years, their performance still needs to improve. In this work, we developed a new algorithm called PPFB for promoter prediction based on the following hypothesis: a promoter is determined by certain motifs or word patterns, and different promoters are determined by different motifs. We select the most promising motifs (i.e. features) by the divergence distance between two classes and construct a classifier by feature boosting. Unlike other classifiers, we adopt a different training and classification strategy. Computational results on large genomic sequences and comparisons with several leading algorithms show that our method is efficient, with better sensitivity and specificity.
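Selecting motifs by a divergence between two classes can be sketched as below. The abstract does not define its divergence distance, so this example assumes a per-word symmetric Kullback-Leibler contribution, (p - q) * log(p / q), with add-one smoothing. That choice, the function names, and the toy sequences are all assumptions, and the feature-boosting classifier itself is not shown.

```python
import math
from collections import Counter

def count_words(sequences, word_len):
    """Count every overlapping word of length `word_len` in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    return counts

def rank_words_by_divergence(positives, negatives, word_len, pseudocount=1.0):
    """Rank words by their contribution to a symmetric KL divergence.

    Returns a list of (word, score) pairs, highest score first.
    """
    pos_counts = count_words(positives, word_len)
    neg_counts = count_words(negatives, word_len)
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + pseudocount * len(vocab)
    neg_total = sum(neg_counts.values()) + pseudocount * len(vocab)
    scores = []
    for word in vocab:
        p = (pos_counts[word] + pseudocount) / pos_total
        q = (neg_counts[word] + pseudocount) / neg_total
        # (p - q) and log(p / q) share a sign, so each term is non-negative.
        scores.append((word, (p - q) * math.log(p / q)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores

if __name__ == "__main__":
    # Toy, made-up promoter-like and background sequences.
    promoters = ["TATAAA", "TATATA"]
    background = ["GCGCGC", "GGCGCC"]
    for word, score in rank_words_by_divergence(promoters, background, 4)[:4]:
        print(word, round(score, 4))
```

Words that are much more frequent in one class than in the other score highest, so the top of the ranking is a candidate feature set for a downstream classifier.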