Publication


Featured research published by Sieu Phan.


BMC Genomics | 2009

Probing the endosperm gene expression landscape in Brassica napus.

Yi Huang; Liang Chen; Liping Wang; Kannan Vijayan; Sieu Phan; Ziying Liu; Lianglu Wan; Andrew R. S. Ross; Daoquan Xiang; Raju Datla; Youlian Pan; Jitao Zou

Background: In species with exalbuminous seeds, the endosperm is eventually consumed and its space occupied by the embryo during seed development. However, the main constituent of the early developing seed is the liquid endosperm, and a significant portion of the carbon resources for the ensuing stages of seed development arrive at the embryo through the endosperm. In contrast to the extensive study of species with persistent endosperm, little is known about the global gene expression pattern in the endosperm of exalbuminous seed species such as crucifer oilseeds.

Results: We took a multiparallel approach that combines ESTs, protein profiling and microarray analyses to look into the gene expression landscape in the endosperm of the oilseed crop Brassica napus. An EST collection of over 30,000 entries allowed us to detect close to 10,000 unisequences expressed in the endosperm. A protein profile analysis of more than 800 proteins corroborated several signature pathways uncovered by abundant ESTs. Using microarray analyses, we identified genes that are differentially or highly expressed across all developmental stages. These complementary analyses provided insight on several prominent metabolic pathways in the endosperm. We also discovered that a transcription factor LEAFY COTYLEDON (LEC1) was highly expressed in the endosperm and that the regulatory cascade downstream of LEC1 operates in the endosperm.

Conclusion: The endosperm EST collection and the microarray dataset provide a basic genomic resource for dissecting metabolic and developmental events important for oilseed improvement. Our findings on the featured metabolic processes and the LEC1 regulatory cascade offer new angles for investigation on the integration of endosperm gene expression with embryo development and storage product deposition in seed development.


BMC Bioinformatics | 2012

Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm

Alain B. Tchagang; Sieu Phan; Fazel Famili; Heather Shearer; Pierre R. Fobert; Yi Huang; Jitao Zou; Daiqing Huang; Adrian J. Cutler; Ziying Liu; Youlian Pan

Background: Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space.

Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples.

Conclusions: Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
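The order preserving idea on the time dimension, combined with an exhaustive search over sample subsets, can be illustrated with a minimal sketch. This is not the published OPTricluster implementation: the function names, the input layout, and the tie handling (ties fall back to time order, with no noise threshold) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def order_pattern(values):
    """Return time-point indices sorted by expression level.

    (0, 2, 1) means expr[t0] <= expr[t2] <= expr[t1]. Ties are resolved
    by time order here; this is a simplification assumed for the sketch.
    """
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))


def op_triclusters(data, min_genes=2):
    """Group genes that share one temporal ordering across a sample subset.

    data: {gene: {sample: [expression at t0, t1, ...]}} (hypothetical layout).
    Returns {(sample_subset, pattern): [genes]}.
    """
    samples = sorted(next(iter(data.values())))
    clusters = {}
    # Combinatorial step: try every non-empty subset of samples.
    for size in range(len(samples), 0, -1):
        for subset in combinations(samples, size):
            groups = defaultdict(list)
            for gene, by_sample in data.items():
                patterns = {order_pattern(by_sample[s]) for s in subset}
                if len(patterns) == 1:  # same ordering in every sample of the subset
                    groups[patterns.pop()].append(gene)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    clusters[(subset, pattern)] = sorted(genes)
    return clusters


# Toy data: two samples ("wt", "mut") and three time points.
toy = {
    "g1": {"wt": [1.0, 2.0, 3.0], "mut": [1.0, 3.0, 2.0]},
    "g2": {"wt": [0.5, 0.9, 1.4], "mut": [0.2, 0.8, 0.6]},
    "g3": {"wt": [3.0, 2.0, 1.0], "mut": [3.0, 2.5, 1.0]},
}
clusters = op_triclusters(toy)
```

Enumerating every sample subset grows exponentially with the number of samples, so a sketch like this is only practical when there are few samples, which is the short time-series setting the abstract describes.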


BMC Bioinformatics | 2010

GOAL: A software tool for assessing biological significance of genes groups

Alain B. Tchagang; Alexander Gawronski; Hugo Bérubé; Sieu Phan; Fazel Famili; Youlian Pan

Background: Modern high throughput experimental techniques such as DNA microarrays often result in large lists of genes. Computational biology tools such as clustering are then used to group together genes based on their similarity in expression profiles. Genes in each group are probably functionally related. The functional relevance among the genes in each group is usually characterized by utilizing available biological knowledge in public databases such as Gene Ontology (GO), KEGG pathways, association between a transcription factor (TF) and its target genes, and/or gene networks.

Results: We developed GOAL: Gene Ontology AnaLyzer, a software tool specifically designed for the functional evaluation of gene groups. GOAL implements and supports efficient and statistically rigorous functional interpretations of gene groups through its integration with available GO, TF-gene association data, and association with KEGG pathways. In order to facilitate more specific functional characterization of a gene group, we implement three GO-tree search strategies rather than one as in most existing GO analysis tools. Furthermore, GOAL offers flexibility in deployment. It can be used as a standalone tool, a plug-in to other computational biology tools, or a web server application.

Conclusion: We developed a functional evaluation software tool, GOAL, to perform functional characterization of a gene group. GOAL offers three GO-tree search strategies and combines its strength in function integration, portability and visualization, and its flexibility in deployment. Furthermore, GOAL can be used to evaluate and compare gene groups as the output from computational biology tools such as clustering algorithms.
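The abstract does not state which statistical test GOAL uses. A common way to score the functional relevance of a gene group against a set of annotation terms is a one-sided hypergeometric test, sketched below under that assumption. The function names, the term labels, and the data layout are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population, annotated, group_size, hits):
    """P(X >= hits) when group_size genes are drawn without replacement
    from a population in which `annotated` genes carry the term."""
    upper = min(annotated, group_size)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, group_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, group_size)


def enrich(group, term_to_genes, background):
    """Rank annotation terms by enrichment p-value for one gene group.

    term_to_genes: {term: set of genes} (hypothetical annotation table).
    Returns [(term, hits, p_value)] sorted by ascending p-value.
    """
    background = set(background)
    group = set(group) & background
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = len(group & annotated)
        if hits:  # skip terms with no gene from the group
            p = hypergeom_pvalue(len(background), len(annotated), len(group), hits)
            results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])


# Toy usage with made-up gene and term labels.
ranking = enrich({"a", "b"}, {"term_A": {"a", "b"}, "term_B": {"c"}}, {"a", "b", "c", "d"})
```

The same scoring applies unchanged to TF-target or pathway annotations, since each is just another mapping from a label to a gene set.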


International Journal of Computer Mathematics | 2007

A novel pattern based clustering methodology for time-series microarray data

Sieu Phan; Fazel Famili; Zuojian Tang; Youlian Pan; Ziying Liu; Junjun Ouyang; Anne E.G. Lenferink; Maureen D. O'Connor

Identification of co-expressed genes sharing similar biological behaviours is an essential step in functional genomics. Traditional clustering techniques are generally based on overall similarity of expression levels and often generate clusters with mixed profile patterns. A novel pattern recognition method for selecting co-expressed genes based on rate of change and modulation status of gene expression at each time interval is proposed in this paper. This method is capable of identifying gene clusters consisting of highly similar shapes of expression profiles and modulation patterns. Furthermore, we develop a quality index based on the semantic similarity in gene annotations to assess the likelihood of a cluster being a co-regulated group. The effectiveness of the proposed methodology is demonstrated by applying it to the well-known yeast sporulation dataset and an in-house cancer genomics dataset.
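As a rough illustration of grouping genes by the direction of change at each time interval, the sketch below encodes every interval as up, down, or no change and clusters genes with identical codes. The threshold value, the symbols, and the function names are assumptions; the paper's actual rate-of-change measure and quality index are not reproduced here.

```python
from collections import defaultdict


def modulation_pattern(profile, threshold=0.5):
    """Encode each interval as 'U' (up), 'D' (down) or 'N' (no change).

    The change per interval is the difference between consecutive
    time points; `threshold` is a hypothetical cut-off.
    """
    symbols = []
    for previous, current in zip(profile, profile[1:]):
        change = current - previous
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def cluster_by_pattern(profiles, threshold=0.5):
    """Group genes whose profiles share the same modulation pattern.

    profiles: {gene: [expression at t0, t1, ...]}.
    """
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[modulation_pattern(profile, threshold)].append(gene)
    return dict(clusters)


# Toy profiles with made-up gene labels.
groups = cluster_by_pattern({
    "g1": [0.0, 1.0, 1.2, 0.0],
    "g2": [2.0, 3.5, 3.4, 1.0],
    "g3": [1.0, 0.0, 0.1, 2.0],
})
```

Because genes are matched on the shape of the profile and not on absolute level, g1 and g2 above land in the same cluster even though their expression levels differ.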


Oncotarget | 2016

Computational selection of antibody-drug conjugate targets for breast cancer

François Fauteux; Jennifer J. Hill; Maria L. Jaramillo; Youlian Pan; Sieu Phan; Fazel Famili; Maureen O’Connor-McCourt

The selection of therapeutic targets is a critical aspect of antibody-drug conjugate research and development. In this study, we applied computational methods to select candidate targets overexpressed in three major breast cancer subtypes as compared with a range of vital organs and tissues. Microarray data corresponding to over 8,000 tissue samples were collected from the public domain. Breast cancer samples were classified into molecular subtypes using an iterative ensemble approach combining six classification algorithms and three feature selection techniques, including a novel kernel density-based method. This feature selection method was used in conjunction with differential expression and subcellular localization information to assemble a primary list of targets. A total of 50 cell membrane targets were identified, including one target for which an antibody-drug conjugate is in clinical use, and six targets for which antibody-drug conjugates are in clinical trials for the treatment of breast cancer and other solid tumors. In addition, 50 extracellular proteins were identified as potential targets for non-internalizing strategies and alternative modalities. Candidate targets linked with the epithelial-to-mesenchymal transition were identified by analyzing differential gene expression in epithelial and mesenchymal tumor-derived cell lines. Overall, these results show that mining human gene expression data has the power to select and prioritize breast cancer antibody-drug conjugate targets, and the potential to lead to new and more effective cancer therapeutics.


Computational Intelligence in Bioinformatics and Computational Biology | 2007

Prediction of Co-Regulated Gene Groups through Gene Ontology

Zuojian Tang; Sieu Phan; Youlian Pan; A.F. Famili

Gene Ontology (GO) is organized into three principles: cellular component, biological process, and molecular function. Analysis of the GO annotations of a list of differentially expressed genes on microarrays has become a common approach to help with their biological interpretation. Earlier studies in GO analysis are based on a single principle, mostly biological process; valuable information in the other two principles is neglected. This paper proposes a novel approach to investigate gene co-regulation based on GO annotations from all three principles. We used the semantic similarity of GO annotations as a measure to partition genes into functionally related clusters and developed a performance index (PI) that consolidates GO annotations from all three principles to measure the quality of each cluster. We successfully applied our algorithm to a yeast dataset. Our results indicate that PI is a good measure of the likelihood of a cluster being co-regulated by one or more TFs. Another analysis based on individual GO principles indicates that gene annotations in biological process are the most informative and those in cellular component are the least informative with regard to gene co-regulation. However, none of the analyses based on an individual principle could provide satisfactory classification. It is important to consider gene annotations in all three principles.
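The abstract does not give the formula for the performance index or the semantic similarity measure. Purely as an illustration of consolidating annotations from all three principles, the sketch below swaps in a simple Jaccard overlap of annotation sets as the similarity and averages it over gene pairs and then over the three principles. Every name and the scoring rule itself are assumptions, not the authors' PI.

```python
from itertools import combinations

# The three GO principles named in the abstract.
PRINCIPLES = ("biological_process", "molecular_function", "cellular_component")


def jaccard(a, b):
    """Overlap of two annotation sets; a stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_score(cluster, annotations):
    """Mean pairwise similarity per principle, averaged over all three.

    annotations: {gene: {principle: set of term labels}} (hypothetical layout).
    """
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    per_principle = []
    for principle in PRINCIPLES:
        sims = [
            jaccard(
                annotations[g1].get(principle, set()),
                annotations[g2].get(principle, set()),
            )
            for g1, g2 in pairs
        ]
        per_principle.append(sum(sims) / len(sims))
    return sum(per_principle) / len(per_principle)
```

A cluster whose genes agree only on biological process scores at most one third under this rule, which mirrors the abstract's point that a single principle gives an incomplete picture.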


International Journal of Computational Biology and Drug Design | 2008

An ensemble machine learning approach to predict survival in breast cancer

Amira Djebbari; Ziying Liu; Sieu Phan; Fazel Famili

Current breast cancer predictive signatures are not unique. Can we use this fact to our advantage to improve prediction? From the machine learning perspective, it is well known that combining multiple classifiers can improve classification performance. We propose an ensemble machine learning approach which consists of choosing feature subsets and learning predictive models from them. We then combine models based on certain model fusion criteria and we also introduce a tuning parameter to control sensitivity. Our method significantly improves classification performance with a particular emphasis on sensitivity which is critical to avoid misclassifying poor prognosis patients as good prognosis.
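The model fusion step with a tuning parameter for sensitivity can be sketched as a thresholded vote. The abstract does not specify the fusion criteria, so the voting rule, the parameter name `tau`, and the label encoding (1 = poor prognosis) are assumptions for illustration.

```python
def ensemble_predict(votes, tau=0.5):
    """Fuse per-model predictions (1 = poor prognosis, 0 = good prognosis).

    A patient is called poor prognosis when the fraction of models voting 1
    is at least `tau`. Lowering `tau` raises sensitivity at the cost of
    more false alarms.
    """
    return 1 if sum(votes) / len(votes) >= tau else 0


def sensitivity(all_votes, truth, tau=0.5):
    """Fraction of true poor-prognosis patients that the ensemble flags."""
    positives = [v for v, y in zip(all_votes, truth) if y == 1]
    if not positives:
        return 0.0
    return sum(ensemble_predict(v, tau) for v in positives) / len(positives)


# Toy votes from three hypothetical models for four patients.
votes = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
truth = [1, 1, 0, 1]
default_sens = sensitivity(votes, truth, tau=0.5)
tuned_sens = sensitivity(votes, truth, tau=0.3)
```

In the toy data, the second patient is missed at `tau=0.5` but caught at `tau=0.3`, which is the trade-off the tuning parameter is meant to expose.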


Computational Intelligence in Bioinformatics and Computational Biology | 2010

Towards a temporal modeling of the genetic network controlling Systemic Acquired Resistance in Arabidopsis thaliana

Alain B. Tchagang; Heather Shearer; Sieu Phan; Hugo Bérubé; Fazel Famili; Pierre R. Fobert; Youlian Pan

We studied the defense mechanism of Arabidopsis thaliana subjected to salicylic acid (SA) treatment for 0, 1, and 8 hours using a broader application of the frequent itemset approach. Four genotypes of the plant were used in this study: Columbia wild type, mutant npr1-3, double mutant tga1 tga4, and triple mutant tga2 tga5 tga6. We defined the major patterns of transcription regulation governing the pathogen defense mechanism, thereby creating a model of systemic acquired resistance (SAR) at three time points. The temporal model describes the relationships among the regulators and defines groups of genes that are subject to similar regulation. The results offer a first glimpse into the temporal pattern of the transcription regulatory network during SAR in Arabidopsis thaliana. We found that most of the genes that responded to SA challenge are in fact dependent on one or more of the NPR1 and TGA factors tested in this study.


Computational Intelligence in Bioinformatics and Computational Biology | 2009

Goal Driven Analysis of cDNA Microarray Data

Youlian Pan; Jitao Zou; Yi Huang; Ziying Liu; Sieu Phan; Fazel Famili

Microarray technology has been used extensively for high-throughput gene expression studies. Many bioinformatics tools are available for the analysis of microarray data. In the data mining process, it is important to be goal oriented so that a proper set of tools can be assembled for the targeted knowledge discovery process. In this paper, we tackle this issue by using a microarray dataset from Brassica endosperm together with EST data to validate our process. We were most interested in which genes are highly expressed in Brassica endosperm and in their variations and functions over various stages of embryo development. We also performed gene characterization based on gene ontology analysis. Our results indicate that designing a specific data mining workflow that considers both the log ratio and signal intensity enhances the knowledge discovery process. Through this approach, we were able to find the regulatory relationship between two of the most important transcription factors, LEC1 and WRI1, in the endosperm of Brassica napus.
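A workflow step that considers both the log ratio and the signal intensity might look like the filter below. The thresholds, the use of the geometric mean of the two channels as the intensity, and the function name are assumptions made for this sketch; the paper's actual workflow is not described in the abstract.

```python
from math import log2


def select_genes(spots, min_abs_log_ratio=1.0, min_intensity=500.0):
    """Keep genes that are both strongly modulated and reliably measured.

    spots: {gene: (red, green)} channel intensities from a two-colour array
    (hypothetical layout; both values must be positive).
    Returns {gene: (log_ratio, intensity)} for the genes that pass both cuts.
    """
    selected = {}
    for gene, (red, green) in spots.items():
        log_ratio = log2(red / green)
        intensity = (red * green) ** 0.5  # geometric mean of the two channels
        if abs(log_ratio) >= min_abs_log_ratio and intensity >= min_intensity:
            selected[gene] = (log_ratio, intensity)
    return selected


# Toy spots with made-up labels: only "a" is both modulated and bright.
picked = select_genes({"a": (4000.0, 1000.0), "b": (40.0, 10.0), "c": (1000.0, 1000.0)})
```

Filtering on the log ratio alone would also keep spot "b", whose fourfold ratio comes from intensities close to background, which is one reason to combine the two criteria.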


International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2010

Integrative data mining in functional genomics of Brassica napus and Arabidopsis thaliana

Youlian Pan; Alain B. Tchagang; Hugo Bérubé; Sieu Phan; Heather Shearer; Ziying Liu; Pierre R. Fobert; Fazel Famili

Vast amounts of data in various forms have been accumulated through many years of functional genomics research throughout the world. It is a challenge to discover and disseminate the knowledge hidden in these data. Many computational methods have been developed to solve this problem. Taking the analysis of microarray data as an example, we spent the past decade developing many data mining strategies and software tools. These still appear insufficient to cover all sources of data. In this paper, we summarize our experiences in mining microarray data, using two plant species, Brassica napus and Arabidopsis thaliana, as examples. We present several success stories and also a few lessons learned. The domain problems that we dealt with were transcriptional regulation in seed development and during the defense response against pathogen infection.

Collaboration


Dive into Sieu Phan's collaboration.

Top Co-Authors

Fazel Famili

National Research Council

Youlian Pan

National Research Council

Ziying Liu

National Research Council

Hugo Bérubé

National Research Council

Jitao Zou

Biotechnology Institute
