Youlian Pan
National Research Council
Publication
Featured research published by Youlian Pan.
BMC Genomics | 2009
Yi Huang; Liang Chen; Liping Wang; Kannan Vijayan; Sieu Phan; Ziying Liu; Lianglu Wan; Andrew R. S. Ross; Daoquan Xiang; Raju Datla; Youlian Pan; Jitao Zou
Background: In species with exalbuminous seeds, the endosperm is eventually consumed during seed development and its space is occupied by the embryo. However, the main constituent of the early developing seed is the liquid endosperm, and a significant portion of the carbon resources for the later stages of seed development reaches the embryo through the endosperm. In contrast to the extensive study of species with persistent endosperm, little is known about the global gene expression pattern in the endosperm of exalbuminous seed species such as crucifer oilseeds.
Results: We took a multiparallel approach that combines ESTs, protein profiling and microarray analyses to examine the gene expression landscape in the endosperm of the oilseed crop Brassica napus. An EST collection of over 30,000 entries allowed us to detect close to 10,000 unisequences expressed in the endosperm. A protein profile analysis of more than 800 proteins corroborated several signature pathways uncovered by abundant ESTs. Using microarray analyses, we identified genes that are differentially or highly expressed across all developmental stages. These complementary analyses provided insight into several prominent metabolic pathways in the endosperm. We also discovered that the transcription factor LEAFY COTYLEDON1 (LEC1) was highly expressed in the endosperm and that the regulatory cascade downstream of LEC1 operates in the endosperm.
Conclusion: The endosperm EST collection and the microarray dataset provide a basic genomic resource for dissecting metabolic and developmental events important for oilseed improvement. Our findings on the featured metabolic processes and the LEC1 regulatory cascade offer new angles for investigating the integration of endosperm gene expression with embryo development and storage product deposition during seed development.
Journal of Bioinformatics and Computational Biology | 2004
Youlian Pan; Jeffrey D. Pylatuik; Junjun Ouyang; A. Fazel Famili; Pierre R. Fobert
Various data mining techniques, combined with sequence motif information from the promoter regions of genes, were applied to discover functional genes involved in the defense mechanism of systemic acquired resistance (SAR) in Arabidopsis thaliana. A series of K-means clustering runs with difference-in-shape as the distance measure was applied first, and a stability measure was used to validate this clustering process. A decision tree algorithm with the discover-and-mask technique was used to identify a group of the most informative genes. The appearance and abundance of various transcription factor binding sites in the promoter regions of these genes were also studied. Through the combination of these techniques, we identified 24 candidate genes involved in the SAR defense mechanism. The candidate genes fell into two highly resolved categories, each showing a significantly distinct profile of regulatory elements in their promoter regions. This study demonstrates the strength of such integrated methods and suggests a broader application of this approach.
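The "appearance and abundance" of binding sites can be made concrete with a small tally. The sketch below is illustrative only and is not the authors' pipeline: it assumes each motif is given as a regular expression over A/C/G/T, scans the forward strand only, counts overlapping exact matches, and reports appearance as the fraction of promoters containing a motif at least once. The W-box-like pattern `TTGAC[CT]` and the gene IDs are hypothetical examples.

```python
import re

def motif_profile(promoters, motifs):
    """Count motif occurrences (abundance) in each promoter.

    promoters: {gene_id: promoter_sequence}
    motifs:    {motif_name: regex pattern over A/C/G/T}
    Returns {gene_id: {motif_name: count}}; forward strand only.
    """
    profile = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        # A zero-width lookahead lets overlapping sites all be counted.
        profile[gene] = {
            name: len(re.findall(f"(?={pattern})", seq))
            for name, pattern in motifs.items()
        }
    return profile

def appearance(profile, motif):
    """Fraction of genes whose promoter contains the motif at least once."""
    genes = list(profile)
    return sum(1 for g in genes if profile[g][motif] > 0) / len(genes)

# Hypothetical promoters and motif, for illustration only.
promoters = {"g1": "ATTGACCTTGACT", "g2": "GGGGCCCC"}
motifs = {"W-box-like": "TTGAC[CT]"}
profile = motif_profile(promoters, motifs)
print(profile["g1"]["W-box-like"], appearance(profile, "W-box-like"))  # 2 0.5
```

A per-gene count vector like this can then be compared between candidate gene groups; reverse-complement scanning would be a natural extension.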
BMC Bioinformatics | 2012
Alain B. Tchagang; Sieu Phan; Fazel Famili; Heather Shearer; Pierre R. Fobert; Yi Huang; Jitao Zou; Daiqing Huang; Adrian J. Cutler; Ziying Liu; Youlian Pan
Background: It is now possible to collect the expression levels of a set of genes from a set of biological samples across a series of time points. Such data have three dimensions, gene-sample-time (GST), and are therefore called 3D microarray gene expression data. To take advantage of such data and fully understand the biological knowledge hidden in GST data, novel subspace clustering algorithms must be developed that effectively address the biological problem in the corresponding space.
Results: We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster) for mining 3D short time-series data. OPTricluster identifies 3D clusters with coherent evolution in a given 3D dataset by using a combinatorial approach on the sample dimension and the order-preserving (OP) concept on the time dimension. Fusing the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profiles. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between the inner and outer cotyledons of Brassica napus during seed development, and Brassica napus whole-seed development. These studies showed that OPTricluster is robust to noise and able to detect similarities and differences between biological samples.
Conclusions: Our analysis showed that OPTricluster generally outperforms other well-known clustering algorithms such as TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in 3D short time-series gene expression data.
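The two ideas named above, order preservation on time and a combinatorial search over samples, can be illustrated with a toy version. This is a simplified sketch, not the published OPTricluster algorithm: each gene-sample time series is reduced to the order of its time points sorted by expression (ties broken by time index), and for every subset of at least `min_samples` samples, genes sharing the same per-sample patterns are grouped. The data layout and thresholds are assumptions, and no pruning of non-maximal clusters is done.

```python
from itertools import combinations

def op_pattern(profile):
    """Order-preserving pattern: time indices sorted by expression value."""
    return tuple(sorted(range(len(profile)), key=lambda t: profile[t]))

def op_triclusters(data, min_genes=2, min_samples=2):
    """Toy order-preserving triclustering.

    data: {gene: {sample: [expression at t0, t1, ...]}}
    Returns a list of (genes, samples) pairs in which every listed gene has
    the same OP pattern as the others within each listed sample (patterns
    may differ between samples).
    """
    samples = sorted(next(iter(data.values())))
    results = []
    # Combinatorial search on the sample dimension, largest subsets first.
    for k in range(len(samples), min_samples - 1, -1):
        for subset in combinations(samples, k):
            groups = {}
            for gene, per_sample in data.items():
                key = tuple(op_pattern(per_sample[s]) for s in subset)
                groups.setdefault(key, []).append(gene)
            for genes in groups.values():
                if len(genes) >= min_genes:
                    results.append((sorted(genes), subset))
    return results

# Hypothetical 3-gene, 2-sample, 3-time-point dataset.
data = {
    "g1": {"s1": [1, 2, 3], "s2": [3, 2, 1]},
    "g2": {"s1": [0.5, 1, 4], "s2": [5, 1, 0]},
    "g3": {"s1": [3, 1, 2], "s2": [1, 2, 3]},
}
print(op_triclusters(data))  # [(['g1', 'g2'], ('s1', 's2'))]
```

Because only the ordering of time points matters, g1 and g2 cluster together despite different magnitudes, and the per-sample pattern tuple shows how the samples differ (rising in s1, falling in s2).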
BMC Bioinformatics | 2010
Alain B. Tchagang; Alexander Gawronski; Hugo Bérubé; Sieu Phan; Fazel Famili; Youlian Pan
Background: Modern high-throughput experimental techniques such as DNA microarrays often produce large lists of genes. Computational biology tools such as clustering are then used to group genes based on the similarity of their expression profiles. Genes in the same group are likely to be functionally related. The functional relevance among the genes in each group is usually characterized using biological knowledge available in public databases, such as Gene Ontology (GO), KEGG pathways, associations between a transcription factor (TF) and its target genes, and/or gene networks.
Results: We developed GOAL (Gene Ontology AnaLyzer), a software tool specifically designed for the functional evaluation of gene groups. GOAL implements and supports efficient and statistically rigorous functional interpretation of gene groups through its integration of GO, TF-gene association data and KEGG pathway associations. To facilitate more specific functional characterization of a gene group, we implemented three GO-tree search strategies rather than the single strategy offered by most existing GO analysis tools. GOAL also offers flexibility in deployment: it can be used as a standalone tool, as a plug-in to other computational biology tools, or as a web server application.
Conclusion: GOAL is a functional evaluation tool for characterizing gene groups. It offers three GO-tree search strategies and combines strengths in functional integration, portability and visualization with flexibility in deployment. Furthermore, GOAL can be used to evaluate and compare gene groups produced by computational biology tools such as clustering algorithms.
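The abstract does not state which statistic GOAL uses, so the following is only a sketch of one common choice for testing whether a gene group is enriched for a GO term: a one-sided hypergeometric test (the probability of seeing at least `k` annotated genes in a group of size `n` drawn from a background of `N` genes, `K` of which carry the term). The term names and gene IDs are hypothetical, and no multiple-testing correction is applied here.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def go_enrichment(group, annotations, background):
    """Enrichment p-values for each term annotated to at least one group gene.

    group:       set of gene IDs in the cluster
    annotations: {term: set of annotated gene IDs}
    background:  set of all gene IDs considered
    Returns {term: (overlap, p_value)}.
    """
    N = len(background)
    n = len(group & background)
    results = {}
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(group & annotated)
        if k:
            results[term] = (k, hypergeom_sf(k, N, len(annotated), n))
    return results

# Hypothetical 10-gene background and a 3-gene cluster.
background = set("abcdefghij")
group = {"a", "b", "c"}
annotations = {"T1": {"a", "b", "c"}, "T2": {"x"}, "T3": {"a", "d", "e", "f"}}
print(go_enrichment(group, annotations, background))
```

Here T1 covers the whole cluster and gets p = 1/120, T3 overlaps by one gene and is not enriched (p = 100/120), and T2 has no overlap so it is omitted.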
International Journal of Computer Mathematics | 2007
Sieu Phan; Fazel Famili; Zuojian Tang; Youlian Pan; Ziying Liu; Junjun Ouyang; Anne E.G. Lenferink; Maureen D. O'Connor
Identification of co-expressed genes sharing similar biological behaviours is an essential step in functional genomics. Traditional clustering techniques are generally based on overall similarity of expression levels and often generate clusters with mixed profile patterns. A novel pattern recognition method for selecting co-expressed genes based on rate of change and modulation status of gene expression at each time interval is proposed in this paper. This method is capable of identifying gene clusters consisting of highly similar shapes of expression profiles and modulation patterns. Furthermore, we develop a quality index based on the semantic similarity in gene annotations to assess the likelihood of a cluster being a co-regulated group. The effectiveness of the proposed methodology is demonstrated by applying it to the well-known yeast sporulation dataset and an in-house cancer genomics dataset.
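As a rough illustration of selecting genes by rate of change and modulation status per time interval, the sketch below computes the slope over each interval and encodes it as up (`U`), down (`D`) or steady (`S`) using a threshold, then groups genes with identical status strings. It is a simplified stand-in for the published method, which also considers the similarity of profile shapes; the threshold of 0.5, the assumption that values are log-ratios, and the gene IDs are all hypothetical.

```python
def interval_rates(profile, times=None):
    """Rate of change over each interval between consecutive time points."""
    times = times if times is not None else list(range(len(profile)))
    return [
        (profile[i + 1] - profile[i]) / (times[i + 1] - times[i])
        for i in range(len(profile) - 1)
    ]

def modulation_status(profile, times=None, threshold=0.5):
    """Encode each interval as 'U' (up), 'D' (down) or 'S' (steady)."""
    return "".join(
        "U" if r > threshold else "D" if r < -threshold else "S"
        for r in interval_rates(profile, times)
    )

def group_by_modulation(profiles, times=None, threshold=0.5):
    """Group genes that share the same interval-by-interval status string."""
    groups = {}
    for gene, profile in profiles.items():
        groups.setdefault(modulation_status(profile, times, threshold), []).append(gene)
    return groups

# Hypothetical log-ratio profiles over four equally spaced time points.
profiles = {"g1": [0, 1, 2, 2], "g2": [1, 2, 3, 3.2], "g3": [3, 2, 1, 1]}
print(group_by_modulation(profiles))  # {'UUS': ['g1', 'g2'], 'DDS': ['g3']}
```

Unequal sampling intervals can be handled by passing `times`, so that a change is judged per unit time rather than per step.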
Journal of Bioinformatics and Computational Biology | 2010
Ziying Liu; Sieu Phan; Fazel Famili; Youlian Pan; Anne E.G. Lenferink; C. Cantin; C. Collins; Maureen O'Connor-McCourt
An unsupervised multi-strategy approach has been developed to identify informative genes from high-throughput genomic data. Several statistical methods are used in the field to identify differentially expressed genes. Because different methods generate different gene lists, it is very challenging to determine the most reliable list and the appropriate method. This paper presents a multi-strategy method in which a combination of several data analysis techniques is applied to a given dataset, and a confidence measure is established to select genes from the lists generated by these techniques to form the core of the final selection. The remaining genes, which form the peripheral region, are then subject to inclusion in or exclusion from the final selection. The paper demonstrates this methodology by applying it to an in-house cancer genomics dataset and a public dataset. The results indicate that the method provides a more reliable list of genes, as validated using biological knowledge, biological experiments and literature search. We further evaluated the method by consolidating two pairs of independent datasets; each pair covers the same disease but was generated by different labs using different platforms. On these paired datasets, the method produced far better results.
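The core/peripheral split can be sketched with a simple voting scheme. This assumes, purely for illustration, that a gene's confidence is the fraction of methods that selected it, and that genes at or above a cutoff form the core while the rest form the peripheral region; the paper's actual confidence measure and its inclusion/exclusion rule for peripheral genes may differ. Method names and genes are hypothetical.

```python
from collections import Counter

def multi_strategy_selection(gene_lists, core_fraction=0.5):
    """Split genes into a core and a peripheral region by method agreement.

    gene_lists:    {method_name: iterable of selected gene IDs}
    core_fraction: minimum fraction of methods needed for the core (assumed rule)
    Returns (core, peripheral, confidence).
    """
    n_methods = len(gene_lists)
    # Count each gene at most once per method.
    votes = Counter(g for genes in gene_lists.values() for g in set(genes))
    confidence = {g: c / n_methods for g, c in votes.items()}
    core = {g for g, c in confidence.items() if c >= core_fraction}
    peripheral = set(confidence) - core
    return core, peripheral, confidence

# Hypothetical outputs of three gene-selection methods.
gene_lists = {
    "method_a": ["A", "B", "C"],
    "method_b": ["A", "B"],
    "method_c": ["A", "D"],
}
core, peripheral, confidence = multi_strategy_selection(gene_lists)
print(sorted(core), sorted(peripheral))  # ['A', 'B'] ['C', 'D']
```

Genes in the peripheral set would then be examined individually (for example against annotations or the literature) before being kept or dropped.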
Oncotarget | 2016
François Fauteux; Jennifer J. Hill; Maria L. Jaramillo; Youlian Pan; Sieu Phan; Fazel Famili; Maureen O’Connor-McCourt
The selection of therapeutic targets is a critical aspect of antibody-drug conjugate research and development. In this study, we applied computational methods to select candidate targets overexpressed in three major breast cancer subtypes as compared with a range of vital organs and tissues. Microarray data corresponding to over 8,000 tissue samples were collected from the public domain. Breast cancer samples were classified into molecular subtypes using an iterative ensemble approach combining six classification algorithms and three feature selection techniques, including a novel kernel density-based method. This feature selection method was used in conjunction with differential expression and subcellular localization information to assemble a primary list of targets. A total of 50 cell membrane targets were identified, including one target for which an antibody-drug conjugate is in clinical use, and six targets for which antibody-drug conjugates are in clinical trials for the treatment of breast cancer and other solid tumors. In addition, 50 extracellular proteins were identified as potential targets for non-internalizing strategies and alternative modalities. Candidate targets linked with the epithelial-to-mesenchymal transition were identified by analyzing differential gene expression in epithelial and mesenchymal tumor-derived cell lines. Overall, these results show that mining human gene expression data has the power to select and prioritize breast cancer antibody-drug conjugate targets, and the potential to lead to new and more effective cancer therapeutics.
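One simple way to combine several classifier and feature-selection combinations into a consensus subtype call is majority voting, sketched below. The abstract does not specify how the ensemble's outputs were combined, so the voting rule, the agreement threshold and the omission of the iterative step are assumptions; sample IDs and subtype labels are hypothetical placeholders.

```python
from collections import Counter

def ensemble_vote(predictions, min_agreement=0.5):
    """Consensus label per sample from several (classifier, feature set) runs.

    predictions:   list of {sample_id: predicted_label}, one dict per run
    min_agreement: the winning label must hold strictly more than this share
    Returns {sample_id: label}, with None where agreement is insufficient.
    """
    samples = set().union(*predictions)  # all sample IDs seen in any run
    consensus = {}
    for s in samples:
        votes = Counter(p[s] for p in predictions if s in p)
        label, count = votes.most_common(1)[0]
        share = count / sum(votes.values())
        consensus[s] = label if share > min_agreement else None
    return consensus

# Hypothetical predictions from four runs for two samples.
predictions = [
    {"s1": "subtype_1", "s2": "subtype_2"},
    {"s1": "subtype_1", "s2": "subtype_3"},
    {"s1": "subtype_1", "s2": "subtype_2"},
    {"s1": "subtype_2", "s2": "subtype_3"},
]
print(ensemble_vote(predictions))  # s1 -> 'subtype_1', s2 -> None (tie)
```

Leaving tied or low-agreement samples unassigned keeps ambiguous cases out of the downstream differential expression comparison.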
BMC Genomics | 2015
Dan Tulpan; Serge Léger; Alain B. Tchagang; Youlian Pan
Background: While the gargantuan multinational effort to sequence T. aestivum nears completion, the annotation of its vast number of genes and proteins is still in its infancy. Previous experimental studies on model plants such as A. thaliana and O. sativa provide a plethora of gene annotations that can serve as starting points for annotating wheat genes, provided that solid cross-species gene-to-gene and protein-to-protein correspondences can be established.
Results: DNA and protein sequences and their annotations for T. aestivum and 9 other plant species were collected from Ensembl Plants release 22 and curated. Cliques of predicted 1-to-1 orthologs were identified, and an annotation enrichment model was defined based on existing gene-GO term associations and the phylogenetic relationships among wheat and the 9 other plant species. A total of 13 cliques of size 10 were identified, representing putatively functionally equivalent genes and proteins across the 10 plant species. Eighty-five new and more specific GO terms were associated with wheat genes in these 13 cliques, a 65% increase over the 130 previously known GO terms. Similar expression patterns for 4 genes from Arabidopsis, barley, maize and rice in cliques of size 10 provide experimental evidence supporting our model. Overall, for cliques of size 3 or larger, our model enriched the existing gene-GO term associations of 7,838 (8%) wheat genes, of which 2,139 had no previous annotation.
Conclusions: Our novel comparative genomics approach enriches existing T. aestivum gene annotations based on cliques of predicted 1-to-1 orthologs, phylogenetic relationships and existing gene ontologies from 9 other plant species.
Handbook of Research on Computational and Systems Biology | 2011
Alain B. Tchagang; Youlian Pan; Fazel Famili; Ahmed H. Tewfik; Panayiotis V. Benos
In this chapter, biclustering methods developed in recent years and their applications to DNA microarray data analysis are discussed and compared. Identifying biologically significant clusters of genes from microarray experimental data is a daunting task that has emerged with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles have been introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to solving biological problems are discussed.
Panayiotis V. Benos, University of Pittsburgh, USA
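As one concrete example of the mathematical criteria used to score biclusters (not necessarily one the chapter singles out), the sketch below computes the mean squared residue of Cheng and Church, H(I, J) = (1/|I||J|) Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. A perfectly additive (shifted) pattern scores 0. The matrix is a hypothetical example.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the bicluster (rows, cols).

    matrix: list of lists (genes x conditions)
    rows, cols: index lists selecting the bicluster
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    ) / (n_r * n_c)

# Hypothetical data: rows 0 and 1 differ by a constant shift (additive pattern).
m = [[1, 2, 3], [2, 3, 4], [9, 0, 5]]
print(mean_squared_residue(m, [0, 1], [0, 1, 2]))     # 0.0
print(mean_squared_residue(m, [0, 1, 2], [0, 1, 2]))  # > 0, row 2 breaks the pattern
```

Biclustering algorithms built on this score typically search for the largest submatrices whose residue stays below a user-chosen threshold.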
Computational Intelligence in Bioinformatics and Computational Biology | 2007
Zuojian Tang; Sieu Phan; Youlian Pan; A.F. Famili
Gene Ontology (GO) is organized into three principles: cellular component, biological process and molecular function. Analysis of the GO annotations of a list of differentially expressed genes on microarrays has become a common approach to aid their biological interpretation. Earlier studies in GO analysis were based on a single principle, mostly biological process, neglecting valuable information in the other two principles. This paper proposes a novel approach to investigating gene co-regulation based on GO annotations from all three principles. We used the semantic similarity of GO annotations as a measure to partition genes into functionally related clusters and developed a performance index (PI) that consolidates GO annotations from all three principles to measure the quality of each cluster. We successfully applied our algorithm to a yeast dataset. Our results indicate that PI is a good measure of the likelihood that a cluster is co-regulated by one or more transcription factors (TFs). A further analysis based on individual GO principles indicates that gene annotations in biological process are the most informative and those in cellular component the least informative with regard to gene co-regulation. However, none of the analyses based on an individual principle provided satisfactory classification. It is important to consider gene annotations in all three principles.
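The idea of consolidating all three principles into one cluster score can be sketched as a weighted average, over the three principles, of the mean pairwise similarity of the genes' GO annotations. To keep the example self-contained, Jaccard overlap of GO term sets is swapped in for the ontology-based semantic similarity the paper uses, and equal weights are assumed; this is not the paper's PI formula. Gene IDs and term labels are hypothetical.

```python
from itertools import combinations

PRINCIPLES = ("biological_process", "molecular_function", "cellular_component")

def jaccard(a, b):
    """Set overlap, used here as a simple stand-in for GO semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_index(cluster, annotations, weights=None):
    """Weighted mean over the three principles of average pairwise similarity.

    cluster:     list of gene IDs
    annotations: {gene: {principle: set of GO terms}}
    weights:     {principle: weight}; equal weights if omitted (assumption)
    """
    weights = weights or {p: 1.0 for p in PRINCIPLES}
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    score = 0.0
    for p in PRINCIPLES:
        sims = [
            jaccard(annotations[a].get(p, set()), annotations[b].get(p, set()))
            for a, b in pairs
        ]
        score += weights[p] * sum(sims) / len(sims)
    return score / sum(weights.values())

# Hypothetical annotations: g1 and g2 identical; g3 shares only its BP term.
bp, mf, cc = PRINCIPLES
annotations = {
    "g1": {bp: {"A"}, mf: {"B"}, cc: {"C"}},
    "g2": {bp: {"A"}, mf: {"B"}, cc: {"C"}},
    "g3": {bp: {"A"}, mf: {"X"}, cc: {"Y"}},
}
print(cluster_index(["g1", "g2"], annotations))  # 1.0
print(cluster_index(["g1", "g3"], annotations))  # about 0.333
```

Passing unequal `weights` is one way to reflect the finding that biological process annotations are more informative than cellular component ones.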