Simon Lin | Researchain

Publication

Featured research published by Simon Lin.

Bioinformatics | 2008

lumi: a pipeline for processing Illumina microarray

Pan Du; Warren A. Kibbe; Simon Lin

The Illumina microarray is becoming a popular microarray platform. The BeadArray technology from Illumina makes its preprocessing and quality control different from other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system, and have just incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package especially designed to process Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation portions. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of the Illumina microarray. The nuID annotation packages and the output of lumi-processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data. Availability: The lumi Bioconductor package, www.bioconductor.org

BMC Bioinformatics | 2010

Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis

Pan Du; Xiao Zhang; Chiang Ching Huang; Nadereh Jafari; Warren A. Kibbe; Lifang Hou; Simon Lin

Background: High-throughput profiling of DNA methylation status of CpG islands is crucial to understand the epigenetic regulation of genes. The microarray-based Infinium methylation assay by Illumina is one platform for low-cost high-throughput methylation profiling. Both Beta-value and M-value statistics have been used as metrics to measure methylation levels. However, there are no detailed studies of their relations and their strengths and limitations. Results: We demonstrate that the relationship between the Beta-value and M-value methods is a Logit transformation, and show that the Beta-value method has severe heteroscedasticity for highly methylated or unmethylated CpG sites. In order to evaluate the performance of the Beta-value and M-value methods for identifying differentially methylated CpG sites, we designed a methylation titration experiment. The evaluation results show that the M-value method provides much better performance in terms of Detection Rate (DR) and True Positive Rate (TPR) for both highly methylated and unmethylated CpG sites. Imposing a minimum threshold of difference can improve the performance of the M-value method but not the Beta-value method. We also provide guidance for how to select the threshold of methylation differences. Conclusions: The Beta-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels. Therefore, we recommend using the M-value method for conducting differential methylation analysis and including the Beta-value statistics when reporting the results to investigators.
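The logit relationship mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the base-2 form of the logit, M = log2(Beta / (1 - Beta)), and ignores any offset terms used when the values are computed from raw intensities; the function names are illustrative and not taken from any package.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a Beta-value in (0, 1) to an M-value via a base-2 logit."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Invert the base-2 logit: map an M-value back to a Beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    # Beta-values near 0 or 1 are stretched out on the M-value scale,
    # while values near 0.5 map to M-values near 0.
    for beta in (0.05, 0.2, 0.5, 0.8, 0.95):
        print(f"beta={beta:.2f}  M={beta_to_m(beta):+.3f}")
```

The stretching of the extremes is one way to see why the two scales behave differently for highly methylated or unmethylated sites.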

Nucleic Acids Research | 2008

Model-based variance-stabilizing transformation for Illumina microarray data

Simon Lin; Pan Du; Wolfgang Huber; Warren A. Kibbe

Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.

BMC Bioinformatics | 2010

ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

Lihua Julie Zhu; Claude Gazin; Nathan D. Lawson; Hervé Pagès; Simon Lin; David S. Lapointe; Michael R. Green

Background: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data, such as a different chromatin immunoprecipitation (ChIP) preparation or a dataset from the literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
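The idea of annotating each peak with its nearest gene and the distance to it can be sketched in a few lines of Python. This is not the ChIPpeakAnno API (the package is written in R); the data layout, the use of the peak midpoint, and the function name `nearest_gene` are assumptions made only for illustration.

```python
def nearest_gene(peak, genes):
    """Return (gene_name, signed_distance) for the gene start closest to the peak.

    peak  : (chrom, start, end)
    genes : iterable of (chrom, start_position, name)
    The distance is peak midpoint minus gene start, so a negative value means
    the peak lies at a smaller coordinate than the gene start.
    Returns None if no gene is on the same chromosome.
    """
    chrom, start, end = peak
    midpoint = (start + end) // 2
    same_chrom = [(pos, name) for g_chrom, pos, name in genes if g_chrom == chrom]
    if not same_chrom:
        return None
    pos, name = min(same_chrom, key=lambda g: abs(midpoint - g[0]))
    return name, midpoint - pos


if __name__ == "__main__":
    # Hypothetical annotation and peaks, for illustration only.
    genes = [("chr1", 1000, "geneA"), ("chr1", 5000, "geneB"), ("chr2", 1200, "geneC")]
    peaks = [("chr1", 4400, 4600), ("chr2", 100, 300), ("chr3", 10, 20)]
    for p in peaks:
        print(p, "->", nearest_gene(p, genes))
```

Collecting the signed distances over all peaks gives the kind of distance distribution the abstract describes plotting as a histogram.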

BMC Genomics | 2009

Annotating the human genome with Disease Ontology.

John D. Osborne; Jared Flatow; Michelle Holko; Simon Lin; Warren A. Kibbe; Lihua Julie Zhu; Maria I. Danila; Gang Feng; Rex L. Chisholm

Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
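Recall and precision, as used in the validation above, can be computed from two sets of gene-disease pairs. The sketch below is a generic illustration under the usual definitions; the pairs are made up and do not come from GeneRIF, OMIM, or the Homayouni collection.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted pairs against a reference set.

    precision = true positives / number of predicted pairs
    recall    = true positives / number of reference pairs
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical (gene, disease) pairs, for illustration only.
    predicted = {("G1", "D1"), ("G2", "D1"), ("G3", "D2")}
    reference = {("G1", "D1"), ("G2", "D1"), ("G4", "D3"), ("G5", "D3")}
    p, r = precision_recall(predicted, reference)
    print(f"precision={p:.2f} recall={r:.2f}")
```

A method can have high precision and low recall at the same time, which is the pattern the abstract reports for OMIM (98% precision, 22% recall).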

Clinical Pharmacology & Therapeutics | 2014

Design and anticipated outcomes of the eMERGE-PGx project: a multicenter pilot for preemptive pharmacogenomics in electronic health record systems.

Laura J. Rasmussen-Torvik; Sarah Stallings; Adam S. Gordon; Berta Almoguera; Melissa A. Basford; Suzette J. Bielinski; Ariel Brautbar; Murray H. Brilliant; David Carrell; John J. Connolly; David R. Crosslin; Kimberly F. Doheny; Carlos J. Gallego; Omri Gottesman; Daniel Seung Kim; Kathleen A. Leppig; Rongling Li; Simon Lin; Shannon Manzi; Ana R. Mejia; Jennifer A. Pacheco; Vivian Pan; Jyotishman Pathak; Cassandra Perry; Josh F. Peterson; Cynthia A. Prows; James D. Ralston; Luke V. Rasmussen; Marylyn D. Ritchie; Senthilkumar Sadhasivam

We describe here the design and initial implementation of the eMERGE‐PGx project. eMERGE‐PGx, a partnership of the Electronic Medical Records and Genomics Network and the Pharmacogenomics Research Network, has three objectives: (i) to deploy PGRNseq, a next‐generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest in a 1‐ to 3‐year time frame across several clinical sites; (ii) to integrate well‐established clinically validated pharmacogenetic genotypes into the electronic health record with associated clinical decision support and to assess process and clinical outcomes of implementation; and (iii) to develop a repository of pharmacogenetic variants of unknown significance linked to a repository of electronic health record–based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site‐specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to managing incidental findings, and patient and clinician education methods.

Bioinformatics | 2009

From disease ontology to disease-ontology lite

Pan Du; Gang Feng; Jared Flatow; Jie Song; Michelle Holko; Warren A. Kibbe; Simon Lin

Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general-purpose OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that the DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite results in more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at http://www.projects.bioinformatics.northwestern.edu/fundo.
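To make the notion of binary distances between gene-to-DO mapping profiles concrete, here is a minimal Python sketch. The abstract does not give the two metrics, so the Jaccard distance (for overall similarity) and an overlap-coefficient distance (for subset similarity) are used here purely as stand-in assumptions; the gene sets are also hypothetical.

```python
def jaccard_distance(genes_a, genes_b):
    """1 - |A & B| / |A | B|: small when two terms share most of their genes."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def subset_distance(genes_a, genes_b):
    """1 - |A & B| / min(|A|, |B|): zero when one gene set contains the other."""
    a, b = set(genes_a), set(genes_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(a & b) / smaller


if __name__ == "__main__":
    # Hypothetical gene sets attached to two disease terms.
    broad_term = {"TP53", "BRCA1", "BRCA2"}
    narrow_term = {"BRCA1", "BRCA2"}
    print("overall:", jaccard_distance(broad_term, narrow_term))
    print("subset: ", subset_distance(broad_term, narrow_term))
```

With these stand-ins, a narrow term nested inside a broad one is moderately distant overall but has zero subset distance, which is the kind of distinction a pair of "overall" and "subset" metrics would be expected to capture.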

Stroke | 2009

Genomics of Human Intracranial Aneurysm Wall

Changbin Shi; Issam A. Awad; Nadereh Jafari; Simon Lin; Pan Du; Ziad A. Hage; Robert Shenkar; Christopher C. Getch; Markus Bredel; H. Hunt Batjer; Bernard R. Bendok

Background and Purpose— The pathogenesis of intracranial aneurysms (IAs) remains elusive. Most studies have focused on individual genes, or a few interrelated genes or products, at a time in human IA. However, a broad view of pathologic mechanisms has not been investigated by identifying pathogenic genes and their interaction in networks. Our study aimed to analyze global gene expression patterns in the IA wall. Methods— To our knowledge, our group was the first to perform Illumina microarray analysis on human IA via comparison of aneurysm wall and superficial temporal artery tissues from 6 consecutive patients. We adopted stringent statistical criteria to the individual genes; genes with a false discovery rate <0.01 and >2-fold change were selected as differentially expressed. To identify the overrepresented biologic pathways with the differentially expressed genes, we performed hypergeometric testing of the genes selected by relaxed criteria of P<0.01 and fold change >1.5. Results— There are 326 distinct differentially expressed genes between IA and superficial temporal artery tissues (>2-fold change) with a false discovery rate <0.01. Analysis of the Kyoto Encyclopedia of Genes and Genomes pathways revealed the most impacted functional pathways: focal adhesion, extracellular matrix receptor interaction, and cell communication. Analysis of the Gene Ontology also supported the involvement of another 2 potentially important pathways: inflammatory response and apoptosis. Conclusions— The differentially expressed genes in the aneurysm wall may shed light on aneurysm pathobiology and provide novel targets for therapeutic intervention. These data will help generate hypotheses for future studies.
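The two statistical steps described in the Methods, filtering genes by false discovery rate and fold change and then testing pathways for over-representation with a hypergeometric test, can be sketched as follows. This is a generic illustration rather than the authors' analysis code; the function names and the treatment of fold change as a ratio counted in either direction are assumptions.

```python
from math import comb


def is_differentially_expressed(fdr, fold_change, fdr_cutoff=0.01, fc_cutoff=2.0):
    """Stringent per-gene filter: FDR below the cutoff and fold change above it.

    fold_change is assumed to be a positive ratio; up- and down-regulation
    both count, so 0.4 is treated as a 2.5-fold change.
    """
    magnitude = max(fold_change, 1.0 / fold_change)
    return fdr < fdr_cutoff and magnitude > fc_cutoff


def hypergeom_upper_tail(k, pathway_size, n_selected, n_total):
    """P(X >= k) when n_selected genes are drawn without replacement from
    n_total genes, of which pathway_size belong to the pathway."""
    upper = min(pathway_size, n_selected)
    total = comb(n_total, n_selected)
    hits = sum(
        comb(pathway_size, i) * comb(n_total - pathway_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Hypothetical numbers: 20 of 300 selected genes fall in a 400-gene
    # pathway, out of 20000 genes measured.
    p = hypergeom_upper_tail(20, 400, 300, 20000)
    print(f"over-representation p-value: {p:.3g}")
```

A small upper-tail probability indicates that the pathway contains more selected genes than would be expected by chance.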

Genome Biology | 2015

Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

Wenqian Zhang; Falk Hertwig; Jean Thierry-Mieg; Wenwei Zhang; Danielle Thierry-Mieg; Jian Wang; Cesare Furlanello; Viswanath Devanarayan; Jie Cheng; Youping Deng; Barbara Hero; Huixiao Hong; Meiwen Jia; Li Li; Simon Lin; Yuri Nikolsky; André Oberthuer; Tao Qing; Zhenqiang Su; Ruth Volland; Charles Wang; May D. Wang; Junmei Ai; Davide Albanese; Shahab Asgharzadeh; Smadar Avigad; Wenjun Bao; Marina Bessarabova; Murray H. Brilliant; Benedikt Brors

Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. Conclusions: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

Molecular and Cellular Biology | 2009

GATA-2 reinforces megakaryocyte development in the absence of GATA-1.

Zan Huang; Louis C. Doré; Zhe Li; Stuart H. Orkin; Gang Feng; Simon Lin; John D. Crispino

GATA-2 is an essential transcription factor that regulates multiple aspects of hematopoiesis. Dysregulation of GATA-2 is a hallmark of acute megakaryoblastic leukemia in children with Down syndrome, a malignancy that is defined by the combination of trisomy 21 and a GATA1 mutation. Here, we show that GATA-2 is required for normal megakaryocyte development as well as aberrant megakaryopoiesis in Gata1 mutant cells. Furthermore, we demonstrate that GATA-2 indirectly controls cell cycle progression in GATA-1-deficient megakaryocytes. Genome-wide microarray analysis and chromatin immunoprecipitation studies revealed that GATA-2 regulates a wide set of genes, including cell cycle regulators and megakaryocyte-specific genes. Surprisingly, GATA-2 also negatively regulates the expression of crucial myeloid transcription factors, such as Sfpi1 and Cebpa. In the absence of GATA-1, GATA-2 prevents induction of a latent myeloid gene expression program. Thus, GATA-2 contributes to cell cycle progression and the maintenance of megakaryocyte identity of GATA-1-deficient cells, including GATA-1s-expressing fetal megakaryocyte progenitors. Moreover, our data reveal that overexpression of GATA-2 facilitates aberrant megakaryopoiesis.
