Simon Cawley
Affymetrix
Publications
Featured research published by Simon Cawley.
Nature Genetics | 2008
Steven A. McCarroll; Finny Kuruvilla; Joshua M. Korn; Simon Cawley; James Nemesh; Alec Wysoker; Michael H. Shapero; Paul I. W. de Bakker; Julian Maller; Andrew Kirby; Amanda L. Elliott; Melissa Parkin; Earl Hubbell; Teresa Webster; Rui Mei; James Veitch; Patrick J Collins; Robert E. Handsaker; Steve Lincoln; Marcia M. Nizzari; John E. Blume; Keith W. Jones; Rich Rava; Mark J. Daly; Stacey Gabriel; David Altshuler
Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.
Nature Genetics | 2008
Joshua M. Korn; Finny Kuruvilla; Steven A. McCarroll; Alec Wysoker; James Nemesh; Simon Cawley; Earl Hubbell; Jim Veitch; Patrick J Collins; Katayoon Darvishi; Charles Lee; Marcia M. Nizzari; Stacey Gabriel; S Purcell; Mark J. Daly; David Altshuler
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
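The HMM stage mentioned in this abstract can be illustrated with a toy Viterbi decoder that segments a series of probe-level log2 intensity ratios into copy-number states. This is only a minimal sketch: the state means, noise level, and transition probability below are hypothetical values chosen for illustration, and nothing here is taken from the Birdsuite implementation.

```python
from math import log, pi

# Hypothetical toy parameters (assumptions for illustration, not Birdsuite's).
STATES = (1, 2, 3)                    # candidate copy numbers
MEANS = {1: -0.5, 2: 0.0, 3: 0.4}     # assumed mean log2 ratio per copy number
SIGMA = 0.2                           # assumed Gaussian noise (standard deviation)
STAY = 0.98                           # assumed probability of keeping the same state


def log_emission(state, x):
    """Log density of observation x under the Gaussian for this state."""
    return -0.5 * log(2 * pi * SIGMA ** 2) - (x - MEANS[state]) ** 2 / (2 * SIGMA ** 2)


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return log(STAY)
    return log((1 - STAY) / (len(STATES) - 1))


def viterbi(observations):
    """Return the most likely copy-number path for a list of log2 ratios."""
    if not observations:
        return []
    # Uniform prior over states for the first probe.
    score = {s: log(1 / len(STATES)) + log_emission(s, observations[0]) for s in STATES}
    back = []  # back[i][cur] = best predecessor of state cur at position i + 1
    for x in observations[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(cur, x))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these toy parameters, a run of several consecutive low-ratio probes is called as a deletion, while a single outlying probe is absorbed into the surrounding state because two state changes cost more than one poorly fitting observation.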
Nature Methods | 2004
Hajime Matsuzaki; Shoulian Dong; Halina Loi; Xiaojun Di; Guoying Liu; Earl Hubbell; Jane Law; Tam Berntsen; Monica Chadha; Henry Hui; Geoffrey Yang; Giulia C. Kennedy; Teresa Webster; Simon Cawley; P. Sean Walsh; Keith W. Jones; Stephen P. A. Fodor; Rui Mei
We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.
Genomics | 2011
Thomas J. Hoffmann; Mark N. Kvale; Stephanie Hesselson; Yiping Zhan; Christine Aquino; Yang Cao; Simon Cawley; Elaine Chung; Sheryl Connell; Jasmin Eshragh; Marcia Ewing; Jeremy Gollub; Mary Henderson; Earl Hubbell; Carlos Iribarren; Jay Kaufman; Richard Lao; Yontao Lu; Dana Ludwig; Gurpreet K. Mathauda; William B. McGuire; Gangwu Mei; Sunita Miles; Matthew M. Purdy; Charles P. Quesenberry; Dilrini Ranatunga; Sarah Rowell; Marianne Sadler; Michael H. Shapero; Ling Shen
The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.
BMC Genetics | 2005
Howard J. Edenberg; Laura J. Bierut; Paul Boyce; Manqiu Cao; Simon Cawley; Richard Chiles; Kimberly F. Doheny; Mark Hansen; Tony Hinrichs; Kevin A. Jones; Mark Kelleher; Giulia C. Kennedy; Guoying Liu; Gregory Marcus; Celeste McBride; Sarah S. Murray; Arnold Oliphant; James Pettengill; Bernice Porjesz; Elizabeth W. Pugh; John P. Rice; Stu Shannon; Rhoberta Steeke; Jay A. Tischfield; Ya Yu Tsai; Chun Zhang; Henri Begleiter
The data provided to the Genetic Analysis Workshop 14 (GAW 14) was the result of a collaboration among several different groups, catalyzed by Elizabeth Pugh from The Center for Inherited Disease Research (CIDR) and the organizers of GAW 14, Jean MacCluer and Laura Almasy. The DNA, phenotypic characterization, and microsatellite genomic survey were provided by the Collaborative Study on the Genetics of Alcoholism (COGA), a nine-site national collaboration funded by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA) with the overarching goal of identifying and characterizing genes that affect the susceptibility to develop alcohol dependence and related phenotypes. CIDR, Affymetrix, and Illumina provided single-nucleotide polymorphism genotyping of a large subset of the COGA subjects. This article briefly describes the dataset that was provided.
Journal of Computational Biology | 2002
Lior Pachter; Marina Alexandersson; Simon Cawley
Hidden Markov models (HMMs) have been successfully applied to a variety of problems in molecular biology, ranging from alignment problems to gene finding and annotation. Alignment problems can be solved with pair HMMs, while gene finding programs rely on generalized HMMs in order to model exon lengths. In this paper, we introduce the generalized pair HMM (GPHMM), which is an extension of both pair and generalized HMMs. We show how GPHMMs, in conjunction with approximate alignments, can be used for cross-species gene finding and describe applications to DNA-cDNA and DNA-protein alignment. GPHMMs provide a unifying and probabilistically sound theory for modeling these problems.
Intelligent Systems in Molecular Biology | 2005
Melissa S. Cline; John E. Blume; Simon Cawley; Tyson A. Clark; Jing-Shan Hu; Gang Lu; Nathan Salomonis; Hui Wang; Alan Williams
Motivation: Many or most mammalian genes undergo alternative splicing, generating a variety of transcripts from a single gene. New information on splice variation is becoming available through technology for measuring expression levels of several exons or splice junctions per gene. We have developed a statistical method, ANalysis Of Splice VAriation (ANOSVA), to detect alternative splicing from expression data. Since ANOSVA requires no transcript information, it can be applied when the level of annotation is poor. When validated against spiked clone data, it generated no false positives and few false negatives. We demonstrated ANOSVA with data from a prototype mouse alternative splicing array, run against normal adult tissues, yielding a set of genes with evidence of tissue-specific splice variation. Availability and supplementary information: The results are available at the supplementary information site https://bioinfo.affymetrix.com/Papers/ANOSVA/
Molecular and Biochemical Parasitology | 2001
Simon Cawley; Anthony Wirth; Terence P. Speed
We describe and assess the performance of the gene finding program Pretty Handy Annotation Tool (Phat) on sequence from the malaria parasite Plasmodium falciparum. Phat is based on a generalized hidden Markov model (GHMM) similar to the models used in GENSCAN, Genie and HMMgene. In a test set of 44 confirmed gene structures, Phat achieves nucleotide-level sensitivity and specificity of greater than 95%, performing as well as the other P. falciparum gene finding programs Hexamer and GlimmerM. Phat is particularly useful for P. falciparum and other eukaryotes for which there are few gene finding programs available, as it is distributed with code for retraining it on new organisms. Moreover, the full source code is freely available under the GNU General Public License, allowing users to further develop and customize it.
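For readers unfamiliar with nucleotide-level accuracy measures, the sketch below shows how they are usually computed in the gene-finding literature, where "specificity" conventionally means the fraction of predicted coding bases that are truly coding. It assumes per-base 0/1 coding labels; the function name and the example labels are hypothetical, and this is not code from Phat.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Return (sensitivity, specificity) from per-base coding labels.

    Each argument is a sequence of 0/1 flags, one per nucleotide, where 1
    marks a coding base. Specificity follows the gene-finding convention
    TP / (TP + FP), i.e. the fraction of predicted coding bases that are correct.
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)        # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)    # coding, missed
    fp = sum(1 for t, p in pairs if p and not t)    # non-coding, predicted coding
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity


# Hypothetical 8-base example: one coding base missed, one extra base predicted.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
prediction = [0, 0, 1, 1, 1, 1, 0, 0]
sn, sp = nucleotide_accuracy(truth, prediction)
```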
Molecular and Biochemical Parasitology | 2001
Jane M. Carlton; Ralhston Muller; Charles A. Yowell; Michelle R. Fluegge; Kenneth A. Sturrock; Jonathan R. Pritt; Esmeralda Vargas-Serrato; Mary R. Galinski; John W. Barnwell; Nicola Mulder; Alexander Kanapin; Simon Cawley; Winston Hide; John B. Dame
We have undertaken the first comparative pilot gene discovery analysis of approximately 25,000 random genomic and expressed sequence tags (ESTs) from three species of Plasmodium, the infectious agent that causes malaria. A total of 5482 genome survey sequences (GSSs) and 5582 ESTs were generated from mung bean nuclease (MBN) and cDNA libraries, respectively, of the ANKA line of the rodent malaria parasite Plasmodium berghei, and 10,874 GSSs were generated from MBN libraries of the Salvador I and Belem lines of Plasmodium vivax, the most geographically widespread human malaria pathogen. These tags, together with 2438 Plasmodium falciparum sequences present in GenBank, were used to perform first-pass assembly and transcript reconstruction, and non-redundant consensus sequence datasets were created. The datasets were compared against public protein databases, and more than 1000 putative new Plasmodium proteins were identified based on sequence similarity. Homologs of previously characterized Plasmodium genes were also identified, increasing the number of P. vivax and P. berghei sequences in public databases at least 10-fold. Comparative studies with other species of Apicomplexa identified interesting homologs of possible therapeutic or diagnostic value. A gene prediction program, Phat, was used to predict probable open reading frames for proteins in all three datasets. Predicted and non-redundant BLAST-matched proteins were submitted to InterPro, an integrated database of protein domains, signatures and families, for functional classification. Thus a partial predicted proteome was created for each species. This first comparative analysis of Plasmodium protein coding sequences represents a valuable resource for further studies on the biology of this important pathogen.
Bioinformatics | 2007
K. Hao; Xiaojun Di; Simon Cawley
The scale of genetic-variation datasets has increased enormously, and the linkage disequilibrium (LD) structure of these polymorphisms, particularly in whole-genome association studies, is of great interest. The significant computational complexity of calculating single- and multiple-marker correlations at a genome-wide scale remains challenging. We have developed a program that efficiently characterizes whole-genome LD structure on a large number of SNPs in terms of single- and multiple-marker correlations. Availability: LdCompare is licensed under the GNU General Public License (GPL). Source code, documentation, testing datasets and precompiled executables are available for download at: http://www.affymetrix.com/support/developer/tools/devnettools.affx
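As a rough illustration of a single-marker correlation, the sketch below computes the squared Pearson correlation between two SNPs coded as 0/1/2 allele counts, a common approximation to pairwise LD r² for unphased genotypes. It is an assumption-laden sketch: the function name and the use of `None` for missing calls are hypothetical, and this is not the algorithm used in LdCompare.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    Samples with a missing call (None) at either SNP are skipped.
    Returns NaN if either SNP is monomorphic among the retained samples.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples with calls at both SNPs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pairs)
    syy = sum((b - mean_b) ** 2 for _, b in pairs)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)
```

Computing this for every pair of M markers takes on the order of M² evaluations, which is why genome-wide LD calculations are usually restricted to pairs within a window or otherwise optimized.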