Hajime Matsuzaki
Affymetrix
Publication
Featured research published by Hajime Matsuzaki.
Nature Biotechnology | 2003
Giulia C. Kennedy; Hajime Matsuzaki; Shoulian Dong; Wei-Min Liu; Jing Huang; Guoying Liu; Xing Su; Manqiu Cao; Wenwei Chen; Jane Zhang; Weiwei Liu; Geoffrey Yang; Xiaojun Di; Thomas B. Ryder; Zhijun He; Urvashi Surti; Michael S. Phillips; Michael T. Boyce-Jacino; Stephen P. A. Fodor; Keith W. Jones
Genetic studies aimed at understanding the molecular basis of complex human phenotypes require the genotyping of many thousands of single-nucleotide polymorphisms (SNPs) across large numbers of individuals. Public efforts have so far identified over two million common human SNPs; however, the scoring of these SNPs is labor-intensive and requires a substantial amount of automation. Here we describe a simple but effective approach, termed whole-genome sampling analysis (WGSA), for genotyping thousands of SNPs simultaneously in a complex DNA sample without locus-specific primers or automation. Our method amplifies highly reproducible fractions of the genome across multiple DNA samples and calls genotypes at >99% accuracy. We rapidly genotyped 14,548 SNPs in three different human populations and identified a subset of them with significant allele frequency differences between groups. We also determined the ancestral allele for 8,386 SNPs by genotyping chimpanzee and gorilla DNA. WGSA is highly scaleable and enables the creation of ultrahigh density SNP maps for use in genetic studies.
Nature Methods | 2004
Hajime Matsuzaki; Shoulian Dong; Halina Loi; Xiaojun Di; Guoying Liu; Earl Hubbell; Jane Law; Tam Berntsen; Monica Chadha; Henry Hui; Geoffrey Yang; Giulia C. Kennedy; Teresa Webster; Simon Cawley; P. Sean Walsh; Keith W. Jones; Stephen P. A. Fodor; Rui Mei
We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.
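The summary statistics quoted in this abstract (minor allele frequency and heterozygosity) can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with genotype calls encoded as the strings "AA", "AB" and "BB"; the function name `allele_stats` and the no-call handling are hypothetical, and the abstract does not say whether its heterozygosity figure is observed or expected, so the sketch reports both.

```python
from collections import Counter


def allele_stats(genotypes):
    """Summarize one biallelic SNP from its genotype calls.

    genotypes: iterable of "AA", "AB" or "BB"; any other value is
    treated as a no-call and skipped (an assumption of this sketch).
    Returns (minor_allele_frequency, observed_het, expected_het).
    """
    counts = Counter(g for g in genotypes if g in ("AA", "AB", "BB"))
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no called genotypes")

    # Each individual carries two alleles, so there are 2n alleles in total.
    freq_a = (2 * counts["AA"] + counts["AB"]) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)

    observed_het = counts["AB"] / n              # fraction of AB calls
    expected_het = 2.0 * freq_a * (1.0 - freq_a)  # 2pq under Hardy-Weinberg
    return maf, observed_het, expected_het


if __name__ == "__main__":
    calls = ["AA"] * 6 + ["AB"] * 3 + ["BB"] * 1 + ["NoCall"]
    maf, obs, exp = allele_stats(calls)
    print(f"MAF={maf:.3f} observed_het={obs:.3f} expected_het={exp:.3f}")
```

A SNP would pass the abstract's ">5%" minor allele frequency filter when the first returned value exceeds 0.05.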
Human Genomics | 2005
Mark D. Shriver; Rui Mei; Esteban J. Parra; Vibhor Sonpar; Indrani Halder; Sarah A. Tishkoff; Theodore G. Schurr; Sergey I. Zhadanov; Ludmila P. Osipova; Tom D. Brutsaert; Jonathan S. Friedlaender; Lynn B. Jorde; W. Scott Watkins; Michael J. Bamshad; Gerardo Gutiérrez; Halina Loi; Hajime Matsuzaki; Rick A. Kittles; George Argyropoulos; Jose R. Fernandez; Joshua M. Akey; Keith W. Jones
Understanding the distribution of human genetic variation is an important foundation for research into the genetics of common diseases. Some of the alleles that modify common disease risk are themselves likely to be common and, thus, amenable to identification using gene-association methods. A problem with this approach is that the large sample sizes required for sufficient statistical power to detect alleles with moderate effect make gene-association studies susceptible to false-positive findings as the result of population stratification [1, 2]. Such type I errors can be eliminated by using either family-based association tests or methods that sufficiently adjust for population stratification [3–5]. These methods require the availability of genetic markers that can detect and, thus, control for sources of genetic stratification among populations. In an effort to investigate population stratification and identify appropriate marker panels, we have analysed 11,555 single nucleotide polymorphisms in 203 individuals from 12 diverse human populations. Individuals in each population cluster to the exclusion of individuals from other populations using two clustering methods. Higher-order branching and clustering of the populations are consistent with the geographic origins of populations and with previously published genetic analyses. These data provide a valuable resource for the definition of marker panels to detect and control for population stratification in population-based gene identification studies. Using three US resident populations (European-American, African-American and Puerto Rican), we demonstrate how such studies can proceed, quantifying proportional ancestry levels and detecting significant admixture structure in each of these populations.
Bioinformatics | 2003
Wei-min Liu; Xiaojun Di; Geoffrey Yang; Hajime Matsuzaki; Jing Huang; Rui Mei; Thomas B. Ryder; Teresa A. Webster; Shoulian Dong; Guoying Liu; Keith W. Jones; Giulia C. Kennedy; David Kulp
MOTIVATION: Analysis of many thousands of single nucleotide polymorphisms (SNPs) across the whole genome is crucial to efficiently map disease genes and to understand susceptibility to diseases, drug efficacy and side effects for different populations and individuals. High density oligonucleotide microarrays provide the possibility for such analysis at reasonable cost. Such analysis requires accurate, reliable methods for feature extraction, classification, statistical modeling and filtering. RESULTS: We propose the modified partitioning around medoids as a classification method for relative allele signals. We use the average silhouette width, separation and other quantities as quality measures for genotyping classification. We form robust statistical models based on the classification results and use these models to make genotype calls and calculate quality measures of calls. We apply our algorithms to several different genotyping microarrays. We use reference types, informative Mendelian relationship in families, and leave-one-out cross validation to verify our results. The concordance rates with the single base extension reference types are 99.36% for the SNPs on autosomes and 99.64% for the SNPs on sex chromosomes. The concordance of the leave-one-out test is over 99.5% and is 99.9% higher for AA, AB and BB cells. We also provide a method to determine the gender of a sample based on the heterozygous call rate of SNPs on the X chromosome. See http://www.affymetrix.com for further information. The microarray data will also be available from the Affymetrix web site. AVAILABILITY: The algorithms will be available commercially in the Affymetrix software package.
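The average silhouette width named in this abstract as a quality measure can be sketched in a few lines of Python. This is only an illustration of the standard silhouette definition, not the paper's modified partitioning-around-medoids algorithm: it assumes the cluster labels are already given, that each sample's relative allele signal is a single number between 0 and 1, and the function name `average_silhouette_width` is hypothetical.

```python
def average_silhouette_width(values, labels):
    """Average silhouette width for 1-D points with known cluster labels.

    values: one relative allele signal per sample (assumed to be a float).
    labels: the cluster (for example "AA", "AB", "BB") of each sample.
    """
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for value, label in zip(values, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is excluded via len - 1).
        a = sum(abs(value - u) for u in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(abs(value - u) for u in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scale = max(a, b)
        widths.append((b - a) / scale if scale > 0 else 0.0)
    return sum(widths) / len(widths)


if __name__ == "__main__":
    calls = ["AA", "AA", "AB", "AB", "BB", "BB"]
    tight = [0.05, 0.07, 0.50, 0.52, 0.93, 0.95]
    loose = [0.05, 0.30, 0.40, 0.60, 0.70, 0.95]
    print(average_silhouette_width(tight, calls))
    print(average_silhouette_width(loose, calls))
```

Values near 1 indicate well-separated genotype clusters; values near 0 or below indicate overlapping clusters, which is why such a measure could be used to flag unreliable SNPs.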
Genome Biology | 2009
Hajime Matsuzaki; Pei-Hua Wang; Jing Hu; Rich Rava; Glenn K. Fu
Background: Copy number variants (CNVs) account for a large proportion of genetic variation in the genome. The initial discoveries of long (> 100 kb) CNVs in normal healthy individuals were made on BAC arrays and low resolution oligonucleotide arrays. Subsequent studies that used higher resolution microarrays and SNP genotyping arrays detected the presence of large numbers of CNVs that are < 100 kb, with median lengths of approximately 10 kb. More recently, whole genome sequencing of individuals has revealed an abundance of shorter CNVs with lengths < 1 kb. Results: We used custom high density oligonucleotide arrays in whole-genome scans at approximately 200-bp resolution, and followed up with a localized CNV typing array at resolutions as close as 10 bp, to confirm regions from the initial genome scans, and to detect the occurrence of sample-level events at shorter CNV regions identified in recent whole-genome sequencing studies. We surveyed 90 Yoruba Nigerians from the HapMap Project, and uncovered approximately 2,700 potentially novel CNVs not previously reported in the literature having a median length of approximately 3 kb. We generated sample-level event calls in the 90 Yoruba at nearly 9,000 regions, including approximately 2,500 regions having a median length of just approximately 200 bp that represent the union of CNVs independently discovered through whole-genome sequencing of two individuals of Western European descent. Event frequencies were noticeably higher at shorter regions < 1 kb compared to longer CNVs (> 1 kb). Conclusions: As new shorter CNVs are discovered through whole-genome sequencing, high resolution microarrays offer a cost-effective means to detect the occurrence of events at these regions in large numbers of individuals in order to gain biological insights beyond the initial discovery.
Archive | 1997
David J. Lockhart; Mark Chee; Kevin L. Gunderson; Lai Chaoqiang; Lisa Wodicka; Maureen T. Cronin; Danny Lee; Huu M. Tran; Hajime Matsuzaki
Genome Research | 2004
Hajime Matsuzaki; Halina Loi; Shoulian Dong; Ya-Yu Tsai; Joy Fang; Jane Law; Xiaojun Di; Wei-Min Liu; Geoffrey Yang; Guoying Liu; Jing Huang; Giulia C. Kennedy; Thomas B. Ryder; Gregory Marcus; P. Sean Walsh; Mark D. Shriver; Jennifer M. Puck; Keith W. Jones; Rui Mei
Bioinformatics | 2005
Xiaojun Di; Hajime Matsuzaki; Teresa Webster; Earl Hubbell; Guoying Liu; Shoulian Dong; Dan Bartell; Jing Huang; Richard Chiles; Geoffrey Yang; Mei-Mei Shen; David Kulp; Giulia C. Kennedy; Rui Mei; Keith W. Jones; Simon Cawley
Archive | 2003
Hajime Matsuzaki; Rui Mei; Mei-Mei Shen; Giulia C. Kennedy
Archive | 1998
Hajime Matsuzaki; Eric A. Murphy