Masao Nagasaki
Tohoku University
Nature Communications | 2015
Masao Nagasaki; Jun Yasuda; Fumiki Katsuoka; Naoki Nariai; Kaname Kojima; Yosuke Kawai; Yumi Yamaguchi-Kabata; Junji Yokozawa; Inaho Danjoh; Sakae Saito; Yukuto Sato; Takahiro Mimori; Kaoru Tsuda; Rumiko Saito; Xiaoqing Pan; Satoshi Nishikawa; Shin Ito; Yoko Kuroki; Osamu Tanabe; Nobuo Fuse; Shinichi Kuriyama; Hideyasu Kiyomoto; Atsushi Hozawa; Naoko Minegishi; James Douglas Engel; Kengo Kinoshita; Shigeo Kure; Nobuo Yaegashi; Akito Tsuboi; Fuji Nagami
The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here, through this high-coverage sequencing (32.4× on average), we identify 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures of purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for genome-wide genotype imputation in the Japanese population. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which cover 99.0% of SNVs with minor allele frequency ≥0.1%, and for identifying causal rare variants of complex human disease phenotypes in genetic association studies.
PLOS Genetics | 2016
Masakazu Kohda; Yoshimi Tokuzawa; Yoshihito Kishita; Hiromi Nyuzuki; Yohsuke Moriyama; Yosuke Mizuno; Tomoko Hirata; Yukiko Yatsuka; Yzumi Yamashita-Sugahara; Yutaka Nakachi; Hidemasa Kato; Akihiko Okuda; Shunsuke Tamaru; Nurun Nahar Borna; Kengo Banshoya; Toshiro Aigaki; Yukiko Sato-Miyata; Kohei Ohnuma; Tsutomu Suzuki; Asuteka Nagao; Hazuki Maehata; Fumihiko Matsuda; Koichiro Higasa; Masao Nagasaki; Jun Yasuda; Masayuki Yamamoto; Takuya Fushimi; Masaru Shimura; Keiko Kaiho-Ichimoto; Hiroko Harashima
Mitochondrial disorders, characterized by biochemical respiratory chain complex deficiencies, have the highest incidence among congenital metabolic disorders. They occur at a rate of 1 in 5,000 births and show phenotypic and genetic heterogeneity. Mutations in about 1,500 nuclear-encoded mitochondrial proteins may cause mitochondrial dysfunction of energy production and mitochondrial disorders. More than 250 genes that cause mitochondrial disorders have been reported to date. However, an exact genetic diagnosis has remained unknown for most patients. To reveal this heterogeneity, we performed comprehensive genomic analyses of 142 patients with childhood-onset mitochondrial respiratory chain complex deficiencies. The approach includes whole mtDNA and exome analyses using high-throughput sequencing, and chromosomal aberration analyses using high-density oligonucleotide arrays. We identified 37 novel mutations in known mitochondrial disease genes and 3 mitochondria-related genes (MRPS23, QRSL1, and PNPLA4) as novel causative genes. We also identified 2 genes known to cause monogenic diseases (MECP2 and TNNI3) and 3 chromosomal aberrations (6q24.3-q25.1, 17p12, and 22q11.21) as causes in this cohort. Our approaches enhance the ability to identify pathogenic gene mutations in patients with biochemically defined mitochondrial respiratory chain complex deficiencies in clinical settings. They also underscore the clinical and genetic heterogeneity of this complex disorder and will improve patient care.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Hideto Koso; Haruna Takeda; Christopher Chin Kuan Yew; Jerrold M. Ward; Naoki Nariai; Kazuko Ueno; Masao Nagasaki; Sumiko Watanabe; Alistair G. Rust; David J. Adams; Neal G. Copeland; Nancy A. Jenkins
Neural stem cells (NSCs) are considered to be the cell of origin of glioblastoma multiforme (GBM). However, the genetic alterations that transform NSCs into glioma-initiating cells remain elusive. Using a unique transposon mutagenesis strategy that mutagenizes NSCs in culture, followed by additional rounds of mutagenesis to generate tumors in vivo, we have identified genes and signaling pathways that can transform NSCs into glioma-initiating cells. Mobilization of Sleeping Beauty transposons in NSCs induced the immortalization of astroglial-like cells, which were then able to generate tumors with characteristics of the mesenchymal subtype of GBM on transplantation, consistent with a potential astroglial origin for mesenchymal GBM. Sequence analysis of transposon insertion sites from tumors and immortalized cells identified more than 200 frequently mutated genes, including human GBM-associated genes, such as Met and Nf1, and made it possible to discriminate between genes that function during astroglial immortalization vs. later stages of tumor development. We also functionally validated five GBM candidate genes using a previously undescribed high-throughput method. Finally, we show that even clonally related tumors derived from the same immortalized line have acquired distinct combinations of genetic alterations during tumor development, suggesting that tumor formation in this model system involves competition among genetically variant cells, which is similar to the Darwinian evolutionary processes now thought to generate many human cancers. This mutagenesis strategy is faster and simpler than conventional transposon screens and can potentially be applied to any tissue stem/progenitor cells that can be grown and differentiated in vitro.
Journal of Epidemiology | 2016
Shinichi Kuriyama; Nobuo Yaegashi; Fuji Nagami; Tomohiko Arai; Yoshio Kawaguchi; Noriko Osumi; Masaki Sakaida; Yoichi Suzuki; Keiko Nakayama; Hiroaki Hashizume; Gen Tamiya; Hiroshi Kawame; Kichiya Suzuki; Atsushi Hozawa; Naoki Nakaya; Masahiro Kikuya; Hirohito Metoki; Ichiro Tsuji; Nobuo Fuse; Hideyasu Kiyomoto; Junichi Sugawara; Akito Tsuboi; Shinichi Egawa; Kiyoshi Ito; Koichi Chida; Tadashi Ishii; Hiroaki Tomita; Yasuyuki Taki; Naoko Minegishi; Naoto Ishii
The Great East Japan Earthquake (GEJE) and resulting tsunami of March 11, 2011 gave rise to devastating damage on the Pacific coast of the Tohoku region. The Tohoku Medical Megabank Project (TMM), which is being conducted by Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Medical University Iwate Tohoku Medical Megabank Organization (IMM), has been launched to realize creative reconstruction and to solve medical problems in the aftermath of this disaster. We started two prospective cohort studies in Miyagi and Iwate Prefectures: a population-based adult cohort study, the TMM Community-Based Cohort Study (TMM CommCohort Study), which will recruit 80 000 participants, and a birth and three-generation cohort study, the TMM Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study), which will recruit 70 000 participants, including fetuses and their parents, siblings, grandparents, and extended family members. The TMM CommCohort Study will recruit participants from 2013 to 2016 and follow them for at least 5 years. The TMM BirThree Cohort Study will recruit participants from 2013 to 2017 and follow them for at least 4 years. For children, the ToMMo Child Health Study, which adopted a cross-sectional design, was also started in November 2012 in Miyagi Prefecture. An integrated biobank will be constructed based on the two prospective cohort studies, and ToMMo and IMM will investigate the chronic medical impacts of the GEJE. The integrated biobank of TMM consists of health and clinical information, biospecimens, and genome and omics data. The biobank aims to establish a firm basis for personalized healthcare and medicine, mainly for diseases aggravated by the GEJE in the two prefectures. Biospecimens and related information in the biobank will be distributed to the research community. TMM itself will also undertake genomic and omics research. 
The aims of the genomic studies are: 1) to construct an integrated biobank; 2) to return genomic research results to the participants of the cohort studies, which will lead to the implementation of personalized healthcare and medicine in the affected areas in the near future; and 3) to contribute to the development of personalized healthcare and medicine worldwide. Through the activities of TMM, we will clarify how to approach prolonged healthcare problems in areas damaged by large-scale disasters and how useful genomic information is for disease prevention.
BMC Genomics | 2014
Naoki Nariai; Kaname Kojima; Takahiro Mimori; Yukuto Sato; Yosuke Kawai; Yumi Yamaguchi-Kabata; Masao Nagasaki
Background: High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads have become available thanks to new types of sequencing technologies as well as improvements in chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g. > 250 bp). Results: We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed- and variable-length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM can be effectively incorporated into our pipeline. In addition, a heuristic algorithm is implemented in the variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and evaluate its transcript quantification performance in comparison with existing methods. Conclusions: TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp, both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.
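The idea of scoring gapped alignments with a sequencing-error model can be illustrated with a deliberately simplified sketch. It assumes each read-to-transcript alignment is summarized as an extended CIGAR string (`=` match, `X` mismatch, `I` insertion, `D` deletion) and that errors occur independently per position at fixed rates; the rate constants and the function name `read_log_likelihood` are hypothetical, and TIGAR2's actual parameterization and estimation are not reproduced here.

```python
import math
import re

# Hypothetical per-position error rates (assumptions for illustration only;
# a real quantifier would estimate such parameters from the data).
SUB_RATE = 0.01
INS_RATE = 0.001
DEL_RATE = 0.001

CIGAR_OP = re.compile(r"(\d+)([=XID])")


def read_log_likelihood(cigar: str,
                        sub: float = SUB_RATE,
                        ins: float = INS_RATE,
                        dele: float = DEL_RATE) -> float:
    """Log P(read | transcript) under an independent per-position error model.

    `cigar` uses extended CIGAR operations only: '=', 'X', 'I', 'D'.
    """
    match = 1.0 - sub - ins - dele  # probability of an error-free position
    log_p = {"=": math.log(match), "X": math.log(sub),
             "I": math.log(ins), "D": math.log(dele)}
    total = 0.0
    consumed = 0
    for length, op in CIGAR_OP.findall(cigar):
        total += int(length) * log_p[op]
        consumed += len(length) + 1  # characters used by this token
    if consumed != len(cigar):
        # Some part of the string (e.g. an 'M' operation) was not understood.
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    return total
```

Under these assumed rates, a perfect alignment scores higher than one with a single mismatch, which in turn scores higher than one with a single-base indel; likelihoods of this kind are what a quantification step weighs when one read aligns to several isoforms.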
Bioinformatics | 2013
Naoki Nariai; Osamu Hirose; Kaname Kojima; Masao Nagasaki
Motivation: Many human genes express multiple transcript isoforms through alternative splicing, which greatly increases the diversity of protein function. Although RNA sequencing (RNA-Seq) technologies have been widely used to measure amounts of transcribed mRNA, accurate estimation of transcript isoform abundances from RNA-Seq data is challenging because reads often map to more than one transcript isoform or to paralogs with similar sequences. Results: We propose a statistical method to estimate transcript isoform abundances from RNA-Seq data. Our method can handle gapped alignments of reads against reference sequences, so it allows insertion or deletion errors within reads. The proposed method optimizes the number of transcript isoforms by variational Bayesian inference through an iterative procedure, and its convergence is guaranteed under a stopping criterion. On simulated datasets, our method outperformed comparable quantification methods in inferring transcript isoform abundances, and its rate of convergence was faster than that of the expectation maximization algorithm. We also applied our method to RNA-Seq data of human cell line samples and showed that its predictions were more consistent among technical replicates than those of other methods. Availability: An implementation of our method is available at […]. Supplementary information: Supplementary data are available at Bioinformatics online.
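To make the variational Bayesian idea concrete, here is a minimal mean-field sketch for a generic mixture model of reads over isoforms; it is not the authors' implementation. It assumes each read comes with precomputed likelihoods p(read | transcript) for the transcripts it aligns to, a symmetric Dirichlet prior with a hypothetical concentration `alpha0`, and a simple tolerance on the Dirichlet parameters as the stopping rule. A multi-mapping read is split among isoforms in proportion to p(read | t)·exp(ψ(α_t) − ψ(Σα)), the quantity that takes the place of the current abundance estimate used by the expectation maximization algorithm.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def vb_abundances(read_likelihoods, n_transcripts, alpha0=0.01, tol=1e-8, max_iter=1000):
    """Mean-field VB estimate of transcript isoform abundances (illustrative sketch).

    read_likelihoods: one dict per read mapping transcript index -> p(read | transcript);
    transcripts missing from a read's dict are treated as incompatible with it.
    alpha0: hypothetical symmetric Dirichlet prior concentration.
    Returns the posterior mean abundance of each transcript.
    """
    n_reads = len(read_likelihoods)
    alpha = [alpha0 + n_reads / n_transcripts] * n_transcripts
    for _ in range(max_iter):
        # exp(E_q[log theta_t]) replaces the point estimate theta_t used by EM.
        psi_total = digamma(sum(alpha))
        weight = [math.exp(digamma(a) - psi_total) for a in alpha]
        counts = [0.0] * n_transcripts
        for lik in read_likelihoods:
            resp = {t: p * weight[t] for t, p in lik.items()}
            norm = sum(resp.values())
            if norm == 0.0:
                continue  # read has no support under the current model
            for t, r in resp.items():
                counts[t] += r / norm  # fractional assignment of a multi-mapping read
        new_alpha = [alpha0 + c for c in counts]
        delta = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if delta < tol:
            break
    total = sum(alpha)
    return [a / total for a in alpha]
```

For example, with 60 reads unique to isoform 0, 20 unique to isoform 1 and 20 reads compatible with both, the shared reads end up split roughly 3:1, giving abundances close to 0.75 and 0.25, while an isoform with no supporting reads keeps only its tiny prior mass.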
Human genome variation | 2015
Yumi Yamaguchi-Kabata; Naoki Nariai; Yosuke Kawai; Yukuto Sato; Kaname Kojima; Minoru Tateno; Fumiki Katsuoka; Jun Yasuda; Masayuki Yamamoto; Masao Nagasaki
The integrative Japanese Genome Variation Database (iJGVD; […]) provides genomic variation data detected by whole-genome sequencing (WGS) of Japanese individuals. Specifically, the database contains variants detected by WGS of 1,070 individuals who participated in a genome cohort study of the Tohoku Medical Megabank Project. In its first release, iJGVD includes >4,300,000 autosomal single nucleotide variants (SNVs) whose minor allele frequencies are >5.0%.
BMC Genomics | 2015
Naoki Nariai; Kaname Kojima; Sakae Saito; Takahiro Mimori; Yukuto Sato; Yosuke Kawai; Yumi Yamaguchi-Kabata; Jun Yasuda; Masao Nagasaki
Background: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphism of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. Results: We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and the abundance of reads on HLA alleles by variational Bayesian inference. We show the effectiveness of the proposed method over other methods by predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1, -DQB1 and -DRB1) loci from simulated data at various depths of coverage and from real sequencing data of human trio samples. Conclusions: HLA-VBSeq is an efficient and accurate HLA typing method for high-throughput sequencing data that does not require primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples from genetically diverse populations.
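The "full (8-digit) resolution" above refers to HLA allele names with four colon-separated fields, for example `HLA-A*02:01:01:01`, where the fields denote the allele group, the specific protein, synonymous coding variation and non-coding variation. The sketch below shows how an allele call can be truncated to a lower resolution and how agreement with reference types might be scored at a chosen number of fields. Both helper names are hypothetical, and expression suffixes such as `N` (null alleles) are deliberately not handled by this simplified pattern.

```python
import re

# Allele names such as "HLA-A*02:01:01:01": gene, '*', then up to four
# colon-separated numeric fields. Expression suffixes (e.g. N, L, Q) are
# intentionally not accepted by this simplified pattern.
ALLELE = re.compile(r"^(HLA-[A-Z0-9]+)\*(\d{2,3}(?::\d{2,3}){0,3})$")


def truncate_allele(name: str, fields: int) -> str:
    """Reduce an HLA allele name to its first `fields` colon-separated fields."""
    if not 1 <= fields <= 4:
        raise ValueError("fields must be between 1 and 4")
    m = ALLELE.match(name)
    if m is None:
        raise ValueError(f"unsupported HLA allele name: {name!r}")
    gene, digits = m.groups()
    return f"{gene}*{':'.join(digits.split(':')[:fields])}"


def concordance(predicted, truth, fields=2):
    """Fraction of predicted alleles matching the reference types at a given resolution."""
    pairs = list(zip(predicted, truth))
    if not pairs:
        raise ValueError("no allele pairs to compare")
    hits = sum(truncate_allele(p, fields) == truncate_allele(t, fields) for p, t in pairs)
    return hits / len(pairs)
```

For instance, `HLA-A*02:01:01:01` and `HLA-A*02:01:01:02` agree at two-field resolution but differ at full resolution, so a method that reports all four fields is evaluated against a stricter standard.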
Journal of Human Genetics | 2015
Yosuke Kawai; Takahiro Mimori; Kaname Kojima; Naoki Nariai; Inaho Danjoh; Rumiko Saito; Jun Yasuda; Masayuki Yamamoto; Masao Nagasaki
The Tohoku Medical Megabank Organization constructed a reference panel (referred to as the 1KJPN panel) containing >20 million single nucleotide polymorphisms (SNPs) from whole-genome sequence data of 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, based on the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659,253 SNPs, including tag SNPs for imputation, SNPs on the Y chromosome and mitochondrial DNA, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than existing commercially available SNP arrays with both the 1KJPN panel and the 1000 Genomes Project panel. For common SNPs (minor allele frequency (MAF) > 5%), the genomic coverage of the Japonica array (r² > 0.8) was 96.9%; that is, almost all common SNPs were covered by this array. For low-frequency SNPs (0.5% < MAF ⩽ 5%), the coverage of the Japonica array reached 67.2%, which is higher than that of the existing arrays. In addition, we confirmed the high-quality genotyping performance of the Japonica array using 288 samples in 1KJPN: the average call rate was 99.7%, and the average concordance rate with genotypes obtained by high-throughput sequencing was 99.7%. As demonstrated in this study, creating custom-made SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputation.
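Genomic coverage at r² > 0.8 is commonly defined as the fraction of reference-panel SNPs that are on the array or in strong linkage disequilibrium with at least one array SNP. The following sketch computes pairwise r² from phased 0/1 haplotypes and then that coverage fraction. It is an illustration under simplifying assumptions (no physical-distance window, no MAF binning, toy-sized input), not the procedure used to design or evaluate the Japonica array, and the function names are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given 0/1 alleles on the same phased haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom  # monomorphic SNPs carry no LD


def genomic_coverage(panel, array_ids, threshold=0.8):
    """Fraction of panel SNPs on the array or with r^2 > threshold to some array SNP.

    panel: dict mapping SNP id -> list of 0/1 alleles, one entry per haplotype.
    array_ids: set of SNP ids placed on the array (all present in `panel`).
    """
    tags = [panel[s] for s in array_ids]
    covered = sum(
        1
        for snp, haps in panel.items()
        if snp in array_ids or any(ld_r2(haps, t) > threshold for t in tags)
    )
    return covered / len(panel)
```

Reporting separate figures for common (MAF > 5%) and low-frequency (0.5% < MAF ⩽ 5%) SNPs would amount to running the coverage calculation on each MAF bin of the panel separately.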
Analytical Biochemistry | 2014
Fumiki Katsuoka; Junji Yokozawa; Kaoru Tsuda; Shin Ito; Xiaoqing Pan; Masao Nagasaki; Jun Yasuda; Masayuki Yamamoto
Library quantitation is a critical step for obtaining high data output from Illumina HiSeq sequencers. Here, we introduce a library quantitation method that uses the Illumina MiSeq sequencer, designated quantitative MiSeq (qMiSeq). In this procedure, 96 dual-index libraries, including control samples, are denatured, pooled in equal volumes, and sequenced on the MiSeq. We found that the relative concentration of each library can be determined from the observed index ratio and used to determine the HiSeq run conditions for each library. Thus, qMiSeq provides an efficient way to quantitate a large number of libraries at once.
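The index-ratio reasoning can be sketched as follows. The sketch assumes that, because equal volumes are pooled, each library's share of MiSeq index reads is proportional to its molar concentration, and that a control library of known concentration is used to convert read ratios into nM. The function names, the control-based calibration and the equimolar pooling step are illustrative assumptions rather than the published qMiSeq protocol.

```python
def relative_concentrations(index_counts):
    """Concentration of each library relative to the pool mean, from index read counts.

    Assumes equal volumes were pooled, so a library's share of reads is
    proportional to its molar concentration.
    """
    mean = sum(index_counts.values()) / len(index_counts)
    return {lib: n / mean for lib, n in index_counts.items()}


def calibrated_concentrations(index_counts, control, control_nM):
    """Scale read ratios to nM using a control library of known concentration (assumption)."""
    ref = index_counts[control]
    return {lib: control_nM * n / ref for lib, n in index_counts.items()}


def pooling_volumes(conc_nM, target_fmol):
    """Volume (µL) of each library that delivers the same molar amount.

    1 nM equals 1 fmol/µL, so volume = amount / concentration.
    """
    return {lib: target_fmol / c for lib, c in conc_nM.items()}
```

For example, if library A yields twice as many index reads as a 2 nM control, this model estimates A at 4 nM, so only half the control's volume of A is needed to contribute the same number of molecules to a HiSeq pool.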