Publication


Featured research published by Soohyun Lee.


Nature | 2014

Comparative analysis of metazoan chromatin organization

Joshua W. K. Ho; Youngsook L. Jung; Tao Liu; Burak H. Alver; Soohyun Lee; Kohta Ikegami; Kyung Ah Sohn; Aki Minoda; Michael Y. Tolstorukov; Alex Appert; Stephen C. J. Parker; Tingting Gu; Anshul Kundaje; Nicole C. Riddle; Eric P. Bishop; Thea A. Egelhofer; Sheng'En Shawn Hu; Artyom A. Alekseyenko; Andreas Rechtsteiner; Dalal Asker; Jason A. Belsky; Sarah K. Bowman; Q. Brent Chen; Ron Chen; Daniel S. Day; Yan Dong; Andréa C. Dosé; Xikun Duan; Charles B. Epstein; Sevinc Ercan

Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal ‘arms’, and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.


Science | 2015

Somatic mutation in single human neurons tracks developmental and transcriptional history

Michael A. Lodato; Mollie B. Woodworth; Semin Lee; Gilad D. Evrony; Bhaven K. Mehta; Amir Karger; Soohyun Lee; Thomas Chittenden; Alissa M. D’Gama; Xuyu Cai; Lovelace J. Luquette; Eunjung Lee; Peter J. Park; Christopher A. Walsh

Individualized neuronal mutations in the human brain

The neurons of the human brain can last for decades, carrying out computational and signaling functions. Lodato et al. analyzed the DNA of individual neurons sampled from postmortem human brains and found that individual neurons acquired somatic mutations (see the Perspective by Linnarsson). The mechanism of mutation involved gene transcription rather than DNA replication. Thus, postmitotic neurons would seem to be their own worst enemy: Genes used for neuronal function are the very genes put most at risk of somatic mutation. Science, this issue p. 94; see also p. 37

Human brains are built from intermingled clones of cells that carry mutations linked to their use of particular neuronal genes. [Also see Perspective by Linnarsson]

Neurons live for decades in a postmitotic state, their genomes susceptible to DNA damage. Here we survey the landscape of somatic single-nucleotide variants (SNVs) in the human brain. We identified thousands of somatic SNVs by single-cell sequencing of 36 neurons from the cerebral cortex of three normal individuals. Unlike germline and cancer SNVs, which are often caused by errors in DNA replication, neuronal mutations appear to reflect damage during active transcription. Somatic mutations create nested lineage trees, allowing them to be dated relative to developmental landmarks and revealing a polyclonal architecture of the human cerebral cortex. Thus, somatic mutations in the brain represent a durable and ongoing record of neuronal life history, from development through postmitotic function.


Nature | 2015

Hallmarks of pluripotency

Alejandro De Los Angeles; Francesco Ferrari; Ruibin Xi; Yuko Fujiwara; Nissim Benvenisty; Hongkui Deng; Rudolf Jaenisch; Soohyun Lee; Harry G. Leitch; M. William Lensch; Ernesto Lujan; Duanqing Pei; Janet Rossant; Marius Wernig; Peter J. Park; George Q. Daley

Stem cells self-renew and generate specialized progeny through differentiation, but vary in the range of cells and tissues they generate, a property called developmental potency. Pluripotent stem cells produce all cells of an organism, while multipotent or unipotent stem cells regenerate only specific lineages or tissues. Defining stem-cell potency relies upon functional assays and diagnostic transcriptional, epigenetic and metabolic states. Here we describe functional and molecular hallmarks of pluripotent stem cells, propose a checklist for their evaluation, and illustrate how forensic genomics can validate their provenance.


Nature Biotechnology | 2015

A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs

Jiho Choi; Soohyun Lee; William Mallard; Kendell Clement; Guidantonio Malagoli Tagliazucchi; Hotae Lim; In Young Choi; Francesco Ferrari; Alexander M. Tsankov; Ramona Pop; Gabsang Lee; John L. Rinn; Alexander Meissner; Peter J. Park

The equivalence of human induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs) remains controversial. Here we use genetically matched hESC and hiPSC lines to assess the contribution of cellular origin (hESC vs. hiPSC), the Sendai virus (SeV) reprogramming method and genetic background to transcriptional and DNA methylation patterns while controlling for cell line clonality and sex. We find that transcriptional and epigenetic variation originating from genetic background dominates over variation due to cellular origin or SeV infection. Moreover, the 49 differentially expressed genes we detect between genetically matched hESCs and hiPSCs neither predict functional outcome nor distinguish an independently derived, larger set of unmatched hESC and hiPSC lines. We conclude that hESCs and hiPSCs are molecularly and functionally equivalent and cannot be distinguished by a consistent gene expression signature. Our data further imply that genetic background variation is a major confounding factor for transcriptional and epigenetic comparisons of pluripotent cell lines, explaining some of the previously observed differences between genetically unmatched hESCs and hiPSCs.


Nature | 2015

Failure to replicate the STAP cell phenomenon

Alejandro De Los Angeles; Francesco Ferrari; Yuko Fujiwara; Ronald Mathieu; Soohyun Lee; Semin Lee; Ho-Chou Tu; Samantha J. Ross; Stephanie S. Chou; Minh Nguyen; Zhaoting Wu; Thorold W. Theunissen; Benjamin E. Powell; Sumeth Imsoonthornruksa; Jiekai Chen; Marti Borkent; Vladislav Krupalnik; Ernesto Lujan; Marius Wernig; Jacob Hanna; Duanqing Pei; Rudolf Jaenisch; Hongkui Deng; Stuart H. Orkin; Peter J. Park; George Q. Daley

Although the reports that stress (such as exposure to acid) can coax somatic cells into a novel state of pluripotency have been retracted, the validity of stimulus-triggered acquisition of pluripotency (STAP) remains unclear (http://dx.doi.org/10.1038/protex.2014.008 and Supplementary Information). Here we describe the efforts of seven laboratories to replicate STAP, including experiments performed within the laboratory where STAP first originated, as well as re-analysis of the sequencing data from the STAP reports. Neonatal cells treated with two STAP protocols exhibited artefactual autofluorescence rather than bona fide reactivation of an Oct4 (also known as Pou5f1) and green fluorescent protein (GFP) transgene reporter, did not reactivate pluripotency markers towards embryonic stem (ES)-cell-like levels, and failed to generate teratomas or chimaerize blastocysts. Re-analysis of the original RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) data identified discrepancies in the sex and genetic composition of parental donor cells and converted stem cells, and revealed a STAP-derived cell line to be a mixture containing trophoblast stem cells, attesting to the importance of validating the properties and provenance of pluripotent stem cells using a wide range of criteria. To assess the reprogramming capacity of STAP protocols, we used a transgenic Oct4-GFP reporter, which shows GFP reactivation during Oct4/Sox2/Klf4 reprogramming, in established induced pluripotent stem (iPS) cells and in the gonads of mid-gestation ‘all iPS cell’ embryos generated by tetraploid complementation (Extended Data Figs 1 and 2a). Working within the Vacanti laboratory where the concept of STAP cells originated, and assisted by a co-author of the STAP papers, a Daley laboratory member (A.D.L.A.) attempted to replicate two reported STAP protocols: (1) mechanical trituration and acid treatment of mouse lung cells (Brigham and Women’s Hospital (BWH) protocol; see Supplementary Information), and (2) acid treatment of mouse splenocytes (RIKEN protocol; Methods and Extended Data Fig. 2b). Seventy-two hours after stress treatment of lung cells, floating spheres appeared amidst cellular debris. Fluorescence microscopy revealed that both Oct4-GFP and wild-type spheres emitted low-level broad spectrum fluorescence detectable within both green and red filters, indicating autofluorescence (Fig. 1a). Untreated Oct4-GFP ES cells did not emit the same low-level broad spectrum fluorescence as STAP-treated cells. STAP-treated splenocytes formed spheres with lower efficiency, but also appeared autofluorescent. Flow cytometry indicated STAP-treated Oct4-GFP cells did not exhibit Oct4-GFP reactivation at levels comparable to control Oct4-GFP mouse ES cells, and were indistinguishable from stressed wild-type controls (Fig. 1b). Absence of ES-cell-like levels of Oct4, Sox2 and Nanog transcripts and nonspecific immunofluorescence corroborated flow cytometry data (Extended Data Fig. 2c, d). Rare pluripotent cells should generate teratomas in immunocompromised mice, but STAP cells could not, unlike control ES cells (Extended Data Fig. 2e, f). Replication of the poly-L-glycolic acid (PLGA)-based teratoma production method described in the original STAP reports with GFP cells to distinguish host and donor contribution produced distinct masses of connective tissue, muscle and scar, with minimal GFP content, indicating primarily host origin (Fig. 1c, d and Extended Data Fig. 2g). Rare GFP-positive clusters did not form differentiated tissues characteristic of ES-cell-derived teratomas (Fig. 1d). Autofluorescent spheres failed to enter development after morula aggregation or blastocyst injection (Fig. 1e and Extended Data Fig. 2h–j).
Therefore, pluripotency was undetectable in STAP experiments. Six other laboratories (Deng, Hanna, Hochedlinger, Jaenisch, Pei and Wernig) also attempted to generate STAP cells (Table 1) and made the following observations. First, autofluorescent sphere-like aggregates after STAP treatment were universally seen. Second, transgenic reporters used by Obokata and colleagues (GOF18-Oct4-GFP, containing the 18-kilobase genomic Oct4 fragment (GOF18)) and by the Daley, Pei and Hanna laboratories (GOF18-Oct4ΔPE-GFP, lacking the Oct4 proximal enhancer (PE) element) both exhibit activity in pre-implantation embryos, early post-implantation epiblast cells (embryonic day (E) 5.5), germ cells, and mouse ES/iPS cells; however, differential activity in late post-implantation epiblast (E6.5) and early passage mouse epiblast-derived stem cells has been ascribed to the Oct4 proximal enhancer. Using the same reporter as Obokata and colleagues, the Deng laboratory observed that the GFP signal in chemical iPS cells was easily distinguishable from the autofluorescence of STAP-treated cells (Extended Data Fig. 2k). The Jaenisch, Wernig and Hochedlinger laboratories failed to observe GFP reactivation with Oct4 or Nanog knock-in reporters, excluding a scenario of uncoupling between GFP and endogenous pluripotency expression. Despite a range of tested reporters, no group documented authentic Oct4/Nanog reporter activation that resembled bona fide ES cells. Third, the Deng laboratory failed to observe Oct4, Sox2 and Nanog induction 3 and 7 days after STAP treatment, reducing the likelihood that pluripotency was transiently activated and silenced by day 7 (Extended Data Fig. 2l). Finally, the Hanna, Wernig and Hochedlinger laboratories failed to generate stem-cell lines by culturing STAP-treated cells in leukaemia inhibitory factor (LIF) and adrenocorticotropic hormone (ACTH)-supplemented medium.
In summary, 133 replicate attempts failed to document generation of ES-cell-like cells, corroborating and extending a recent report. We re-examined the high-throughput sequencing data from the STAP reports to investigate the genetic provenance of parental CD45 cells and converted STAP cells, STAP stem cells and Fgf4-induced stem cells (FI-SCs) (Fig. 1f). Comparative genomic hybridization array data mentioned in the original paper were not publicly released. Copy number variation (CNV) analysis conducted using ChIP-seq input samples revealed a discrepancy in sex across samples as well as chromosomal aberrations (Fig. 1g). In the original STAP reports, the authors stated that they mixed CD45 cells from male and female mice owing to the small number of CD45 cells retrieved from individual neonatal spleens. However, our analysis indicates that CD45 cells were female, whereas the derived cells (STAP cells, STAP stem cells and FI-SCs) were all male, a clear inconsistency. We note that control ES cells were also male (Fig. 1g). FI-SCs possessed trisomy 8, which renders mouse ES cells germline-incompetent (Fig. 1g). Inferred single nucleotide variants (SNVs) from RNA-seq data allowed classification of samples as genetically similar or dissimilar (Fig. 1h). Control ES cells, parental donor female CD45 cells, STAP cells, and STAP stem cells all possessed similar SNV profiles, consistent with their derivation from a first generation hybrid of C57BL6/129 strains, the reported genotype (Fig. 1h and Extended Data Fig. 3). By contrast, FI-SCs had an SNV profile that matched a single nucleotide polymorphism (SNP) profile of C57BL6 strain origin, indicating


Cell Stem Cell | 2017

DUSP9 Modulates DNA Hypomethylation in Female Mouse Pluripotent Stem Cells

Jiho Choi; Kendell Clement; Aaron J. Huebner; Jamie Webster; Christopher M. Rose; Justin Brumbaugh; Ryan M. Walsh; Soohyun Lee; Andrej J. Savol; Jean-Pierre Etchegaray; Hongcang Gu; Patrick Boyle; Ulrich Elling; Raul Mostoslavsky; Ruslan I. Sadreyev; Peter J. Park; Steven P. Gygi; Alexander Meissner

Blastocyst-derived embryonic stem cells (ESCs) and gonad-derived embryonic germ cells (EGCs) represent two classic types of pluripotent cell lines, yet their molecular equivalence remains incompletely understood. Here, we compare genome-wide methylation patterns between isogenic ESC and EGC lines to define epigenetic similarities and differences. Surprisingly, we find that sex rather than cell type drives methylation patterns in ESCs and EGCs. Cell fusion experiments further reveal that the ratio of X chromosomes to autosomes dictates methylation levels, with female hybrids being hypomethylated and male hybrids being hypermethylated. We show that the X-linked MAPK phosphatase DUSP9 is upregulated in female compared to male ESCs, and its heterozygous loss in female ESCs leads to male-like methylation levels. However, male and female blastocysts are similarly hypomethylated, indicating that sex-specific methylation differences arise in culture. Collectively, our data demonstrate the epigenetic similarity of sex-matched ESCs and EGCs and identify DUSP9 as a regulator of female-specific hypomethylation.


BMC Bioinformatics | 2015

EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering

Soohyun Lee; Chae Hwa Seo; Burak H. Alver; Sanghyuk Lee; Peter J. Park

Background: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. Results: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. Conclusions: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar
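
The idea of grouping reads by the set of transcripts they map to can be illustrated with a small sketch. This is not EMSAR's code and does not reproduce its joint Poisson model or suffix-array index: it substitutes a generic expectation-maximization update and assumes all transcripts have equal effective length. The function names `group_reads` and `em_abundance` and the input layout are hypothetical.

```python
from collections import Counter


def group_reads(read_hits):
    """Count reads per set of compatible transcripts.

    read_hits: iterable where each item lists the transcript ids one read
    maps to (hypothetical input layout, not EMSAR's file format).
    """
    return Counter(frozenset(hits) for hits in read_hits if hits)


def em_abundance(groups, n_iter=200):
    """Generic EM estimate of relative abundance from grouped read counts.

    Assumes equal effective transcript lengths; this is a stand-in for,
    not a copy of, the maximum likelihood model described in the abstract.
    """
    if not groups:
        raise ValueError("no mapped reads")
    transcripts = sorted(set().union(*groups))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(groups.values())
    for _ in range(n_iter):
        assigned = dict.fromkeys(transcripts, 0.0)
        for tset, count in groups.items():
            denom = sum(theta[t] for t in tset)
            if denom == 0:
                continue
            # Split each group's reads in proportion to current estimates.
            for t in tset:
                assigned[t] += count * theta[t] / denom
        theta = {t: assigned[t] / total for t in transcripts}
    return theta


# Toy data: 6 reads unique to A, 2 unique to B, 4 ambiguous between A and B.
reads = [["A"]] * 6 + [["B"]] * 2 + [["A", "B"]] * 4
groups = group_reads(reads)
theta = em_abundance(groups)
```

Working on groups rather than individual reads means the cost of each iteration scales with the number of distinct transcript sets, not with the number of reads.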


Genome Biology | 2018

HiGlass: web-based visual exploration and analysis of genome interaction maps

Peter Kerpedjiev; Nezar Abdennur; Fritz Lekschas; Chuck McCallum; Kasper Dinkla; Hendrik Strobelt; Jacob M. Luber; Scott Ouellette; Alaleh Azhir; Nikhil Kumar; Jeewon Hwang; Soohyun Lee; Burak H. Alver; Hanspeter Pfister; Leonid A. Mirny; Peter J. Park; Nils Gehlenborg

We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.


Nucleic Acids Research | 2017

NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types

Sejoon Lee; Soohyun Lee; Scott Ouellette; Woong-Yang Park; Eunjung Lee; Peter J. Park

In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. Availability: https://github.com/parklab/NGSCheckMate
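
Comparing allele read fractions at known SNPs can be sketched as follows. This is a simplified illustration, not NGSCheckMate's implementation: it uses a plain Pearson correlation as the similarity metric and leaves out the depth-dependent model mentioned in the abstract. The function names, the `min_depth` parameter and the `(ref_reads, alt_reads)` input layout are assumptions made for the example.

```python
from math import sqrt


def allele_fractions(counts, min_depth=1):
    """Map SNP id -> alternate-allele read fraction, skipping shallow sites.

    counts: dict of SNP id -> (ref_reads, alt_reads); hypothetical layout.
    """
    fractions = {}
    for snp, (ref, alt) in counts.items():
        depth = ref + alt
        if depth >= min_depth:
            fractions[snp] = alt / depth
    return fractions


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def vaf_correlation(sample_a, sample_b, min_depth=1):
    """Correlate allele fractions over the SNPs covered in both samples."""
    fa = allele_fractions(sample_a, min_depth)
    fb = allele_fractions(sample_b, min_depth)
    shared = sorted(set(fa) & set(fb))
    if len(shared) < 2:
        raise ValueError("need at least two shared SNPs")
    return pearson([fa[s] for s in shared], [fb[s] for s in shared])


# Toy read counts: `same` has the genotypes of `subject` at lower depth,
# while `other` has different genotypes at the same sites.
subject = {"rs1": (10, 0), "rs2": (5, 5), "rs3": (0, 10), "rs4": (10, 0)}
same = {"rs1": (4, 0), "rs2": (2, 2), "rs3": (0, 6), "rs4": (3, 0)}
other = {"rs1": (0, 8), "rs2": (8, 0), "rs3": (4, 4), "rs4": (0, 8)}
r_same = vaf_correlation(subject, same)
r_other = vaf_correlation(subject, other)
```

With real data, a fixed correlation cutoff would behave differently at different sequencing depths, which is the effect the depth-dependent model in the abstract is meant to account for.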


bioRxiv | 2018

Tibanna: software for scalable execution of portable pipelines on the cloud

Soohyun Lee; Jeremy Johnson; Carl Vitzthum; Koray Kırlı; Burak H. Alver; Peter J. Park

Summary: We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability: Source code is available on GitHub at https://github.com/4dn-dcic/tibanna.

Collaboration


Dive into Soohyun Lee's collaborations.

Top Co-Authors

Rudolf Jaenisch

Massachusetts Institute of Technology
