Yongjin Park

Massachusetts Institute of Technology

Publication

Featured research published by Yongjin Park.

Nature Biotechnology | 2014

Functional optimization of gene clusters by combinatorial design and assembly

Michael J. Smanski; Swapnil Bhatia; Dehua Zhao; Yongjin Park; Lauren B.A. Woodruff; Georgia Giannoukos; Dawn Ciulla; Michele Busby; Johnathan Calderon; Robert Nicol; D. Benjamin Gordon; Douglas Densmore; Christopher A. Voigt

Large microbial gene clusters encode useful functions, including energy utilization and natural product biosynthesis, but genetic manipulation of such systems is slow, difficult and complicated by complex regulation. We exploit the modularity of a refactored Klebsiella oxytoca nitrogen fixation (nif) gene cluster (16 genes, 103 parts) to build genetic permutations that could not be achieved by starting from the wild-type cluster. Constraint-based combinatorial design and DNA assembly are used to build libraries of radically different cluster architectures by varying part choice, gene order, gene orientation and operon occupancy. We construct 84 variants of the nifUSVWZM operon, 145 variants of the nifHDKY operon, 155 variants of the nifHDKYENJ operon and 122 variants of the complete 16-gene pathway. The performance and behavior of these variants are characterized by nitrogenase assay and strand-specific RNA sequencing (RNA-seq), and the results are incorporated into subsequent design cycles. We have produced a fully synthetic cluster that recovers 57% of wild-type activity. Our approach allows the performance of genetic parts to be quantified simultaneously in hundreds of genetic contexts. This parallelized design-build-test-learn cycle, which can access previously unattainable regions of genetic space, should provide a useful, fast tool for genetic optimization and hypothesis testing.

Nature Biotechnology | 2015

Deep learning for regulatory genomics

Yongjin Park; Manolis Kellis

Computational modeling of DNA and RNA targets of regulatory proteins is improved by a deep-learning approach.

Nature Genetics | 2017

Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease

Barbara E. Stranger; Lori E. Brigham; Richard Hasz; Marcus Hunter; Christopher Johns; Mark C. Johnson; Gene Kopen; William F. Leinweber; John T. Lonsdale; Alisa McDonald; Bernadette Mestichelli; Kevin Myer; Brian Roe; Michael Salvatore; Saboor Shad; Jeffrey A. Thomas; Gary Walters; Michael Washington; Joseph Wheeler; Jason Bridge; Barbara A. Foster; Bryan M. Gillard; Ellen Karasik; Rachna Kumar; Mark Miklos; Michael T. Moser; Scott Jewell; Robert G. Montroy; Daniel C. Rohrer; Dana R. Valley

Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.

Molecular Systems Biology | 2017

Genetic circuit characterization and debugging using RNA‐seq

Thomas E. Gorochowski; Amin Espah Borujeni; Yongjin Park; Alec A. K. Nielsen; Jing Zhang; Bryan S. Der; D. Benjamin Gordon; Christopher A. Voigt

Genetic circuits implement computational operations within a cell. Debugging them is difficult because their function is defined by multiple states (e.g., combinations of inputs) that vary in time. Here, we develop RNA‐seq methods that enable the simultaneous measurement of: (i) the states of internal gates, (ii) part performance (promoters, insulators, terminators), and (iii) impact on host gene expression. This is applied to a three‐input one‐output circuit consisting of three sensors, five NOR/NOT gates, and 46 genetic parts. Transcription profiles are obtained for all eight combinations of inputs, from which biophysical models can extract part activities and the response functions of sensors and gates. Various unexpected failure modes are identified, including cryptic antisense promoters, terminator failure, and a sensor malfunction due to media‐induced changes in host gene expression. This can guide the selection of new parts to fix these problems, which we demonstrate by using a bidirectional terminator to disrupt observed antisense transcription. This work introduces RNA‐seq as a powerful method for circuit characterization and debugging that overcomes the limitations of fluorescent reporters and scales to large systems composed of many parts.

bioRxiv | 2017

Multi-tissue polygenic models for transcriptome-wide association studies

Yongjin Park; Abhishek Sarkar; Kunal Bhutani; Manolis Kellis

Transcriptome-wide association studies (TWAS) have proven to be a powerful tool to identify genes associated with human diseases by aggregating cis-regulatory effects on gene expression. However, TWAS relies on building predictive models of gene expression, which are sensitive to the sample size and tissue on which they are trained. The Genotype-Tissue Expression (GTEx) project has produced reference transcriptomes across 53 human tissues and cell types; however, the data is highly sparse, making it difficult to build polygenic models in relevant tissues for TWAS. Here, we propose fQTL, a multi-tissue, multivariate model for mapping expression quantitative trait loci and predicting gene expression. Our model decomposes eQTL effects into SNP-specific and tissue-specific components, pooling information across relevant tissues to effectively boost sample sizes. In simulation, we demonstrate that our multi-tissue approach outperforms single-tissue approaches in identifying causal eQTLs and tissues of action. Using our method, we fit polygenic models for 13,461 genes, characterized the tissue-specificity of the learned cis-eQTLs, and performed TWAS for Alzheimer’s disease and schizophrenia, identifying 107 and 382 associated genes, respectively.
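
The decomposition of eQTL effects into SNP-specific and tissue-specific components can be illustrated with a toy sketch. The abstract does not state the functional form, so the single outer product used here, the function name `factored_effects`, and all numbers are assumptions made for illustration only, not the fQTL model itself.

```python
def factored_effects(snp_effects, tissue_loadings):
    """Build a SNP-by-tissue effect matrix from two vectors.

    Assumption: each effect is the product of one SNP-specific value
    and one tissue-specific value (a single outer product).
    """
    return [[s * t for t in tissue_loadings] for s in snp_effects]


# Hypothetical values: three SNPs and three tissues.
snp_effects = [0.5, 0.0, -0.2]      # SNP-specific component
tissue_loadings = [1.0, 0.0, 2.0]   # tissue-specific component

effects = factored_effects(snp_effects, tissue_loadings)
for row in effects:
    print(row)
```

In this toy setting a tissue with loading zero receives no effect from any SNP, and the SNP-specific vector is shared by every active tissue, which is one way information could be pooled across tissues.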

bioRxiv | 2018

Genes with high network connectivity are enriched for disease heritability

Samuel S Kim; Chengzhen Dai; Farhad Hormozdiari; Bryce van de Geijn; Steven Gazal; Yongjin Park; Luke O'Connor; Tiffany Amariuta; Po-Ru Loh; Hilary Finucane; Soumya Raychaudhuri; Alkes L. Price

Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 independent diseases and complex traits (average N=323K) to identify enriched annotations. First, we constructed annotations from 18,119 biological pathways, including 100kb windows around each gene. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and on annotations from the baseline-LD model, a stringent step that greatly reduced the number of pathways detected; most of the significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity using closeness centrality, a measure of how close a gene is to other genes in the network. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. 
In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, such that accounting for known annotations is critical to robust inference of biological mechanisms.
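
For readers unfamiliar with closeness centrality, the following minimal sketch computes it on a toy undirected network by breadth-first search. The gene names, the adjacency-dictionary input, and the normalization (number of reachable genes divided by the sum of shortest-path distances) are assumptions for illustration; the abstract does not specify the exact definition or how the probabilistic annotations are derived from it.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness of each node: reachable nodes / sum of shortest-path distances.

    `adjacency` maps every node to a list of its neighbors (unweighted graph).
    """
    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        reachable = len(dist) - 1
        scores[source] = reachable / total if total > 0 else 0.0
    return scores


# Hypothetical three-gene chain: geneA - geneB - geneC.
toy_network = {
    "geneA": ["geneB"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneB"],
}
scores = closeness_centrality(toy_network)
print(scores)
```

The gene in the middle of the chain is closest to the others and therefore scores highest.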

bioRxiv | 2017

Modeling prediction error improves power of transcriptome-wide association studies

Kunal Bhutani; Abhishek Sarkar; Yongjin Park; Manolis Kellis; Nicholas J. Schork

Transcriptome-wide association studies (TWAS) test for associations between imputed gene expression levels and phenotypes in GWAS cohorts using models of transcriptional regulation learned from reference transcriptomes. However, current methods for TWAS only use point estimates of imputed expression and ignore uncertainty in the prediction. We develop a novel two-stage Bayesian regression method which incorporates uncertainty in imputed gene expression and achieves higher power to detect TWAS genes than existing TWAS methods as well as standard methods based on missing value and measurement error theory. We apply our method to GTEx whole blood transcriptomes and GWAS cohorts for seven diseases from the Wellcome Trust Case Control Consortium and find 45 TWAS genes, of which 17 do not overlap previously reported case-control GWAS or differential expression associations. Surprisingly, we replicate only 2 of 40 previously reported TWAS genes after accounting for uncertainty in the prediction.

bioRxiv | 2017

Causal gene inference by multivariate mediation analysis in Alzheimer's disease

Yongjin Park; Abhishek Sarkar; Liang He; Jose Davila-Velderrain; Philip L. De Jager; Manolis Kellis

Characterizing the intermediate phenotypes, such as gene expression, that mediate genetic effects on complex diseases is a fundamental problem in human genetics. Existing methods utilize genotypic data and summary statistics to identify putative disease genes, but cannot distinguish pleiotropy from causal mediation and are limited by overly strong assumptions about the data. To overcome these limitations, we develop Causal Multivariate Mediation within Extended Linkage disequilibrium (CaMMEL), a novel Bayesian inference framework to jointly model multiple mediated and unmediated effects relying only on summary statistics. We show in simulation that, unlike existing methods, CaMMEL accurately distinguishes between mediating and pleiotropic genes. We applied CaMMEL to Alzheimer’s disease (AD) and found 206 causal genes in sub-threshold loci (p < 10⁻⁴). We prioritized 21 genes which mediate at least 5% of local genetic variance, disrupting innate immune pathways in AD.
