Su-In Lee
University of Washington
Publication
Featured research published by Su-In Lee.
Nature | 2005
James E. Galagan; Sarah E. Calvo; Christina A. Cuomo; Li-Jun Ma; Jennifer R. Wortman; Serafim Batzoglou; Su-In Lee; Meray Baştürkmen; Christina C. Spevak; John Clutterbuck; Vladimir V. Kapitonov; Jerzy Jurka; Claudio Scazzocchio; Mark L. Farman; Jonathan Butler; Seth Purcell; Steve Harris; Gerhard H. Braus; Oliver W. Draht; Silke Busch; Christophe d'Enfert; Christiane Bouchier; Gustavo H. Goldman; Deborah Bell-Pedersen; Sam Griffiths-Jones; John H. Doonan; Jae-Hyuk Yu; Kay Vienken; Arnab Pain; Michael Freitag
The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.
Nature Biotechnology | 2012
Rupali P Patwardhan; Joseph Hiatt; Daniela M. Witten; Mee J. Kim; Robin P. Smith; Dalit May; Choli Lee; Jennifer M. Andrie; Su-In Lee; Gregory M. Cooper; Nadav Ahituv; Len A. Pennacchio; Jay Shendure
The functional consequences of genetic variation in mammalian regulatory elements are poorly understood. We report the in vivo dissection of three mammalian enhancers at single-nucleotide resolution through a massively parallel reporter assay. For each enhancer, we synthesized a library of >100,000 mutant haplotypes with 2–3% divergence from the wild-type sequence. Each haplotype was linked to a unique sequence tag embedded within a transcriptional cassette. We introduced each enhancer library into mouse liver and measured the relative activities of individual haplotypes en masse by sequencing the transcribed tags. Linear regression analysis yielded highly reproducible estimates of the effect of every possible single-nucleotide change on enhancer activity. The functional consequence of most mutations was modest, with ∼22% affecting activity by >1.2-fold and ∼3% by >2-fold. Several, but not all, positions with higher effects showed evidence for purifying selection, or co-localized with known liver-associated transcription factor binding sites, demonstrating the value of empirical high-resolution functional analysis.
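The per-nucleotide estimates come from a linear model in which each haplotype's tag-derived activity is treated as the sum of the effects of the substitutions it carries. The sketch below shows that setup with ordinary least squares in NumPy. It assumes log-scale activities and one indicator column per possible substitution; the function name and encoding are illustrative, not the authors' pipeline.

```python
import numpy as np

def single_nucleotide_effects(haplotypes, wild_type, log_activity, alphabet="ACGT"):
    """Least-squares estimate of an additive effect for every possible substitution.

    haplotypes: equal-length sequences aligned to wild_type.
    log_activity: one (log-scale) activity measurement per haplotype.
    Returns ({(position, base): effect}, intercept), where the intercept
    approximates wild-type activity.
    """
    # One indicator column per (position, non-wild-type base).
    columns = [(i, b) for i, wt in enumerate(wild_type) for b in alphabet if b != wt]
    index = {c: k for k, c in enumerate(columns)}
    X = np.zeros((len(haplotypes), len(columns) + 1))
    X[:, -1] = 1.0  # intercept column
    for r, hap in enumerate(haplotypes):
        for i, base in enumerate(hap):
            if base != wild_type[i]:
                X[r, index[(i, base)]] = 1.0
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_activity, dtype=float), rcond=None)
    effects = {c: float(coef[k]) for c, k in index.items()}
    return effects, float(coef[-1])
```

With log2 activities, an estimated effect of about 0.26 in magnitude corresponds to the 1.2-fold threshold quoted above, and 1.0 to the 2-fold threshold.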
Genome Biology | 2003
Su-In Lee; Serafim Batzoglou
We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
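In the linear case, one way to set this up is to model the expression matrix (conditions × genes) as a mixture of independent components, each assigning a value to every gene; the genes at the two extremes of a component then form its over- and under-expressed clusters. The NumPy sketch below assumes a symmetric FastICA with a tanh nonlinearity for the separation step and a z-score cutoff (hypothetical `threshold`) for clustering. Both are illustrative choices, and the nonlinear ICA variant is not shown.

```python
import numpy as np

def fastica(X, n_components, max_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity).

    X: (n_conditions, n_genes) expression matrix; each gene is one observation.
    Returns S: (n_components, n_genes), one independent component per row.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whiten with the top principal directions of the condition covariance.
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    top = np.argsort(d)[::-1][:n_components]
    Z = (E[:, top] / np.sqrt(d[top])).T @ Xc
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.normal(size=(n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w.z)] - E[g'(w.z)] w, for all rows at once.
        W_new = G @ Z.T / Z.shape[1] - (1 - G**2).mean(axis=1)[:, None] * W
        # Symmetric decorrelation: (W W^T)^(-1/2) W equals U V^T from the SVD.
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    return W @ Z

def ica_clusters(S, threshold=2.0):
    """For each component, indices of genes with z-score above +threshold
    (over-expressed) and below -threshold (under-expressed)."""
    clusters = []
    for s in S:
        z = (s - s.mean()) / s.std()
        clusters.append((np.flatnonzero(z > threshold), np.flatnonzero(z < -threshold)))
    return clusters
```

ICA recovers components only up to sign and order, so which extreme of a component is labeled "over-expressed" has to be fixed afterwards, for example by checking the sign of the component's mixing weights.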
PLOS Genetics | 2009
Su-In Lee; Aimée M. Dudley; David A. Drubin; Pamela A. Silver; Nevan J. Krogan; Dana Pe'er; Daphne Koller
Genome-wide RNA expression data provide a detailed view of an organism's biological state; hence, a dataset measuring expression variation between genetically diverse individuals (eQTL data) may provide important insights into the genetics of complex traits. However, with data from a relatively small number of individuals, it is difficult to distinguish true causal polymorphisms from the large number of possibilities. The problem is particularly challenging in populations with significant linkage disequilibrium, where traits are often linked to large chromosomal regions containing many genes. Here, we present a novel method, Lirnet, that automatically learns a regulatory potential for each sequence polymorphism, estimating how likely it is to have a significant effect on gene expression. This regulatory potential is defined in terms of “regulatory features”—including the function of the gene and the conservation, type, and position of genetic polymorphisms—that are available for any organism. The extent to which the different features influence the regulatory potential is learned automatically, making Lirnet readily applicable to different datasets, organisms, and feature sets. We apply Lirnet both to the human HapMap eQTL dataset and to a yeast eQTL dataset and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches. We demonstrate in the yeast data that Lirnet can correctly suggest a specific causal sequence variation within a large, linked chromosomal region. In one example, Lirnet uncovered a novel, experimentally validated connection between Puf3—a sequence-specific RNA binding protein—and P-bodies—cytoplasmic structures that regulate translation and RNA stability—as well as the particular causative polymorphism, a SNP in Mkt1, that induces the variation in the pathway.
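The regulatory potential is described as a learned weighting of per-polymorphism features. Purely as an illustration, and assuming a logistic combination of the features (Lirnet's actual functional form and its joint learning procedure are not reproduced here), scoring one polymorphism could look like the sketch below. The feature names are hypothetical stand-ins for the gene-function, conservation, type and position features listed above.

```python
import math
from dataclasses import dataclass

@dataclass
class Polymorphism:
    # Hypothetical feature encoding; Lirnet's actual features may differ.
    conservation: float        # e.g. a 0-1 conservation score at the site
    nonsynonymous: bool        # polymorphism type
    in_promoter: bool          # position relative to the gene
    gene_is_regulator: bool    # function of the gene (e.g. annotated TF)

    def features(self):
        return [self.conservation, float(self.nonsynonymous),
                float(self.in_promoter), float(self.gene_is_regulator)]

def regulatory_potential(p, weights, bias=0.0):
    """Logistic score in (0, 1); higher means more likely to affect expression.
    In Lirnet the weights are learned from the data, not set by hand."""
    score = bias + sum(w * x for w, x in zip(weights, p.features()))
    return 1.0 / (1.0 + math.exp(-score))
```

Because one weight vector is shared by all polymorphisms, the same scoring applies unchanged to a new dataset or organism once its features are computed, which is the portability the abstract emphasizes.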
Proceedings of the National Academy of Sciences of the United States of America | 2006
Su-In Lee; Dana Pe'er; Aimée M. Dudley; George M. Church; Daphne Koller
Sequence polymorphisms affect gene expression by perturbing the complex network of regulatory interactions. We propose a probabilistic method, called Geronemo, which directly aims to identify the mechanism by which genetic changes perturb the regulatory network. Geronemo automatically constructs a set of coregulated genes (modules), whose regulation can involve both sequence variations and expression of regulators. By exploiting the modularity of genetic regulatory systems, Geronemo reveals regulatory relationships that are indiscernible when genes are considered in isolation, allowing the recovery of intricate combinatorial regulation. By incorporating both expression and genotype of regulators, Geronemo captures cases where the effect of sequence variation on its targets is indirect. We applied Geronemo to a data set from the progeny generated by a cross between laboratory BY4716 (BY) and wild RM11-1a (RM) isolates of Saccharomyces cerevisiae. Geronemo produced previously undescribed hypotheses regarding genetic perturbations in the yeast regulatory network, including transcriptional regulation, signal transduction, and chromatin modification. In particular, we find a large number of modules that have both chromosomal characteristics and are regulated by chromatin modification proteins. Indeed, a large fraction of the variance in the expression can be explained by a small number of markers associated with chromatin modifiers. Additional analysis reveals positive selection for sequence evolution of elements in the Swi/Snf chromatin remodeling complex. Overall, our results suggest that a significant part of individual expression variation in yeast arises from evolution of a small number of chromatin structure modifiers.
Proteins | 2011
Sivaraman Balakrishnan; Hetunandan Kamisetty; Jaime G. Carbonell; Su-In Lee; Christopher James Langmead
We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSA either make strong, and often inappropriate, assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method outperforms an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly outperform Hidden Markov Models in terms of predictive accuracy.
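The model class described here is a pairwise undirected graphical model (a Potts-style Markov random field) over alignment columns: single-site terms capture conservation and pair terms capture correlated mutations. The sketch below, with hypothetical parameter arrays `h` (L × q) and `J` (L × L × q × q, upper half used), evaluates a sequence's unnormalized log-probability and the conditional distribution of one residue given the rest, which is the quantity a Gibbs sampler would use to generate new sequences. It does not show how GREMLIN fits the parameters.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap

def encode(seq):
    return [ALPHABET.index(c) for c in seq]

def pair(J, i, j, a, b):
    """Coupling between residue a at column i and residue b at column j.
    Only the i < j half of J is stored."""
    return J[i, j, a, b] if i < j else J[j, i, b, a]

def log_potential(seq, h, J):
    """sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j] (unnormalized log-probability)."""
    s = encode(seq)
    L = len(s)
    total = sum(h[i, s[i]] for i in range(L))
    total += sum(J[i, j, s[i], s[j]] for i in range(L) for j in range(i + 1, L))
    return float(total)

def conditional(seq, i, h, J):
    """P(s_i = a | all other positions) for every residue a in ALPHABET."""
    s = encode(seq)
    logits = np.array([h[i, a] + sum(pair(J, i, j, a, s[j]) for j in range(len(s)) if j != i)
                       for a in range(len(ALPHABET))])
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()
```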
Cell Reports | 2015
Robert T. Lawrence; Elizabeth M. Perez; Daniel Hernández; Chris P. Miller; Kelsey M. Haas; Hanna Y. Irie; Su-In Lee; Anthony Blau; Judit Villén
Triple-negative breast cancer is a heterogeneous disease characterized by poor clinical outcomes and a shortage of targeted treatment options. To discover molecular features of triple-negative breast cancer, we performed quantitative proteomics analysis of twenty human-derived breast cell lines and four primary breast tumors to a depth of more than 12,000 distinct proteins. We used this data to identify breast cancer subtypes at the protein level and demonstrate the precise quantification of biomarkers, signaling proteins, and biological pathways by mass spectrometry. We integrated proteomics data with exome sequence resources to identify genomic aberrations that affect protein expression. We performed a high-throughput drug screen to identify protein markers of drug sensitivity and understand the mechanisms of drug resistance. The genome and proteome provide complementary information that, when combined, yield a powerful engine for therapeutic discovery. This resource is available to the cancer research community to catalyze further analysis and investigation.
The Journal of Neuroscience | 2011
Iain M. Dykes; Lynne Tempest; Su-In Lee; Eric E. Turner
The combinatorial expression of transcription factors frequently marks cellular identity in the nervous system, yet how these factors interact to determine specific neuronal phenotypes is not well understood. Sensory neurons of the trigeminal ganglion (TG) and dorsal root ganglia (DRG) coexpress the homeodomain transcription factors Brn3a and Islet1, and past work has revealed partially overlapping programs of gene expression downstream of these factors. Here we examine sensory development in Brn3a/Islet1 double knock-out (DKO) mice. Sensory neurogenesis and the formation of the TG and DRG occur in DKO embryos, but the DRG are dorsally displaced, and the peripheral projections of the ganglia are markedly disturbed. Sensory neurons in DKO embryos show a profound loss of all early markers of sensory subtypes, including the Ntrk neurotrophin receptors, and the runt-family transcription factors Runx1 and Runx3. Examination of global gene expression in the E12.5 DRG of single and double mutant embryos shows that Brn3a and Islet1 are together required for nearly all aspects of sensory-specific gene expression, including several newly identified sensory markers. On a majority of targets, Brn3a and Islet1 exhibit negative epistasis, in which the effects of the individual knock-out alleles are less than additive in the DKO. Smaller subsets of targets exhibit positive epistasis, or are regulated exclusively by one factor. Brn3a/Islet1 double mutants also fail to developmentally repress neurogenic bHLH genes, and in vivo chromatin immunoprecipitation shows that Islet1 binds to a known Brn3a-regulated enhancer in the neurod4 gene, suggesting a mechanism of interaction between these genes.
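Positive and negative epistasis are defined here relative to an additive expectation. A minimal sketch, assuming per-gene log fold changes versus wild type for each single knockout and for the DKO, plus a hypothetical tolerance `tol` for calling a deviation, makes the comparison explicit.

```python
def classify_epistasis(lfc_brn3a, lfc_islet1, lfc_dko, tol=0.1):
    """Compare the double-knockout effect with the sum of single-knockout effects.

    All inputs are log fold changes versus wild type, so "additive" on the
    log scale means lfc_dko == lfc_brn3a + lfc_islet1.
    """
    expected = lfc_brn3a + lfc_islet1
    if abs(lfc_dko) < abs(expected) - tol:
        return "negative"   # DKO effect smaller than the additive expectation
    if abs(lfc_dko) > abs(expected) + tol:
        return "positive"   # DKO effect larger than the additive expectation
    return "additive"
```

Magnitudes are compared so that one rule covers genes that go down and genes that go up in the mutants; a real analysis would replace the fixed tolerance with a statistical test.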
Blood | 2009
Andrew J. Gentles; Ash A. Alizadeh; Su-In Lee; June H. Myklebust; Catherine M. Shachaf; Babak Shahbaba; Ronald Levy; Daphne Koller; Sylvia K. Plevritis
Histologic transformation (HT) of follicular lymphoma to diffuse large B-cell lymphoma (DLBCL-t) is associated with accelerated disease course and drastically worse outcome, yet the underlying mechanisms are poorly understood. We show that a network of gene transcriptional modules underlies HT. Central to the network hierarchy is a signature strikingly enriched for pluripotency-related genes. These genes are typically expressed in embryonic stem cells (ESCs), including MYC and its direct targets. This core ESC-like program was independent of proliferation/cell-cycle and overlapped with, but was distinct from, normal B-cell transcriptional programs. Furthermore, we show that the ESC program is correlated with transcriptional programs maintaining tumor phenotype in transgenic MYC-driven mouse models of lymphoma. Although our approach was to identify HT mechanisms rather than to derive an optimal survival predictor, a model based on ESC/differentiation programs stratified patient outcomes in 2 independent patient cohorts and was predictive of propensity of follicular lymphoma tumors to transform. Transformation was associated with an expression signature combining high expression of ESC transcriptional programs with reduced expression of stromal programs. Together, these findings suggest a central role for an ESC-like signature in the mechanism of HT and provide new clues for potential therapeutic targets.
Genome Biology | 2016
Scott M. Lundberg; William B. Tu; Brian Raught; Linda Z. Penn; Michael M. Hoffman; Su-In Lee
A cell’s epigenome arises from interactions among regulatory factors—transcription factors and histone modifications—co-localized at particular genomic regions. We developed a novel statistical method, ChromNet, to infer a network of these interactions, the chromatin network, by inferring conditional-dependence relationships among a large number of ChIP-seq data sets. We applied ChromNet to all available 1451 ChIP-seq data sets from the ENCODE Project, and showed that ChromNet revealed previously known physical interactions better than alternative approaches. We experimentally validated one of the previously unreported interactions, MYC–HCFC1. An interactive visualization tool is available at http://chromnet.cs.washington.edu.
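Conditional dependence is what separates a direct relationship from an indirect one: two factors that both co-localize with a third can be strongly correlated without being conditionally dependent. One standard estimator under a Gaussian assumption is the partial correlation read off the inverse of the correlation matrix. The sketch below assumes a bins × data-sets signal matrix and a small hypothetical `ridge` term for invertibility; ChromNet's actual estimator may differ in its details.

```python
import numpy as np

def partial_correlations(signals, ridge=1e-3):
    """Partial correlation between every pair of data sets, given all the others.

    signals: (n_bins, n_datasets) array, e.g. ChIP-seq signal per genomic bin.
    Uses R_ij = -P_ij / sqrt(P_ii * P_jj), where P is the (ridge-stabilized)
    inverse of the correlation matrix.
    """
    C = np.corrcoef(signals, rowvar=False)
    P = np.linalg.inv(C + ridge * np.eye(C.shape[0]))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R
```

For a chain A → B → C, A and C are correlated, but their partial correlation given B is close to zero, so only the A-B and B-C edges would survive in the inferred network.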