Publication


Featured research published by Yoseph Barash.


Nature | 2010

Deciphering the splicing code

Yoseph Barash; John A. Calarco; Weijun Gao; Qun Pan; Xinchen Wang; Ofer Shai; Benjamin J. Blencowe; Brendan J. Frey

Alternative splicing has a crucial role in the generation of biological complexity, and its misregulation is often involved in human disease. Here we describe the assembly of a ‘splicing code’, which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing for thousands of exons. The code determines new classes of splicing patterns, identifies distinct regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. Widespread regulatory strategies are revealed, including the use of unexpectedly large combinations of features, the establishment of low exon inclusion levels that are overcome by features in specific tissues, the appearance of features deeper into introns than previously appreciated, and the modulation of splice variant levels by transcript structure characteristics. The code detected a class of exons whose inclusion silences expression in adult tissues by activating nonsense-mediated messenger RNA decay, but whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.


Science | 2015

The human splicing code reveals new insights into the genetic determinants of disease

Hui Y. Xiong; Babak Alipanahi; Leo J. Lee; Hannes Bretschneider; Daniele Merico; Ryan K. C. Yuen; Yimin Hua; Serge Gueroussov; Hamed Shateri Najafabadi; Timothy R. Hughes; Quaid Morris; Yoseph Barash; Adrian R. Krainer; Nebojsa Jojic; Stephen W. Scherer; Benjamin J. Blencowe; Brendan J. Frey

Predicting defects in RNA splicing

Most eukaryotic messenger RNAs (mRNAs) are spliced to remove introns. Splicing generates uninterrupted open reading frames that can be translated into proteins. Splicing is often highly regulated, generating alternative spliced forms that code for variant proteins in different tissues. RNA-binding proteins that bind specific sequences in the mRNA regulate splicing. Xiong et al. develop a computational model that predicts splicing regulation for any mRNA sequence (see the Perspective by Guigó and Valcárcel). They use this to analyze more than half a million mRNA splicing sequence variants in the human genome. They are able to identify thousands of known disease-causing mutations, as well as many new disease candidates, including 17 new autism-linked genes. Science, this issue 10.1126/science.1254806; see also p. 124

A model predicts how thousands of disease-linked nucleotide variants affect messenger RNA splicing. [Also see Perspective by Guigó and Valcárcel]

INTRODUCTION: Advancing whole-genome precision medicine requires understanding how gene expression is altered by genetic variants, especially those that are far outside of protein-coding regions. We developed a computational technique that scores how strongly genetic variants affect RNA splicing, a critical step in gene expression whose disruption contributes to many diseases, including cancers and neurological disorders. A genome-wide analysis reveals tens of thousands of variants that alter splicing and are enriched with a wide range of known diseases. Our results provide insight into the genetic basis of spinal muscular atrophy, hereditary nonpolyposis colorectal cancer, and autism spectrum disorder.

RATIONALE: We used “deep learning” computer algorithms to derive a computational model that takes as input DNA sequences and applies general rules to predict splicing in human tissues. Given a test variant, which may be up to 300 nucleotides into an intron, our model can be used to compute a score for how much the variant alters splicing. The model is not biased by existing disease annotations or population data and was derived in such a way that it can be used to study diverse diseases and disorders and to determine the consequences of common, rare, and even spontaneous variants.

RESULTS: Our technique is able to accurately classify disease-causing variants and provides insights into the role of aberrant splicing in disease. We scored more than 650,000 DNA variants and found that disease-causing variants have higher scores than common variants and even those associated with disease in genome-wide association studies (GWAS). Our model predicts substantial and unexpected aberrant splicing due to variants within introns and exons, including those far from the splice site. For example, among intronic variants that are more than 30 nucleotides away from any splice site, known disease variants alter splicing nine times as often as common variants; among missense exonic disease variants, those that least affect protein function are more than five times as likely as other variants to alter splicing. Autism has been associated with disrupted splicing in brain regions, so we used our method to score variants detected using whole-genome sequencing data from individuals with and without autism. Genes with high-scoring variants include many that have previously been linked with autism, as well as new genes with known neurodevelopmental phenotypes. Most of the high-scoring variants are intronic and cannot be detected by exome analysis techniques. When we scored clinical variants in spinal muscular atrophy and colorectal cancer genes, up to 94% of variants found to alter splicing using minigene reporters were correctly classified.

CONCLUSION: In the context of precision medicine, causal support for variants independent of existing whole-genome variant studies is greatly needed. Our computational model was trained to predict splicing from DNA sequence alone, without using disease annotations or population data. Consequently, its predictions are independent of and complementary to population data, GWAS, expression-based quantitative trait loci (QTL), and functional annotations of the genome. As such, our technique greatly expands the opportunities for understanding the genetic determinants of disease.

“Deep learning” reveals the genetic origins of disease. A computational system mimics the biology of RNA splicing by correlating DNA elements with splicing levels in healthy human tissues. The system can scan DNA and identify damaging genetic variants, including those deep within introns. This procedure has led to insights into the genetics of autism, cancers, and spinal muscular atrophy.

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.


Cancer Discovery | 2015

Convergence of Acquired Mutations and Alternative Splicing of CD19 Enables Resistance to CART-19 Immunotherapy

Elena Sotillo; David M. Barrett; Kathryn L. Black; Asen Bagashev; Derek A. Oldridge; Glendon Wu; Robyn T. Sussman; Claudia Lanauze; Marco Ruella; Matthew R. Gazzara; Nicole M. Martinez; Colleen T. Harrington; Elaine Y. Chung; Jessica Perazzelli; Ted J. Hofmann; Shannon L. Maude; Pichai Raman; Alejandro Barrera; Saar Gill; Simon F. Lacey; J. Joseph Melenhorst; David Allman; Elad Jacoby; Terry J. Fry; Crystal L. Mackall; Yoseph Barash; Kristen W. Lynch; John M. Maris; Stephan A. Grupp; Andrei Thomas-Tikhonenko

The CD19 antigen, expressed on most B-cell acute lymphoblastic leukemias (B-ALL), can be targeted with chimeric antigen receptor-armed T cells (CART-19), but relapses with epitope loss occur in 10% to 20% of pediatric responders. We detected hemizygous deletions spanning the CD19 locus and de novo frameshift and missense mutations in exon 2 of CD19 in some relapse samples. However, we also discovered alternatively spliced CD19 mRNA species, including one lacking exon 2. Pull-down/siRNA experiments identified SRSF3 as a splicing factor involved in exon 2 retention, and its levels were lower in relapsed B-ALL. Using genome editing, we demonstrated that exon 2 skipping bypasses exon 2 mutations in B-ALL cells and allows expression of the N-terminally truncated CD19 variant, which fails to trigger killing by CART-19 but partly rescues defects associated with CD19 loss. Thus, this mechanism of resistance is based on a combination of deleterious mutations and ensuing selection for alternatively spliced RNA isoforms.

SIGNIFICANCE: CART-19 yield 70% response rates in patients with B-ALL, but also produce escape variants. We discovered that the underlying mechanism is the selection for preexisting alternatively spliced CD19 isoforms with the compromised CART-19 epitope. This mechanism suggests a possibility of targeting alternative CD19 ectodomains, which could improve survival of patients with B-cell neoplasms.


Research in Computational Molecular Biology | 2003

Modeling dependencies in protein-DNA binding sites

Yoseph Barash; Nir Friedman; Tommy Kaplan

The availability of whole genome sequences and high-throughput genomic assays opens the door for in silico analysis of transcription regulation. This includes methods for discovering and characterizing the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position specific score matrix (PSSM). This representation makes the strong assumption that binding site positions are independent of each other. In this work, we explore Bayesian network representations of binding sites that provide different tradeoffs between complexity (number of parameters) and the richness of dependencies between positions. We develop the formal machinery for learning such models from data and for estimating the statistical significance of putative binding sites. We then evaluate the ramifications of these richer representations in characterizing binding site motifs and predicting their genomic locations. We show that these richer representations improve over the PSSM model in both tasks.
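The independence assumption mentioned in this abstract can be made concrete with a minimal sketch of log-odds PSSM scoring. Everything below is illustrative and assumed rather than taken from the paper: the function names `build_pssm` and `score_site`, the toy aligned sites, the pseudocount, and the uniform background distribution. The Bayesian network models explored in the paper are not implemented here.

```python
import math

def build_pssm(sites, alphabet="ACGT", pseudocount=1.0, background=None):
    """Build a log-odds PSSM (one dict per position) from aligned, equal-length sites."""
    if background is None:
        # Assumption: uniform background nucleotide frequencies.
        background = {base: 1.0 / len(alphabet) for base in alphabet}
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        # Start from pseudocounts so unseen bases do not produce log(0).
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({
            base: math.log2((counts[base] / total) / background[base])
            for base in alphabet
        })
    return pssm

def score_site(pssm, seq):
    """Score a candidate site as a sum of per-position terms.

    The plain sum is exactly the independence assumption: no term
    depends on the base observed at any other position.
    """
    return sum(column[base] for column, base in zip(pssm, seq))

# Hypothetical aligned binding sites, used only for illustration.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(sites)

print(score_site(pssm, "TATAAT"))  # a site resembling the training sites
print(score_site(pssm, "GGGCCC"))  # a site unlike the training sites
```

Because the score is additive over positions, a PSSM cannot express that a base at one position is preferred only when a particular base appears at another; models with dependencies between positions trade extra parameters for that ability.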


Research in Computational Molecular Biology | 2001

Context-specific Bayesian clustering for gene expression data

Yoseph Barash; Nir Friedman

The recent growth in genomic data and measurement of genome-wide expression patterns allows us to examine gene regulation by transcription factors using computational tools. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. These models represent the joint distribution of transcription factor binding sites and of expression levels of a gene in a single model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real data and analyze the biological insights it provides.


Genome Biology | 2007

Functional coordination of alternative splicing in the mammalian central nervous system

Matthew M. Fagnani; Yoseph Barash; Joanna Y. Ip; Christine M. Misquitta; Qun Pan; Arneet L. Saltzman; Ofer Shai; Leo J. Lee; Aviad Rozenhek; Naveed Mohammad; Sandrine Willaime-Morawek; Tomas Babak; Wen Zhang; Timothy R. Hughes; Derek van der Kooy; Brendan J. Frey; Benjamin J. Blencowe

Background: Alternative splicing (AS) functions to expand proteomic complexity and plays numerous important roles in gene regulation. However, the extent to which AS coordinates functions in a cell and tissue type specific manner is not known. Moreover, the sequence code that underlies cell and tissue type specific regulation of AS is poorly understood.

Results: Using quantitative AS microarray profiling, we have identified a large number of widely expressed mouse genes that contain single or coordinated pairs of alternative exons that are spliced in a tissue regulated fashion. The majority of these AS events display differential regulation in central nervous system (CNS) tissues. Approximately half of the corresponding genes have neural specific functions and operate in common processes and interconnected pathways. Differential regulation of AS in the CNS tissues correlates strongly with a set of mostly new motifs that are predominantly located in the intron and constitutive exon sequences neighboring CNS-regulated alternative exons. Different subsets of these motifs are correlated with either increased inclusion or increased exclusion of alternative exons in CNS tissues, relative to the other profiled tissues.

Conclusion: Our findings provide new evidence that specific cellular processes in the mammalian CNS are coordinated at the level of AS, and that a complex splicing code underlies CNS specific AS regulation. This code appears to comprise many new motifs, some of which are located in the constitutive exons neighboring regulated alternative exons. These data provide a basis for understanding the molecular mechanisms by which the tissue specific functions of widely expressed genes are coordinated at the level of AS.


Workshop on Algorithms in Bioinformatics | 2001

A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites

Yoseph Barash; Gill Bejerano; Nir Friedman

A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods elucidating the key components that play a role in these mechanisms. One important consequence is the ability to recognize groups of genes that are co-expressed using microarray expression data. We then wish to identify in silico putative transcription factor binding sites in the promoter regions of these genes that might explain the coregulation and hint at possible regulators. In this paper we describe a simple and fast, yet powerful, two-stage approach to this task. Using a rigorous hypergeometric statistical analysis and a straightforward computational procedure, we find small conserved sequence kernels. These are then stochastically expanded into PSSMs using an EM-like procedure. We demonstrate the utility and speed of our methods by applying them to several data sets from recent literature. We also compare these results with those of MEME when run on the same sets.
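The abstract does not spell out the hypergeometric analysis, so the following is only a generic sketch of how a hypergeometric tail probability can flag a short sequence kernel as over-represented in a co-expressed gene cluster. The function name `hypergeom_tail`, its parameter names, and the example numbers are hypothetical; the paper's actual procedure, including any multiple-testing correction and the EM-like expansion into PSSMs, is not reproduced here.

```python
from math import comb

def hypergeom_tail(population_size, motif_genes, cluster_size, cluster_hits):
    """Return P(X >= cluster_hits) under a hypergeometric null.

    population_size: total number of genes (promoters) considered
    motif_genes:     genes whose promoter contains the candidate kernel
    cluster_size:    genes in the co-expressed cluster
    cluster_hits:    cluster genes whose promoter contains the kernel
    """
    upper = min(motif_genes, cluster_size)
    total = comb(population_size, cluster_size)
    # Sum the probability of every outcome at least as extreme as observed.
    favourable = sum(
        comb(motif_genes, i) * comb(population_size - motif_genes, cluster_size - i)
        for i in range(cluster_hits, upper + 1)
    )
    return favourable / total

# Hypothetical numbers: 1000 genes overall, 50 carry the kernel,
# and 8 of the 20 genes in a cluster carry it.
p_value = hypergeom_tail(1000, 50, 20, 8)
print(p_value)
```

A small tail probability suggests the kernel occurs in the cluster more often than random sampling of genes would explain. When many candidate kernels are tested, some correction for multiple hypotheses would normally be applied before drawing conclusions.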


Journal of Computational Biology | 2002

Context-specific Bayesian clustering for gene expression data

Yoseph Barash; Nir Friedman

The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors. In this work, we present a class of mathematical models that help in understanding the connections between transcription factors and functional classes of genes based on genetic and genomic data. Such a model represents the joint distribution of transcription factor binding sites and of expression levels of a gene in a unified probabilistic model. Learning a combined probability model of binding sites and expression patterns enables us to improve the clustering of the genes based on the discovery of putative binding sites and to detect which binding sites and experiments best characterize a cluster. To learn such models from data, we introduce a new search method that rapidly learns a model according to a Bayesian score. We evaluate our method on synthetic data as well as on real life data and analyze the biological insights it provides. Finally, we demonstrate the applicability of the method to other data analysis problems in gene expression data.


eLife | 2016

A new view of transcriptome complexity and regulation through the lens of local splicing variations

Jorge Vaquero-Garcia; Alejandro Barrera; Matthew R. Gazzara; Juan González-Vallinas; Nicholas F. Lahens; John B. Hogenesch; Kristen W. Lynch; Yoseph Barash

Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. Building the first genome-wide map of LSVs from twelve mouse tissues, we find complex LSVs constitute over 30% of tissue-dependent transcript variations and affect specific protein families. We show the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer's patients. Amongst those are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splice factor in neurogenesis. We anticipate the approach presented here will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease. DOI: http://dx.doi.org/10.7554/eLife.11752.001


Genome Biology | 2013

AVISPA: a web tool for the prediction and analysis of alternative splicing

Yoseph Barash; Jorge Vaquero-Garcia; Juan González-Vallinas; Hui Yuan Xiong; Weijun Gao; Leo J. Lee; Brendan J. Frey

Transcriptome complexity and its relation to numerous diseases underpin the need to predict in silico splice variants and the regulatory elements that affect them. Building upon our recently described splicing code, we developed AVISPA, a Galaxy-based web tool for splicing prediction and analysis. Given an exon and its proximal sequence, the tool predicts whether the exon is alternatively spliced, displays tissue-dependent splicing patterns, and reports whether it has associated regulatory elements. We assess AVISPA's accuracy on an independent dataset of tissue-dependent exons, and illustrate how the tool can be applied to analyze a gene of interest. AVISPA is available at http://avispa.biociphers.org.

Collaboration


Dive into Yoseph Barash's collaborations.

Top Co-Authors

Nir Friedman
Hebrew University of Jerusalem

Kristen W. Lynch
University of Pennsylvania

Alejandro Barrera
University of Pennsylvania

Andrei Thomas-Tikhonenko
Children's Hospital of Philadelphia

Kathryn L. Black
Children's Hospital of Philadelphia