David Burstein
Tel Aviv University
Publication
Featured research published by David Burstein.
PLOS Pathogens | 2009
David Burstein; Tal Zusman; Elena Degtyar; Ram Viner; Gil Segal; Tal Pupko
A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine learning algorithms for the identification and characterization of bacterial pathogenesis determinants.
Journal of Computational Biology | 2006
Igor Ulitsky; David Burstein; Tamir Tuller; Benny Chor
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays; our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with acceptable phylogenetic and taxonomic truth. To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using traditional trees and a standard tree comparison method, our algorithm improved upon the competition by a substantial margin. The simplicity and speed of our method allow for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
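To make the distance measure in the abstract above concrete, here is a minimal Python sketch of an average-common-substring distance. It is a naive quadratic-or-worse illustration, not the authors' suffix-array implementation. The function names and the exact normalization (a log-length term divided by the average match length, a self-comparison correction, and symmetrization by averaging) are assumptions chosen for illustration and are not stated in the abstract.

```python
import math


def avg_common_substring(a: str, b: str) -> float:
    """Average, over start positions i in `a`, of the length of the
    longest prefix of a[i:] that occurs somewhere in `b`."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    total = 0
    for i in range(len(a)):
        k = 0
        # Extend the match while a[i:i+k+1] still occurs in b.
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        total += k
    return total / len(a)


def directed_distance(a: str, b: str) -> float:
    """Assumed normalization: log|b| / L(a, b) minus a self-comparison term."""
    l_ab = avg_common_substring(a, b)
    if l_ab == 0:
        return math.inf  # no shared characters at all
    l_aa = avg_common_substring(a, a)
    return math.log(len(b)) / l_ab - math.log(len(a)) / l_aa


def acs_distance(a: str, b: str) -> float:
    """Symmetrized distance: average of the two directed distances."""
    return (directed_distance(a, b) + directed_distance(b, a)) / 2
```

Because long shared substrings raise the average match length, similar sequences get a small distance, and a sequence compared with itself gets exactly zero under this normalization.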
Cell Reports | 2012
Maayan Amit; Maya Donyo; Dror Hollander; Amir Goren; Eddo Kim; Sahar Gelfman; Galit Lev-Maor; David Burstein; Schraga Schwartz; Benny Postolsky; Tal Pupko; Gil Ast
During evolution, segments of homeothermic genomes underwent a GC content increase. Our analyses reveal that two exon-intron architectures have evolved from an ancestral state of low GC content exons flanked by short introns with a lower GC content. One group underwent a GC content elevation that abolished the differential exon-intron GC content, with introns remaining short. The other group retained the overall low GC content as well as the differential exon-intron GC content, and is associated with longer introns. We show that differential exon-intron GC content regulates exon inclusion level in this group, in which disease-associated mutations often lead to exon skipping. This group's exons also display higher nucleosome occupancy compared to flanking introns and exons of the other group, thus marking them for spliceosomal recognition. Collectively, our results reveal that differential exon-intron GC content is a previously unidentified determinant of exon selection and argue that the two GC content architectures reflect the two mechanisms by which splicing signals are recognized: exon definition and intron definition.
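The quantity at the center of the abstract above, differential exon-intron GC content, can be illustrated with a short Python sketch. The abstract does not say how the flanking introns were windowed or combined, so pooling the two flanks into one sequence is an assumption, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def differential_gc(upstream_intron: str, exon: str, downstream_intron: str) -> float:
    """Exon GC content minus the GC content of its pooled flanking introns.

    Pooling both flanks into one sequence is an illustrative assumption.
    """
    flanks = upstream_intron + downstream_intron
    return gc_content(exon) - gc_content(flanks)
```

A positive value means the exon is more GC-rich than its flanks; a value near zero corresponds to the architecture in which the differential has been abolished.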
Proceedings of the National Academy of Sciences of the United States of America | 2013
Ziv Lifshitz; David Burstein; Michael Peeri; Tal Zusman; Kierstyn T. Schwartz; Howard A. Shuman; Tal Pupko; Gil Segal
Legionella and Coxiella are intracellular pathogens that use the virulence-related Icm/Dot type-IVB secretion system to translocate effector proteins into host cells during infection. These effectors were previously shown to contain a C-terminal secretion signal required for their translocation. In this research, we implemented a hidden semi-Markov model to characterize the amino acid composition of the signal, thus providing a comprehensive computational model for the secretion signal. This model accounts for dependencies among sites and captures spatial variation in amino acid composition along the secretion signal. To validate our model, we predicted and synthetically constructed an optimal secretion signal whose sequence is different from that of any known effector. We show that this signal efficiently translocates into host cells in an Icm/Dot-dependent manner. Additionally, we predicted in silico and experimentally examined the effects of mutations in the secretion signal, which provided innovative insights into its characteristics. Some effectors were found to lack a strong secretion signal according to our model. We demonstrated that these effectors were highly dependent on the IcmS-IcmW chaperons for their translocation, unlike effectors that harbor a strong secretion signal. Furthermore, our model is innovative because it enables searching ORFs for secretion signals on a genomic scale, which led to the identification and experimental validation of 20 effectors from Legionella pneumophila, Legionella longbeachae, and Coxiella burnetii. Our combined computational and experimental methodology is general and can be applied to the identification of a wide spectrum of protein features that lack sequence conservation but have similar amino acid characteristics.
Nature Genetics | 2016
David Burstein; Francisco Amaro; Tal Zusman; Ziv Lifshitz; Ofir Cohen; Jack A. Gilbert; Tal Pupko; Howard A. Shuman; Gil Segal
Infection by the human pathogen Legionella pneumophila relies on the translocation of ∼300 virulence proteins, termed effectors, which manipulate host cell processes. However, almost no information exists regarding effectors in other Legionella pathogens. Here we sequenced, assembled and characterized the genomes of 38 Legionella species and predicted their effector repertoires using a previously validated machine learning approach. This analysis identified 5,885 predicted effectors. The effector repertoires of different Legionella species were found to be largely non-overlapping, and only seven core effectors were shared by all species studied. Species-specific effectors had atypically low GC content, suggesting exogenous acquisition, possibly from the natural protozoan hosts of these species. Furthermore, we detected numerous new conserved effector domains and discovered new domain combinations, which allowed the inference of as yet undescribed effector functions. The effector collection and network of domain architectures described here can serve as a roadmap for future studies of effector function and evolution.
Genome Research | 2012
Sahar Gelfman; David Burstein; Osnat Penn; Anna Savchenko; Maayan Amit; Schraga Schwartz; Tal Pupko; Gil Ast
Exon-intron architecture is one of the major features directing the splicing machinery to the short exons that are located within long flanking introns. However, the evolutionary dynamics of exon-intron architecture and its impact on splicing are largely unknown. Using a comparative genomic approach, we analyzed 17 vertebrate genomes and reconstructed the ancestral motifs of both 3′ and 5′ splice sites, as well as the ancestral length of exons and introns. Our analyses suggest that vertebrate introns increased in length from the shortest ancestral introns to the longest primate introns. An evolutionary analysis of splice sites revealed that weak splice sites act as a restrictive force keeping introns short. In contrast, strong splice sites allow recognition of exons flanked by long introns. Reconstruction of the ancestral state suggests these phenomena were not prevalent in the vertebrate ancestor, but appeared during vertebrate evolution. By calculating evolutionary rate shifts in exons, we identified cis-acting regulatory sequences that became fixed during the transition from early vertebrates to mammals. Experimental validations performed on a selection of these hexamers confirmed their regulatory function. We additionally revealed many features of exons that can discriminate alternative from constitutive exons. These features were integrated into a machine-learning approach to predict whether an exon is alternative. Our algorithm obtains very high predictive power (AUC of 0.91), and using these predictions we have identified and successfully validated novel alternatively spliced exons. Overall, we provide novel insights regarding the evolutionary constraints acting upon exons and their recognition by the splicing machinery.
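For readers unfamiliar with the AUC figure quoted in the abstract above, the following Python sketch computes the area under the ROC curve as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The function name and the toy inputs are hypothetical; this is not the authors' classifier or their evaluation code.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    `scores` are classifier outputs; `labels` are truthy for positives
    (for example, alternative exons) and falsy for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.91 means a positive example outranks a negative one in about 91% of pairs.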
Proceedings of the National Academy of Sciences of the United States of America | 2013
Ido Yosef; Dror Shitrit; Moran G. Goren; David Burstein; Tal Pupko; Udi Qimron
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins constitute a recently identified prokaryotic defense system against invading nucleic acids. DNA segments, termed protospacers, are integrated into the CRISPR array in a process called adaptation. Here, we establish a PCR-based assay that enables evaluating the adaptation efficiency of specific spacers into the type I-E Escherichia coli CRISPR array. Using this assay, we provide direct evidence that the protospacer adjacent motif along with the first base of the protospacer (5′-AAG) partially affect the efficiency of spacer acquisition. Remarkably, we identified a unique dinucleotide, 5′-AA, positioned at the 3′ end of the spacer, that enhances the efficiency of the spacer's acquisition. Insertion of this dinucleotide increased acquisition efficiency of two different spacers. DNA sequencing of newly adapted CRISPR arrays revealed that the position of the newly identified motif with respect to the 5′-AAG is important for affecting acquisition efficiency. Analysis of approximately 1 million spacers showed that this motif is overrepresented in frequently acquired spacers compared with those acquired rarely. Our results represent an example of a short nonprotospacer adjacent motif sequence that affects acquisition efficiency and suggest that other as yet unknown motifs affect acquisition efficiency in other CRISPR systems as well.
Nucleic Acids Research | 2011
Adi Barzel; Eyal Privman; Michael Peeri; Adit Naor; Einat Shachar; David Burstein; Rona Lazary; Uri Gophna; Tal Pupko; Martin Kupiec
In recent years, both homing endonucleases (HEases) and zinc-finger nucleases (ZFNs) have been engineered and selected for the targeting of desired human loci for gene therapy. However, enzyme engineering is lengthy and expensive and the off-target effect of the manufactured endonucleases is difficult to predict. Moreover, enzymes selected to cleave a human DNA locus may not cleave the homologous locus in the genome of animal models because of sequence divergence, thus hampering attempts to assess the in vivo efficacy and safety of any engineered enzyme prior to its application in human trials. Here, we show that naturally occurring HEases can be found, that cleave desirable human targets. Some of these enzymes are also shown to cleave the homologous sequence in the genome of animal models. In addition, the distribution of off-target effects may be more predictable for native HEases. Based on our experimental observations, we present the HomeBase algorithm, database and web server that allow a high-throughput computational search and assignment of HEases for the targeting of specific loci in the human and other genomes. We validate experimentally the predicted target specificity of candidate fungal, bacterial and archaeal HEases using cell free, yeast and archaeal assays.
Infection and Immunity | 2014
Ziv Lifshitz; David Burstein; Kierstyn T. Schwartz; Howard A. Shuman; Tal Pupko; Gil Segal
Coxiella burnetii, the causative agent of Q fever, is a human intracellular pathogen that utilizes the Icm/Dot type IVB secretion system to translocate effector proteins into host cells. To identify novel C. burnetii effectors, we applied a machine-learning approach to predict C. burnetii effectors, and examination of 20 such proteins resulted in the identification of 13 novel effectors. To determine whether these effectors, as well as several previously identified effectors, modulate conserved eukaryotic pathways, they were expressed in Saccharomyces cerevisiae. The effects on yeast growth were examined under regular growth conditions and in the presence of caffeine, a known modulator of the yeast cell wall integrity (CWI) mitogen-activated protein (MAP) kinase pathway. In the presence of caffeine, expression of the effectors CBU0885 and CBU1676 caused an enhanced inhibition of yeast growth, and the growth inhibition of CBU0388 was suppressed. Furthermore, analysis of synthetic lethality effects and examination of the activity of the CWI MAP kinase transcription factor Rlm1 indicated that CBU0388 enhances the activation of this MAP kinase pathway in yeast, while CBU0885 and CBU1676 abolish this activation. Additionally, coexpression of CBU1676 and CBU0388 resulted in mutual suppression of their inhibition of yeast growth. These results strongly indicate that these three effectors modulate the CWI MAP kinase pathway in yeast. Moreover, both CBU1676 and CBU0885 were found to contain a conserved haloacid dehalogenase (HAD) domain, which was found to be required for their activity. Collectively, our results demonstrate that MAP kinase pathways are most likely targeted by C. burnetii Icm/Dot effectors.
Bioinformatics | 2012
Ofir Cohen; Haim Ashkenazy; David Burstein; Tal Pupko
Motivation: Correlated events of gains and losses enable inference of co-evolution relations. The reconstruction of the co-evolutionary interactions network in prokaryotic species may elucidate functional associations among genes. Results: We developed a novel probabilistic methodology for the detection of co-evolutionary interactions between pairs of genes. Using this method we inferred the co-evolutionary network among 4593 Clusters of Orthologous Genes (COGs). The number of co-evolutionary interactions substantially differed among COGs. Over 40% were found to co-evolve with at least one partner. We partitioned the network of co-evolutionary relations into clusters and uncovered multiple modular assemblies of genes with clearly defined functions. Finally, we measured the extent to which co-evolutionary relations coincide with other cellular relations such as genomic proximity, gene fusion propensity, co-expression, protein–protein interactions and metabolic connections. Our results show that co-evolutionary relations only partially overlap with these other types of networks. Our results suggest that the inferred co-evolutionary network in prokaryotes is highly informative towards revealing functional relations among genes, often showing signals that cannot be extracted from other network types. Availability and implementation: Available under GPL license as open source. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.