Serafim Batzoglou
Massachusetts Institute of Technology
Publication
Featured research published by Serafim Batzoglou.
research in computational molecular biology | 1999
Lior Pachter; Serafim Batzoglou; Valentin I. Spitkovsky; William S. Beebee; Eric S. Lander; Bonnie Berger; Daniel J. Kleitman
This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. The dictionaries support O(1)-time lookups of tuples (4-tuples for the OWL database and 11-tuples for the dbEST database). These tuples can be used to rapidly find, at every position in an input sequence, the longest matches to the database sequences. Such matches provide very useful information pertaining to locating common segments between exons, alternative splice sites, and frequency data of long tuples for statistical purposes. These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
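The tuple-dictionary idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_dictionary` and `longest_match_at`, the toy DNA strings, and the choice of k = 4 on nucleotides are assumptions made for illustration only. A hash table keyed by fixed-length tuples gives average-case constant-time seed lookups, and each seed hit is then extended to the right to find the longest match starting at a given query position.

```python
from collections import defaultdict


def build_dictionary(sequences, k):
    """Map every k-tuple to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def longest_match_at(query, start, sequences, index, k):
    """Length of the longest exact match between query[start:] and any
    database sequence; 0 if the k-tuple at `start` has no hit."""
    seed = query[start:start + k]
    if len(seed) < k:
        return 0
    best = 0
    # index.get avoids inserting empty entries into the defaultdict.
    for seq_id, pos in index.get(seed, ()):
        target = sequences[seq_id]
        length = k
        # Extend the seed hit to the right while characters agree.
        while (start + length < len(query)
               and pos + length < len(target)
               and query[start + length] == target[pos + length]):
            length += 1
        best = max(best, length)
    return best


# Hypothetical toy data, not taken from OWL or dbEST.
database = ["ACGTACGTTTGA", "GGGACGTAC"]
k = 4
index = build_dictionary(database, k)
query = "TTACGTACGTTAA"
matches = [longest_match_at(query, i, database, index, k)
           for i in range(len(query))]
```

Matches shorter than k are reported as 0 in this sketch because a seed hit is required before any extension is attempted.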
research in computational molecular biology | 2000
Serafim Batzoglou; Lior Pachter; Jill P. Mesirov; Bonnie Berger; Eric S. Lander
We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of similarity in the number, size and sequence of exons and introns. We then developed an approach for recognizing genes within such orthologous regions, by first aligning the regions using an iterative global alignment system and then identifying genes based on conservation of exonic features at aligned positions in both species. The alignment and gene recognition are performed by new programs called GLASS and ROSETTA, respectively. ROSETTA performed well at exact identification of coding exons in 117 orthologous pairs tested.
research in computational molecular biology | 1997
Richa Agarwala; Serafim Batzoglou; Vlado Dančík; Scott E. Decatur; Martin Farach; Sridhar Hannenhalli; S. Muthukrishnan; Steven Skiena
We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar and treats hydrophobic interactions as the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve similar constant factor approximation guarantees on the triangular lattice as were achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.
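To make the HP scoring concrete, here is a minimal Python sketch that counts hydrophobic (H-H) contacts in a given fold. It is an illustration only, not the folding rules from the paper: it works in two dimensions, the function name `hh_contacts` is hypothetical, and representing the triangular lattice with axial coordinates (six neighbors per site) is an assumption of this sketch. A contact is counted when two H residues occupy adjacent lattice sites but are not consecutive along the chain.

```python
# Neighbor offsets for a 2D square lattice.
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# Triangular lattice in axial coordinates: six neighbors per site.
TRIANGULAR = SQUARE + [(1, -1), (-1, 1)]


def hh_contacts(sequence, coords, neighbors):
    """Count H-H pairs on adjacent lattice sites that are not chain neighbors."""
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("fold must be a self-avoiding placement of the chain")
    for a, b in zip(coords, coords[1:]):
        if (b[0] - a[0], b[1] - a[1]) not in neighbors:
            raise ValueError("consecutive residues must be lattice neighbors")
    site = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in neighbors:
            j = site.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbors and counts each pair once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


# Hypothetical four-residue chain folded around a unit square.
fold = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_score = hh_contacts("HHHH", fold, SQUARE)
triangular_score = hh_contacts("HHHH", fold, TRIANGULAR)
```

With the same coordinates, the triangular neighborhood admits one contact (between the second and fourth residues) that the square neighborhood does not, which shows how the choice of lattice changes which folds score well.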
combinatorial pattern matching | 1999
Serafim Batzoglou; Sorin Istrail
We focus on the combinatorial analysis of physical mapping with repeated probes. We present computational complexity results, and we describe and analyze an algorithmic strategy. We are following the research avenue proposed by Karp [9] on modeling the problem as a combinatorial problem - the Hypergraph Superstring Problem - intimately related to the Lander-Waterman stochastic model [16]. We show that a sparse version of the problem is MAXSNP-complete, a result that carries over to the general case. We show that the minimum Sperner decomposition of a set collection, a problem that is related to the Hypergraph Superstring Problem, is NP-complete. Finally we show that the Generalized Hypergraph Superstring Problem is also MAXSNP-hard. We present an efficient algorithm for retrieving the PQ-tree of optimal zero repetition solutions, that provides a constant approximation to the optimal solution on sparse data. We provide experimental results on simulated data.
research in computational molecular biology | 2000
Serafim Batzoglou; Bonnie Berger; Jill P. Mesirov; Eric S. Lander
One important approach to sequencing a large genome is (i) to sequence a collection of non-overlapping 'seed' clones chosen from a genomic library of large-insert clones (such as bacterial artificial chromosomes (BACs)) and then (ii) to take successive 'walking' steps by selecting and sequencing minimally overlapping clones, using information such as clone-end sequences to identify the overlaps. We analyze the strategic issues involved in using this approach. We derive formulas showing how two key factors, the initial density of seed clones and the depth of the genomic library used for walking, affect the cost and time of a sequencing project, that is, the amount of redundant sequencing and the number of steps to cover the vast majority of the genome. We also discuss a variant strategy in which a second genomic library with clones having a somewhat smaller insert size is used to close gaps. This approach can dramatically decrease the amount of redundant sequencing, without affecting the rate at which the genome is covered.
Archive | 1996
Scott E. Decatur; Serafim Batzoglou
Archive | 2000
Serafim Batzoglou; Bonnie Berger
Archive | 2002
Serafim Batzoglou; Bonnie Berger; Jill P. Mesirov; Eric S. Lander
Archive | 2000
Serafim Batzoglou; Lior Pachter; Jill P. Mesirov; Bonnie Berger; Eric S. Lander
Archive | 2000
Sorin Istrail; Alan J. Hurd; Ross A. Lippert; Brian Walenz; Serafim Batzoglou; John H. Conway; Freddie W. Peyerl