Publication


Featured research published by Serafim Batzoglou.


Research in Computational Molecular Biology | 1999

A dictionary based approach for gene annotation

Lior Pachter; Serafim Batzoglou; Valentin I. Spitkovsky; William S. Beebee; Eric S. Lander; Bonnie Berger; Daniel J. Kleitman

This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries are used to obtain O(1) time lookups of tuples in the dictionaries (4-tuples for the OWL database and 11-tuples for the dbEST database). These tuples can be used to rapidly find the longest matches at every position in an input sequence to the database sequences. Such matches provide very useful information pertaining to locating common segments between exons, alternative splice sites, and frequency data of long tuples for statistical purposes. These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
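
The tuple-lookup idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain in-memory hash table, and the toy sequences are assumptions made for illustration. Every length-k substring of the database sequences is indexed, each k-tuple of the input is looked up in expected constant time, and each hit is extended to the right to find the longest exact match starting at that position.

```python
from collections import defaultdict


def build_tuple_dictionary(sequences, k):
    """Map every length-k substring to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def longest_matches(query, sequences, index, k):
    """For each query position, return the length of the longest exact match
    (at least k long) to any database sequence, or 0 if the k-tuple is absent."""
    best = [0] * len(query)
    for pos in range(len(query) - k + 1):
        # Expected constant-time hash lookup of the k-tuple starting at pos.
        for seq_id, offset in index.get(query[pos:pos + k], ()):
            seq = sequences[seq_id]
            length = k
            # Extend the seed match to the right while characters agree.
            while (pos + length < len(query)
                   and offset + length < len(seq)
                   and query[pos + length] == seq[offset + length]):
                length += 1
            best[pos] = max(best[pos], length)
    return best


if __name__ == "__main__":
    database = ["ACGTACGGA", "TTGACGT"]  # hypothetical toy database
    tuple_index = build_tuple_dictionary(database, k=4)
    print(longest_matches("GACGTAC", database, tuple_index, k=4))
```

In this sketch the tuple length k is a free parameter; the abstract reports 4-tuples for the protein dictionary and 11-tuples for the EST dictionary.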


Research in Computational Molecular Biology | 2000

Human and mouse gene structure: comparative analysis and application to exon prediction

Serafim Batzoglou; Lior Pachter; Jill P. Mesirov; Bonnie Berger; Eric S. Lander

We describe a novel analytical approach to gene recognition based on cross-species comparison. We first undertook a comparison of orthologous genomic loci from human and mouse, studying the extent of similarity in the number, size and sequence of exons and introns. We then developed an approach for recognizing genes within such orthologous regions, by first aligning the regions using an iterative global alignment system and then identifying genes based on conservation of exonic features at aligned positions in both species. The alignment and gene recognition are performed by new programs called GLASS and ROSETTA, respectively. ROSETTA performed well at exact identification of coding exons in 117 orthologous pairs tested.


Research in Computational Molecular Biology | 1997

Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP model

Richa Agarwala; Serafim Batzoglou; Vlado Dančík; Scott E. Decatur; Martin Farach; Sridhar Hannenhalli; S. Muthukrishnan; Steven Skiena

We consider the problem of determining the three-dimensional folding of a protein given its one-dimensional amino acid sequence. We use the HP model for protein folding proposed by Dill (1985), which models a protein as a chain of amino acid residues that are either hydrophobic or polar, and in which hydrophobic interactions are the dominant initial driving force for protein folding. Hart and Istrail (1996a) gave approximation algorithms for folding proteins on the cubic lattice under the HP model. In this paper, we examine the choice of a lattice by considering its algorithmic and geometric implications and argue that the triangular lattice is a more reasonable choice. We present a set of folding rules for a triangular lattice and analyze the approximation ratio they achieve. In addition, we introduce a generalization of the HP model to account for residues having different levels of hydrophobicity. After describing the biological foundation for this generalization, we show that in the new model we are able to achieve similar constant factor approximation guarantees on the triangular lattice as were achieved in the standard HP model. While the structures derived from our folding rules are probably still far from biological reality, we hope that having a set of folding rules with different properties will yield more interesting folds when combined.
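
To make the HP model objective concrete, the sketch below scores a given fold by counting hydrophobic-hydrophobic (H-H) contacts. It does not implement the paper's folding rules. It assumes a 2D triangular lattice in axial coordinates with six neighbors per site, and the function name and input format are hypothetical. A contact is a pair of H residues that occupy neighboring lattice sites but are not consecutive in the chain; a fold with more contacts is considered better.

```python
# Six neighbor offsets of a 2D triangular lattice in axial coordinates
# (an assumption made for this sketch).
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hh_contacts(sequence, path):
    """Count H-H contacts between residues that are lattice neighbors
    but not consecutive in the chain."""
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if (x2 - x1, y2 - y1) not in NEIGHBORS:
            raise ValueError("consecutive residues must be lattice neighbors")

    position_of = {point: i for i, point in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in NEIGHBORS:
            j = position_of.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbors.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


if __name__ == "__main__":
    # A bent three-residue chain brings the two H residues into contact.
    print(hh_contacts("HPH", [(0, 0), (1, 0), (0, 1)]))
```

A generalized model with several hydrophobicity levels, as the abstract proposes, could replace the unit count with a weight per residue pair; the weighting scheme itself is not given here.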


Combinatorial Pattern Matching | 1999

Physical Mapping with Repeated Probes: The Hypergraph Superstring Problem

Serafim Batzoglou; Sorin Istrail

We focus on the combinatorial analysis of physical mapping with repeated probes. We present computational complexity results, and we describe and analyze an algorithmic strategy. We follow the research avenue proposed by Karp [9] of modeling the problem as a combinatorial problem - the Hypergraph Superstring Problem - intimately related to the Lander-Waterman stochastic model [16]. We show that a sparse version of the problem is MAXSNP-complete, a result that carries over to the general case. We show that the minimum Sperner decomposition of a set collection, a problem that is related to the Hypergraph Superstring Problem, is NP-complete. Finally, we show that the Generalized Hypergraph Superstring Problem is also MAXSNP-hard. We present an efficient algorithm for retrieving the PQ-tree of optimal zero-repetition solutions, which provides a constant approximation to the optimal solution on sparse data. We provide experimental results on simulated data.


Research in Computational Molecular Biology | 2000

Sequencing a genome by walking with clone-end sequences (abstract): a mathematical analysis

Serafim Batzoglou; Bonnie Berger; Jill P. Mesirov; Eric S. Lander

One important approach to sequencing a large genome is (i) to sequence a collection of non-overlapping "seed" clones chosen from a genomic library of large-insert clones (such as bacterial artificial chromosomes (BACs)) and then (ii) to take successive "walking" steps by selecting and sequencing minimally overlapping clones, using information such as clone-end sequences to identify the overlaps. We analyze the strategic issues involved in using this approach. We derive formulas showing how two key factors, the initial density of seed clones and the depth of the genomic library used for walking, affect the cost and time of a sequencing project, that is, the amount of redundant sequencing and the number of steps to cover the vast majority of the genome. We also discuss a variant strategy in which a second genomic library with clones having a somewhat smaller insert size is used to close gaps. This approach can dramatically decrease the amount of redundant sequencing, without affecting the rate at which the genome is covered.


Archive | 1996

Protein Folding in the Hydrophobic-Polar Model on the 3D Triangular Lattice

Scott E. Decatur; Serafim Batzoglou


Archive | 2000

Computational genomics: mapping, comparison, and annotation of genomes

Serafim Batzoglou; Bonnie Berger


Archive | 2002

Methods for assembly of genetic information

Serafim Batzoglou; Bonnie Berger; Jill P. Mesirov; Eric S. Lander


Archive | 2000

Comparative analysis of mouse and human DNA and applications to exon prediction

Serafim Batzoglou; Lior Pachter; Jill P. Mesirov; Bonnie Berger; Eric S. Lander


Archive | 2000

Prediction of Self-Assembly of Energetic Tiles and Dominos: Experiments, Mathematics and Software

Sorin Istrail; Alan J. Hurd; Ross A. Lippert; Brian Walenz; Serafim Batzoglou; John H. Conway; Freddie W. Peyerl

Collaboration


Dive into Serafim Batzoglou's collaborations.

Top Co-Authors


Bonnie Berger

Massachusetts Institute of Technology


Lior Pachter

Massachusetts Institute of Technology


Brian Walenz

J. Craig Venter Institute


Daniel J. Kleitman

Massachusetts Institute of Technology


Richa Agarwala

National Institutes of Health