Saurabh Sinha | Researchain

Publication

Featured research published by Saurabh Sinha.

Nature Biotechnology | 2005

Assessing computational tools for the discovery of transcription factor binding sites

Martin Tompa; Nan Li; Timothy L. Bailey; George M. Church; Bart De Moor; Eleazar Eskin; Alexander V. Favorov; Martin C. Frith; Yutao Fu; W. James Kent; Vsevolod J. Makeev; Andrei A. Mironov; William Stafford Noble; Giulio Pavesi; Mireille Régnier; Nicolas Simonis; Saurabh Sinha; Gert Thijs; Jacques van Helden; Mathias Vandenbogaert; Zhiping Weng; Christopher T. Workman; Chun Ye; Zhou Zhu

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

BMC Bioinformatics | 2004

PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences

Saurabh Sinha; Mathieu Blanchette; Martin Tompa

Background: This paper addresses the problem of discovering transcription factor binding sites in heterogeneous sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species. Results: We propose an algorithm that integrates two important aspects of a motif's significance, overrepresentation and cross-species conservation, into one probabilistic score. The algorithm allows the input orthologous sequences to be related by any user-specified phylogenetic tree. It is based on the Expectation-Maximization technique, and scales well with the number of species and the length of input sequences. We evaluate the algorithm on synthetic data, and also present results for data sets from yeast, fly, and human. Conclusions: The results demonstrate that the new approach improves motif discovery by exploiting multiple species information.

Information Hiding | 2001

A Graph Theoretic Approach to Software Watermarking

Ramarathnam Venkatesan; Vijay V. Vazirani; Saurabh Sinha

We present a graph theoretic approach for watermarking software in a robust fashion. While watermarking software that is small in size (e.g. a few kilobytes) may be infeasible through this approach, it seems to be a viable scheme for large applications. Our approach works with control/data flow graphs and uses abstractions, approximate k-partitions, and a random walk method to embed the watermark, with the goal of minimizing and controlling the additions to be made for embedding, while keeping the estimated effort to undo the watermark (WM) as high as possible. The watermarks are embedded so that small changes to the software or flow graph are unlikely to disable detection by a probabilistic algorithm that has a secret. This is done by using some relatively robust graph properties and error correcting codes. Under some natural assumptions about the code added to embed the WM, locating the WM by an attacker is related to some graph approximation problems. Since little theoretical foundation exists for hardness of typical instances of graph approximation problems, we present heuristics to generate such hard instances and, in a limited case, present a heuristic analysis of how hard it is to separate the WM in an information theoretic model. We describe some related experimental work. The approach and methods described here are also suitable for solving the problem of software tamper resistance.

Information Hiding | 2002

Oblivious Hashing: A Stealthy Software Integrity Verification Primitive

Yuqun Chen; Ramarathnam Venkatesan; Matthew Cary; Ruoming Pang; Saurabh Sinha; Mariusz H. Jakubowski

We describe a novel software verification primitive called Oblivious Hashing. Unlike previous techniques that mainly verify the static shape of code, this primitive allows implicit computation of a hash value based on the actual execution (i.e., space-time history of computation) of the code. We also discuss its applications in local software tamper resistance and remote code authentication.
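The idea of hashing the execution rather than the static code can be illustrated with a small sketch. It is not the instrumentation described in the paper: the mixing function `oh_update`, the choice of SHA-256, and the two GCD routines are all hypothetical, chosen only to show that two programs producing the same output can yield different hashes when their intermediate values differ.

```python
import hashlib
from typing import Tuple

MASK64 = 0xFFFFFFFFFFFFFFFF

def oh_update(state: int, value: int) -> int:
    """Fold one observed runtime value into the running hash.
    (Hypothetical mixing step; a real scheme would use cheaper, inlined code.)"""
    data = state.to_bytes(8, "big") + (value & MASK64).to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def gcd_with_trace_hash(a: int, b: int) -> Tuple[int, int]:
    """Euclid's algorithm, instrumented so the hash covers every intermediate operand."""
    h = 0
    while b != 0:
        h = oh_update(h, a)
        h = oh_update(h, b)
        a, b = b, a % b
    h = oh_update(h, a)  # include the final result
    return a, h

def gcd_subtractive_with_trace_hash(a: int, b: int) -> Tuple[int, int]:
    """Stand-in for 'tampered' code: same result, different execution history."""
    h = 0
    while b != 0:
        h = oh_update(h, a)
        h = oh_update(h, b)
        if a >= b:
            a = a - b
        else:
            a, b = b, a
    h = oh_update(h, a)
    return a, h

if __name__ == "__main__":
    print(gcd_with_trace_hash(48, 18))
    print(gcd_subtractive_with_trace_hash(48, 18))
```

Both routines return the same GCD, but the second visits different intermediate values, so a verifier that knows the expected hash for a given input could notice the change even though the output is unchanged.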

BMC Bioinformatics | 2004

Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila.

Saurabh Sinha; Mark Schroeder; Ulrich Unnerstall; Ulrike Gaul; Eric D. Siggia

Background: The discovery of cis-regulatory modules in metazoan genomes is crucial for understanding the connection between genes and organism diversity. It is important to quantify how comparative genomics can improve computational detection of such modules. Results: We run the Stubb software on the entire D. melanogaster genome, to obtain predictions of modules involved in segmentation of the embryo. Stubb uses a probabilistic model to score sequences for clustering of transcription factor binding sites, and can exploit multiple species data within the same probabilistic framework. The predictions are evaluated using publicly available gene expression data for thousands of genes, after careful manual annotation. We demonstrate that the use of a second genome (D. pseudoobscura) for cross-species comparison significantly improves the prediction accuracy of Stubb, and is a more sensitive approach than intersecting the results of separate runs over the two genomes. The entire list of predictions is made available online. Conclusion: Evolutionary conservation of modules serves as a filter to improve their detection in silico. The future availability of additional fruitfly genomes therefore carries the prospect of highly specific genome-wide predictions using Stubb.

Research in Computational Molecular Biology | 2002

Discriminative motifs

Saurabh Sinha

This paper takes a new view of motif discovery, addressing a common problem in existing motif finders. A motif is treated as a feature of the input promoter regions that leads to a good classifier between these promoters and a set of background promoters. This perspective allows us to adapt existing methods of feature selection, a well studied topic in machine learning, to motif discovery. We develop a general algorithmic framework that can be specialized to work with a wide variety of motif models, including consensus models with degenerate symbols or mismatches, and composite motifs. A key feature of our algorithm is that it measures over-representation while maintaining information about the distribution of motif instances in individual promoters. The assessment of a motif's discriminative power is normalized against chance behaviour by a probabilistic analysis. We apply our framework to two popular motif models, and are able to detect several known binding sites in sets of co-regulated genes in yeast.
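A minimal sketch of the "motif as discriminating feature" view, under strong simplifying assumptions: motifs are exact k-mers, a promoter either contains the k-mer or does not, and chance behaviour is modelled with a hypergeometric tail probability. The function names and the scoring rule are illustrative choices, not the framework developed in the paper.

```python
from itertools import product
from math import comb

def hypergeom_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n of N sequences are foreground and K of N contain the motif."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def rank_kmers(foreground, background, k):
    """Rank k-mers by how well their presence separates foreground from background.
    Counts are per sequence (present or absent), so one promoter with many copies
    cannot dominate the score."""
    n = len(foreground)
    N = n + len(background)
    scored = []
    for kmer in map("".join, product("ACGT", repeat=k)):
        fg_hits = sum(kmer in s for s in foreground)
        if fg_hits == 0:
            continue  # cannot be over-represented in the foreground
        bg_hits = sum(kmer in s for s in background)
        p = hypergeom_tail(fg_hits, fg_hits + bg_hits, n, N)
        scored.append((p, kmer))
    return sorted(scored)  # smallest p-value (most discriminative) first

if __name__ == "__main__":
    fg = ["TTACGTAA", "GGACGTCC", "ACGTTTTT"]
    bg = ["TTTTTTTT", "GGGGCCCC", "AATTAATT", "CCGGCCGG"]
    print(rank_kmers(fg, bg, 4)[:3])
```

In the toy data, `ACGT` occurs in every foreground sequence and in no background sequence, so it receives the smallest tail probability.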

Pacific Symposium on Biocomputing | 2003

Motif discovery in heterogeneous sequence data.

Amol Prakash; Mathieu Blanchette; Saurabh Sinha; Martin Tompa

This paper introduces the first integrated algorithm designed to discover novel motifs in heterogeneous sequence data, which comprises coregulated genes from a single genome together with the orthologs of these genes from other genomes. Results are presented for regulons in yeasts, worms, and mammals.

Bioinformatics and Bioengineering | 2003

Performance comparison of algorithms for finding transcription factor binding sites

Saurabh Sinha; Martin Tompa

We compare the accuracy of three motif-finding algorithms for the discovery of novel transcription factor binding sites among co-regulated genes. One of the algorithms (YMF) uses a motif model tailored for binding sites and an enumerative search of the motif space, while the other two (MEME and AlignACE) use a more general motif model and local search techniques. The comparison is done on synthetic data with planted motifs, as well as on real data sets of co-regulated genes from the yeast S. cerevisiae. More often than not, the enumerative algorithm is found to be more accurate than the other two on the yeast data sets, though there is a noticeable exclusivity in the accuracy of the different algorithms. The experiments on synthetic data reveal, not surprisingly, that each algorithm outperforms the others when motifs are planted according to its motif model.
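The synthetic-data setup and the enumerative style of search can be sketched as follows. This is a simplified illustration, not YMF itself: the background model here is independent bases with frequencies estimated from the input, and the z-score uses a Poisson-style variance approximation; both are assumptions made for brevity, as are the function names and parameters.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt

def plant_motif(n_seqs, length, motif, rng):
    """Generate uniform random DNA sequences and plant one motif copy in each."""
    seqs = []
    for _ in range(n_seqs):
        s = [rng.choice("ACGT") for _ in range(length)]
        pos = rng.randrange(length - len(motif) + 1)
        s[pos:pos + len(motif)] = motif
        seqs.append("".join(s))
    return seqs

def enumerate_kmers(seqs, k):
    """Score every possible k-mer by a z-score of its observed count.
    Expected counts assume independent bases (an assumption of this sketch)."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(len(s) for s in seqs)
    base_freq = {b: sum(s.count(b) for s in seqs) / total for b in "ACGT"}
    positions = sum(len(s) - k + 1 for s in seqs)
    ranked = []
    for kmer in map("".join, product("ACGT", repeat=k)):
        expected = float(positions)
        for b in kmer:
            expected *= base_freq[b]
        if expected > 0:
            z = (counts[kmer] - expected) / sqrt(expected)
            ranked.append((z, kmer))
    ranked.sort(reverse=True)  # highest z-score first
    return ranked

if __name__ == "__main__":
    rng = random.Random(0)
    seqs = plant_motif(20, 100, "ACGTGA", rng)
    for z, kmer in enumerate_kmers(seqs, 6)[:5]:
        print(kmer, round(z, 2))
```

Because the search visits all 4^k candidates, it cannot get stuck in a local optimum, but its cost grows exponentially in k, which is one reason enumerative methods rely on a restricted motif model.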

Nucleic Acids Research | 2002