Shreepriya Das
University of Texas at Austin
Publication
Featured research published by Shreepriya Das.
Journal of Applied Physics | 2009
Shreepriya Das; Haris Vikalo; Arjang Hassibi
We study the scaling laws of affinity-based biosensors. In particular, we examine the implications of scaling on the response time, signal-to-noise ratio (SNR), and dynamic range (DR) of biosensor systems. Initially, using stochastic differential methods, and in particular the Fokker–Planck (FP) equation, we formulate the analyte capturing process and derive its uncertainty by computing the probability distribution function of the captured analytes as a function of time. Subsequently, we examine the effects of scaling on the solution to the FP equation and on the signal fluctuation, which demonstrates that scaling down significantly reduces the achievable SNR and DR of biosensors. We argue that these results, especially the fundamental SNR limitation that emerges in the micro- and nanoregimes, call into question the advantages of excessive miniaturization of biosensors.
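The scaling trend described above can be illustrated with a much simpler model than the paper's FP analysis. The sketch below assumes that each of `n_probes` capture sites binds and releases analyte independently, that analyte depletion is negligible, and that SNR is defined as mean over standard deviation of the captured count at equilibrium. The function names and rate parameters are hypothetical and chosen only for illustration.

```python
import math


def equilibrium_capture_stats(n_probes, k_on, k_off, conc):
    """Mean and standard deviation of the captured-analyte count at equilibrium.

    Assumption: every probe is an independent two-state system that binds at
    rate k_on * conc and releases at rate k_off, so the captured count is
    binomially distributed.
    """
    p_bound = (k_on * conc) / (k_on * conc + k_off)
    mean = n_probes * p_bound
    std = math.sqrt(n_probes * p_bound * (1.0 - p_bound))
    return mean, std


def equilibrium_snr(n_probes, k_on, k_off, conc):
    """SNR defined here as mean / standard deviation of the captured count."""
    mean, std = equilibrium_capture_stats(n_probes, k_on, k_off, conc)
    return mean / std
```

Under these assumptions the SNR grows as the square root of the number of probes, so shrinking the sensing area by a factor of 100 costs a factor of 10 in SNR.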
BMC Genomics | 2015
Shreepriya Das; Haris Vikalo
Background: The goal of haplotype assembly is to infer haplotypes of an individual from a mixture of sequenced chromosome fragments. Limited lengths of paired-end sequencing reads and inserts render haplotype assembly computationally challenging; in fact, most of the problem formulations are known to be NP-hard. Dimensions (and, therefore, difficulty) of the haplotype assembly problems keep increasing as the sequencing technology advances and the lengths of reads and inserts grow. The computational challenges are even more pronounced in the case of polyploid haplotypes, whose assembly is considerably more difficult than in the case of diploids. Fast, accurate, and scalable methods for haplotype assembly of diploid and polyploid organisms are needed. Results: We develop a novel framework for diploid/polyploid haplotype assembly from high-throughput sequencing data. The method formulates the haplotype assembly problem as a semi-definite program and exploits its special structure – namely, the low rank of the underlying solution – to solve it rapidly and with high accuracy. The developed framework is applicable to both diploid and polyploid species. The code for SDhaP is freely available at https://sourceforge.net/projects/sdhap. Conclusion: Extensive benchmarking tests on both real and simulated data show that the proposed algorithms outperform several well-known haplotype assembly methods in terms of either accuracy or speed or both. Useful recommendations for coverages needed to achieve near-optimal solutions are also provided.
Bioinformatics | 2012
Shreepriya Das; Haris Vikalo
MOTIVATION: Next-generation DNA sequencing platforms are becoming increasingly cost-effective and capable of providing an enormous number of reads in a relatively short time. However, their accuracy and read lengths still lag behind those of the conventional Sanger sequencing method. Performance of next-generation sequencing platforms is fundamentally limited by various imperfections in the sequencing-by-synthesis and signal acquisition processes. This drives the search for accurate, scalable and computationally tractable base calling algorithms capable of accounting for such imperfections. RESULTS: Relying on a statistical model of the sequencing-by-synthesis process and signal acquisition procedure, we develop a computationally efficient base calling method for Illumina’s sequencing technology (specifically, the Genome Analyzer II platform). Parameters of the model are estimated via a fast unsupervised online learning scheme, which uses the generalized expectation-maximization algorithm and requires only 3 s of running time per tile (on an Intel i7 machine at 3.07 GHz, single core), a speed-up of three orders of magnitude over existing parametric model-based methods. To minimize the latency between the end of the sequencing run and the generation of the base calling reports, we develop a fast online scalable decoding algorithm, which requires only 9 s/tile and achieves significantly lower error rates than Illumina’s base calling software. Moreover, it is demonstrated that the proposed online parameter estimation scheme efficiently computes tile-dependent parameters, which can thereafter be provided to the base calling algorithm, resulting in significant improvements over previously developed base calling methods for the considered platform in terms of performance, time/complexity and latency. AVAILABILITY: A C code implementation of our algorithm can be downloaded from http://www.cerc.utexas.edu/OnlineCall/.
BMC Bioinformatics | 2013
Shreepriya Das; Haris Vikalo
Background: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly reducing costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by the imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve read lengths and accuracy of these systems. Development of high performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. Results: We develop model-based statistical methods for fast and accurate base calling in Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C code implementation of our algorithm named Softy can be downloaded from https://sourceforge.net/projects/dynamicprog. Conclusion: We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge which can be utilized for parameter estimation and is potentially beneficial in various downstream applications.
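To make the dynamic programming idea concrete, here is a minimal sketch of the textbook hard-output Viterbi recursion in the log domain. It is not the Softy implementation or the paper's parametric model: the two-state toy tables (`toy_states`, `toy_log_start`, `toy_log_trans`, `toy_log_emit`) are hypothetical, and the soft-output and forward-backward variants mentioned in the abstract are not shown.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path and its log-probability."""
    # best[s] is the best log-probability of any path that ends in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointer = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda r: best[r] + log_trans[r][s])
            new_best[s] = best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointer[s] = prev
        best = new_best
        backpointers.append(pointer)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical toy model: two bases, each observed through a noisy channel
# that reports the correct base with probability 0.9.
toy_states = ["A", "C"]
toy_log_start = {s: math.log(0.5) for s in toy_states}
toy_log_trans = {r: {s: math.log(0.5) for s in toy_states} for r in toy_states}
toy_log_emit = {
    "A": {"a": math.log(0.9), "c": math.log(0.1)},
    "C": {"a": math.log(0.1), "c": math.log(0.9)},
}
```

Calling `viterbi(list("aacca"), toy_states, toy_log_start, toy_log_trans, toy_log_emit)` decodes the toy observation sequence; a real base caller would replace the toy tables with a model of the sequencing chemistry and signal acquisition.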
Allerton Conference on Communication, Control, and Computing | 2011
Shreepriya Das; Haris Vikalo
We look at statistical methods to reduce error rates for Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model for Illumina’s Genome Analyzer which is amenable to the use of dynamic programming ideas. We investigate the performance of the proposed algorithm for base calling in terms of error rates, compare it with the state-of-the-art base calling schemes for this platform, and demonstrate its feasibility on experimentally obtained data.
IEEE Transactions on Molecular, Biological, and Multi-Scale Communications | 2017
Shreepriya Das; Haris Vikalo
Haplotype assembly from high-throughput sequencing data is a computationally challenging problem. In fact, most of its formulations, including the most widely used one that relies on optimizing the minimum error correction criterion, are known to be NP-hard. Since finding exact solutions to haplotype assembly problems is difficult, suboptimal heuristics are often used. In this paper, we propose a novel method for optimal haplotype assembly that is based on depth-first branch-and-bound search of the solution space. Drawing on ideas from sphere decoding algorithms in digital communications, we exploit statistical information about errors in sequencing data to constrain the search of the haplotype space and thus efficiently find the optimal solution. Theoretical analysis and extensive simulation studies, as well as benchmarking on 1000 Genomes Project experimental data, demonstrate efficacy of the proposed method.
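The search strategy can be sketched for the diploid case as a plain depth-first branch-and-bound over the minimum error correction (MEC) cost. This is only an illustration under stated assumptions: fragments are equal-length strings over `0`, `1` and `-` (uncovered), the two haplotypes are complementary, and pruning uses the exact cost of the assigned prefix as a lower bound. The statistically derived search radius borrowed from sphere decoding, which is the paper's contribution, is not reproduced here, and the function name is hypothetical.

```python
def mec_branch_and_bound(fragments):
    """Find a haplotype minimizing the MEC cost by depth-first search.

    Returns (haplotype, cost). Assumes at least one fragment of length >= 1.
    """
    n_sites = len(fragments[0])
    best = {"cost": float("inf"), "hap": None}

    def partial_cost(prefix):
        # MEC cost restricted to the sites assigned so far. It can only grow
        # as more sites are assigned, so it is a valid lower bound.
        cost = 0
        for frag in fragments:
            covered = sum(1 for a in frag[:len(prefix)] if a != "-")
            mismatches = sum(
                1 for a, b in zip(frag, prefix) if a != "-" and a != b
            )
            # Mismatches against the complementary haplotype are
            # covered - mismatches; each fragment takes the better side.
            cost += min(mismatches, covered - mismatches)
        return cost

    def search(prefix):
        bound = partial_cost(prefix)
        if bound >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent
        if len(prefix) == n_sites:
            best["cost"], best["hap"] = bound, prefix
            return
        for allele in "01":
            search(prefix + allele)

    # A haplotype and its complement have the same cost, so fix site 0 to "0".
    search("0")
    return best["hap"], best["cost"]
```

For example, `mec_branch_and_bound(["0101", "010-", "1010", "-010", "0111"])` explores only prefixes whose partial cost stays below the best complete assignment found so far.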
International Conference on Bioinformatics | 2009
Shreepriya Das; Haris Vikalo; Arjang Hassibi
The discrete-state continuous-time processes occurring in binding and release of analytes in affinity-based biosensors are formulated using stochastic differential equations (SDEs). The Fokker–Planck (FP) equation is used to solve for the governing probability density function (pdf) of the number of captured analytes. A derivation of the Fokker–Planck equation from the Markovian master equation is also given; it provides an alternative to the conventionally used Monte Carlo simulation methods and is advantageous in systems where the state space is large. Using the FP equation, the time evolution of the pdfs of the captured analytes in typical biosensor settings can be analyzed and subsequently used to compute the expected behaviour and uncertainty of the detection, and also applied in the design of optimal estimators.
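For context, the Monte Carlo baseline that the FP approach is contrasted with can be sketched as a Gillespie-style simulation of the binding and release events. The model below is an assumption made for illustration: `n_probes` independent capture sites, a binding rate of `k_on * conc` per free site, a release rate of `k_off` per occupied site, and no analyte depletion. The function name and parameters are hypothetical.

```python
import random


def gillespie_capture(n_probes, k_on, k_off, conc, t_end, rng):
    """Simulate one trajectory and return the captured count at time t_end."""
    t, captured = 0.0, 0
    while True:
        bind_rate = k_on * conc * (n_probes - captured)
        release_rate = k_off * captured
        total_rate = bind_rate + release_rate
        if total_rate == 0.0:
            return captured  # no event can occur
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_end:
            return captured
        # Choose which event fired, in proportion to its rate.
        if rng.random() * total_rate < bind_rate:
            captured += 1
        else:
            captured -= 1
```

Estimating the pdf of the captured count this way requires many independent trajectories, for instance `[gillespie_capture(50, 1.0, 1.0, 1.0, 10.0, random.Random(seed)) for seed in range(200)]`, which is the cost that motivates a direct solution for the pdf when the state space is large.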
Genomics | 2017
Somsubhra Barik; Shreepriya Das; Haris Vikalo
RNA viruses are characterized by high mutation rates that give rise to populations of closely related genomes, known as viral quasispecies. Underlying heterogeneity enables the quasispecies to adapt to changing conditions and proliferate over the course of an infection. Determining genetic diversity of a virus (i.e., inferring haplotypes and their proportions in the population) is essential for understanding its mutation patterns and for effective drug development. Here, we present QSdpR, a method and software for the reconstruction of quasispecies from short sequencing reads. The reconstruction is achieved by solving a correlation clustering problem on a read-similarity graph, and the results of the clustering are used to estimate frequencies of sub-species; the number of sub-species is determined using the pseudo F index. Extensive tests on both synthetic datasets and experimental HIV-1 and Zika virus data demonstrate that QSdpR compares favorably to existing methods in terms of various performance metrics.
IEEE Global Conference on Signal and Information Processing | 2014
Shreepriya Das; Haris Vikalo
Solving the haplotype assembly problem by optimizing the commonly used minimum error correction criterion is known to be NP-hard. For this reason, suboptimal heuristics are often used in practice. In this paper, we propose a novel method for optimal haplotype assembly that is based on depth-first branch-and-bound search of the solution space. Our scheme is inspired by the sphere decoding algorithms used heavily in the field of digital communications. Using the statistical information about errors in sequencing data, we constrain the search of the haplotype space and speedily find the optimal solution to the haplotype assembly problem. Theoretical analysis of the expected complexity of the algorithm shows that optimal haplotype assembly is practically feasible for haplotype blocks of moderate lengths typically obtained using present-day high-throughput sequencers. The scheme is then tested on 1000 Genomes Project experimental data to verify the efficacy of the proposed method.
John Wiley and Sons | 2011
Shreepriya Das; Haris Vikalo; Arjang Hassibi
This chapter contains sections titled: Modeling Biosensors: Introduction; Biosensor Model: Deterministic and Stochastic; Signal-to-Noise Ratio and Noise Figure Definitions; Transient Signal-to-Noise Ratio Analysis; Simulations; Conclusion; References.