Publications

Featured research published by Govinda M. Kamath.


International Symposium on Information Theory | 2012

Optimal linear codes with a local-error-correction property

N. Prakash; Govinda M. Kamath; V. Lalitha; P. Vijay Kumar

Motivated by applications to distributed storage, Gopalan et al. recently introduced the interesting notion of information-symbol locality in a linear code. This means that each message symbol appears in a parity-check equation of small Hamming weight, so that an erased message symbol can be recovered by examining a small number of other code symbols. This notion is extended to the case where all code symbols, not just the message symbols, are covered by such “local” parities. In this paper, we extend the results of Gopalan et al. so as to permit recovery of an erased code symbol even in the presence of errors in local parity symbols. We present tight bounds on the minimum distance of such codes and exhibit codes that are optimal with respect to the local error-correction property. As a corollary, we obtain an upper bound on the minimum distance of a concatenated code.
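To make locality concrete, here is a toy sketch, not a construction from the paper: message bits are split into groups of a hypothetical size r, and each group gets one XOR parity. An erased symbol is then recovered by reading only the r other symbols of its local group instead of the whole codeword. Each local group here has minimum distance 2, so it repairs a single erasure and offers none of the error tolerance in local parities that the paper studies; the function names are illustrative.

```python
from __future__ import annotations

from functools import reduce
from operator import xor


def encode_with_local_parity(message: list[int], r: int) -> list[list[int]]:
    """Split message bits into groups of r and append one XOR parity per group.

    Each returned group is a single-parity-check codeword, so any one erased
    symbol in it can be rebuilt from the other symbols of that group alone.
    """
    groups = [message[i:i + r] for i in range(0, len(message), r)]
    return [g + [reduce(xor, g, 0)] for g in groups]


def repair_local(group: list[int | None]) -> list[int]:
    """Recover a single erased symbol (marked None) using only its local group."""
    erased = [i for i, s in enumerate(group) if s is None]
    if len(erased) != 1:
        raise ValueError("a single-parity local group repairs exactly one erasure")
    known = [s for s in group if s is not None]
    repaired = list(group)
    repaired[erased[0]] = reduce(xor, known, 0)  # XOR of the surviving local symbols
    return repaired
```

For example, `encode_with_local_parity([1, 1, 1, 0, 1, 0], r=3)` yields two local groups, and repairing a bit in the first group touches only three other bits.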


Information Theory and Applications | 2013

Codes with local regeneration

Govinda M. Kamath; N. Prakash; V. Lalitha; P. Vijay Kumar

Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data collection and reliability in a distributed storage network. When repairing a failed node, regenerating codes seek to minimize the amount of data downloaded, while codes with locality seek to minimize the number of helper nodes accessed. In this paper, we provide several constructions for a class of vector codes with locality in which the local codes are themselves regenerating codes, so that the resulting codes enjoy both advantages. We derive an upper bound on the minimum distance of this class of codes and show that the proposed constructions achieve this bound. The constructions cover both the case where the local regenerating codes correspond to the minimum storage regenerating (MSR) point and the case where they correspond to the minimum bandwidth regenerating (MBR) point on the storage-repair-bandwidth tradeoff curve of regenerating codes.
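The MSR and MBR points are the two extremes of the storage-repair-bandwidth tradeoff. The sketch below evaluates them with the standard cut-set expressions from the regenerating-codes literature (Dimakis et al.), not formulas stated in this abstract: a file of B symbols is stored so that any k nodes suffice to recover it, a failed node is repaired from d helpers, α is the storage per node, and γ is the total repair download. Exact fractions avoid rounding.

```python
from fractions import Fraction


def msr_point(B: int, k: int, d: int) -> tuple[Fraction, Fraction]:
    """(alpha, gamma) at the minimum-storage end: alpha = B/k."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))  # d helpers, each sending B/(k(d-k+1))
    return alpha, gamma


def mbr_point(B: int, k: int, d: int) -> tuple[Fraction, Fraction]:
    """(alpha, gamma) at the minimum-bandwidth end, where alpha equals gamma."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    alpha = Fraction(2 * B * d, k * (2 * d - k + 1))
    return alpha, alpha
```

With B = 1, k = 2, d = 3, the MSR point stores 1/2 per node but downloads 3/4 for a repair, while the MBR point stores and downloads 3/5: more storage buys less repair traffic.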


IEEE Transactions on Information Theory | 2014

Codes With Local Regeneration and Erasure Correction

Govinda M. Kamath; N. Prakash; V. Lalitha; P. Vijay Kumar

Regenerating codes and codes with locality are two recently proposed coding schemes that, in addition to ensuring data collection and reliability, also enable efficient node repair. When repairing a failed node, regenerating codes seek to minimize the amount of data downloaded, while codes with locality seek to minimize the number of helper nodes accessed. This paper presents results in two directions. In the first, it extends the notion of codes with locality to permit local recovery of an erased code symbol even in the presence of multiple erasures, by employing local codes with minimum distance greater than 2. An upper bound on the minimum distance of such codes is presented, and codes that are optimal with respect to this bound are constructed. The second direction seeks to build codes that combine the advantages of codes with locality and regenerating codes. These codes, termed here codes with local regeneration, are codes with locality over a vector alphabet in which the local codes themselves are regenerating codes. We derive an upper bound on the minimum distance of vector-alphabet codes with locality for the case when their constituent local codes have a certain uniform rank accumulation property. This property is possessed by both minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) codes. We provide several constructions of codes with local regeneration that achieve this bound, where the local codes are either MSR or MBR codes. Also included in this paper are an upper bound on the minimum distance of a general vector code with locality and a performance comparison of various code constructions of fixed block length and minimum distance.
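The abstract does not spell out the bound for the first direction. In the form usually quoted in the locality literature, for an [n, k] code whose symbols all lie in local codes of length at most r + δ − 1 and minimum distance at least δ, it reads d ≤ n − k + 1 − (⌈k/r⌉ − 1)(δ − 1); setting δ = 2 recovers Gopalan et al.'s d ≤ n − k − ⌈k/r⌉ + 2. Treat this exact form as our assumption rather than a quotation of the paper. The helper below only evaluates it for the scalar case; the vector-alphabet bound in the abstract is different.

```python
def locality_distance_bound(n: int, k: int, r: int, delta: int) -> int:
    """Upper bound on minimum distance for (r, delta) locality (assumed form)."""
    if not (1 <= r <= k <= n) or delta < 2:
        raise ValueError("need 1 <= r <= k <= n and delta >= 2")
    num_local_groups = -(-k // r)  # ceil(k / r) using integer arithmetic only
    return n - k + 1 - (num_local_groups - 1) * (delta - 1)
```

When r = k (a single local group) the bound collapses to the Singleton bound n − k + 1, and each extra unit of local distance δ costs one unit of global distance per additional local group.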


International Symposium on Information Theory | 2013

Explicit MBR all-symbol locality codes

Govinda M. Kamath; Natalia Silberstein; N. Prakash; Ankit Singh Rawat; V. Lalitha; Onur Ozan Koyluoglu; P. Vijay Kumar; Sriram Vishwanath

Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such failures, two main techniques are known: regenerating codes, i.e., codes that minimize the total repair bandwidth, and codes with locality, which minimize the number of nodes participating in the repair process. This paper focuses on regenerating codes with locality, using pre-coding based on Gabidulin codes, and presents constructions that use minimum bandwidth regenerating (MBR) local codes. The constructions achieve maximum resilience (i.e., optimal minimum distance) and maximum capacity (i.e., maximum rate). Finally, the same pre-coding mechanism can be combined with a subclass of fractional-repetition codes to achieve maximum resilience and repair-by-transfer simultaneously.
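Repair-by-transfer can be illustrated with the complete-graph layout often used to build fractional-repetition codes. This is our sketch, not the paper's construction: with n nodes, one symbol sits on each edge of the complete graph K_n, so every symbol is stored twice and each node holds the n − 1 symbols on its incident edges. A failed node is rebuilt by each of its n − 1 helpers copying the single symbol it shares with it, with no arithmetic. The Gabidulin pre-coding is omitted; the placeholder symbols `c0`, `c1`, … stand for already pre-coded symbols, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def place_symbols(n: int) -> dict[int, dict[frozenset, str]]:
    """Assign one symbol per edge of K_n; node v stores every edge touching v."""
    nodes: dict[int, dict[frozenset, str]] = {v: {} for v in range(n)}
    for idx, (u, v) in enumerate(combinations(range(n), 2)):
        edge = frozenset((u, v))
        symbol = f"c{idx}"  # placeholder for the idx-th pre-coded symbol
        nodes[u][edge] = symbol
        nodes[v][edge] = symbol
    return nodes


def repair_by_transfer(nodes: dict[int, dict[frozenset, str]], failed: int) -> dict[frozenset, str]:
    """Rebuild a failed node: each surviving node sends its copy of the shared edge."""
    rebuilt: dict[frozenset, str] = {}
    for helper, contents in nodes.items():
        if helper == failed:
            continue
        edge = frozenset((helper, failed))
        rebuilt[edge] = contents[edge]  # a plain copy: one symbol per helper
    return rebuilt
```

With n = 5, ten symbols are stored on five nodes of four symbols each, and rebuilding node 2 downloads exactly four symbols, which equals what the node stores, as expected at the MBR point.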


Genome Biology | 2016

Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts

Vasilis Ntranos; Govinda M. Kamath; Jesse Zhang; Lior Pachter; David Tse

Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.
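The core idea, counting reads per set of compatible transcripts rather than per gene, can be sketched as follows. We assume each read has already been mapped to the set of transcripts it is compatible with (in practice a pseudoaligner does this step). Each distinct set is one equivalence class, and a cell becomes a vector of counts over classes. Jensen-Shannon distance between normalized count vectors is used here as an illustrative choice for comparing cells; the function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def tcc_vector(read_classes: list[frozenset[str]]) -> Counter:
    """Count reads per equivalence class (the set of transcripts a read is compatible with)."""
    return Counter(read_classes)


def js_distance(p: Counter, q: Counter) -> float:
    """Jensen-Shannon distance (base 2, range [0, 1]) between two count vectors."""
    keys = set(p) | set(q)
    tp, tq = sum(p.values()), sum(q.values())
    P = {k: p[k] / tp for k in keys}  # Counter returns 0 for absent classes
    Q = {k: q[k] / tq for k in keys}
    M = {k: 0.5 * (P[k] + Q[k]) for k in keys}

    def kl(a: dict, b: dict) -> float:
        return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a[k] > 0)

    return math.sqrt(0.5 * kl(P, M) + 0.5 * kl(Q, M))
```

Pairwise distances computed this way can be fed to any standard clustering routine; skipping transcript quantification is what keeps this step cheap.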


Nature Biotechnology | 2018

Random access in large-scale DNA data storage

Lee Organick; Siena Dumas Ang; Yuan-Jyue Chen; Randolph Lopez; Sergey Yekhanin; Konstantin Makarychev; Miklós Z. Rácz; Govinda M. Kamath; Parikshit Gopalan; Bichlien Nguyen; Christopher N. Takahashi; Sharon Newman; Hsing-Yeh Parker; Cyrus Rashtchian; Kendall Stewart; Gagan Gupta; Robert Carlson; John Mulligan; Douglas M. Carmean; Georg Seelig; Luis Ceze; Karin Strauss

Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
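Primer-based random access can be pictured with a deliberately simplified model: every strand of a file is flanked by that file's primer pair, and selective amplification is approximated by keeping only strands whose ends match the requested pair. This string-matching stand-in is our assumption, not the paper's protocol; real PCR is probabilistic, primer sequences must be designed to avoid cross-hybridization, and the short primers below are hypothetical.

```python
from __future__ import annotations


def tag(payloads: list[str], fwd: str, rev: str) -> list[str]:
    """Flank every payload strand of one file with that file's primer pair."""
    return [fwd + p + rev for p in payloads]


def select(pool: list[str], fwd: str, rev: str) -> list[str]:
    """Keep strands whose ends match the requested primers, then strip the primers.

    A toy stand-in for selective amplification: only exact end matches survive.
    """
    return [
        s[len(fwd):len(s) - len(rev)]
        for s in pool
        if s.startswith(fwd) and s.endswith(rev)
    ]
```

Mixing `tag(["ACGT", "GGCC"], "AAAA", "TTTT")` and `tag(["CATG"], "CCCC", "GGGG")` into one pool and calling `select(pool, "CCCC", "GGGG")` returns only the second file's payload, so that file can be sequenced without reading the rest of the pool.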


International Symposium on Information Theory | 2015

Optimal haplotype assembly from high-throughput mate-pair reads

Govinda M. Kamath; Eren Sasoglu; David Tse

Humans have 23 pairs of homologous chromosomes. The homologous pairs are identical except at certain documented positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. In this paper, we study the problem of inferring haplotypes of individuals from mate-pair reads of their genome. We give a simple formula for the coverage needed for haplotype assembly, under a generative model. The analysis here leverages connections of this problem with decoding convolutional codes.
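The link to convolutional-code decoding can be illustrated with a Viterbi-style dynamic program; this is our sketch, not the paper's algorithm. Assume each read has been reduced to a parity observation (i, j, p) meaning "SNPs i and j carry the same allele on one chromosome (p = 0) or different alleles (p = 1)", and that reads span at most a hypothetical window of w SNPs. Writing the haplotype as bits h, with the other chromosome its complement, we seek the h that disagrees with the fewest reads. Because each read only touches bits inside the window, the last w bits form a trellis state, exactly as the shift-register contents do in a convolutional code.

```python
from __future__ import annotations


def assemble_haplotype(n: int, reads: list[tuple[int, int, int]], w: int) -> list[int]:
    """Minimum-disagreement haplotype via a Viterbi sweep over SNP positions.

    Each read (i, j, p) with 0 < j - i <= w reports p = h[i] XOR h[j].
    The state is the last <= w haplotype bits (at most 2**w states).
    h[0] is fixed to 0 since h and its complement describe the same chromosome pair.
    """
    by_end: dict[int, list[tuple[int, int]]] = {}
    for i, j, p in reads:
        if i < 0 or j >= n or not 0 < j - i <= w:
            raise ValueError(f"read {(i, j, p)} lies outside the genome or window")
        by_end.setdefault(j, []).append((i, p))

    # state (tuple of recent bits) -> (cost so far, full haplotype prefix)
    layer: dict[tuple[int, ...], tuple[int, tuple[int, ...]]] = {(0,): (0, (0,))}
    for t in range(1, n):
        nxt: dict[tuple[int, ...], tuple[int, tuple[int, ...]]] = {}
        for state, (cost, path) in layer.items():
            start = t - len(state)  # SNP position of state[0]
            for b in (0, 1):
                full = state + (b,)
                # count reads ending at t whose parity disagrees with this choice
                extra = sum(1 for i, p in by_end.get(t, []) if full[i - start] ^ b != p)
                key = full[-w:]  # keep only what future reads can see
                cand = (cost + extra, path + (b,))
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        layer = nxt
    return list(min(layer.values())[1])
```

The work grows linearly in the number of SNPs and exponentially only in w, which is why bounded read span makes the problem tractable.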


International Symposium on Information Theory | 2016

Partial DNA assembly: A rate-distortion perspective

Ilan Shomorony; Govinda M. Kamath; Fei Xia; Thomas A. Courtade; David Tse

Earlier formulations of the DNA assembly problem were all in the context of perfect assembly: given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, the read data are very often not rich enough to permit unambiguous reconstruction of the original sequence. A natural generalization of the perfect-assembly formulation to these cases would be a rate-distortion framework, but partial assemblies are usually represented in terms of an assembly graph, which makes defining a distortion measure challenging. In this work, we introduce a distortion function for assembly graphs that can be understood as the logarithm of the number of Eulerian cycles in the assembly graph, each of which corresponds to a candidate assembly that could have generated the observed reads. We also introduce an algorithm for the construction of an assembly graph and analyze its performance on real genomes.
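Counting Eulerian cycles, the quantity inside that logarithm, has a classical closed form. For a connected directed graph in which every node has equal in- and out-degree, the BEST theorem gives ec(G) = t_w(G) · ∏_v (deg⁺(v) − 1)!, where t_w(G) is the number of spanning arborescences rooted at any fixed node w, obtainable as a Laplacian minor via the directed matrix-tree theorem. The sketch below illustrates that standard result and is not the paper's algorithm. It treats parallel edges as distinguishable, which may differ from how the paper counts distinct candidate assemblies, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from fractions import Fraction


def _det(m: list[list[Fraction]]) -> Fraction:
    """Determinant by exact Gaussian elimination with row swaps."""
    m = [row[:] for row in m]
    size, det = len(m), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            f = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= f * m[c][k]
    return det


def count_eulerian_circuits(n: int, edges: list[tuple[int, int]]) -> int:
    """BEST theorem for a balanced directed multigraph on nodes 0..n-1.

    Returns 0 if the graph is disconnected (no spanning arborescence exists).
    """
    out_deg, in_deg = [0] * n, [0] * n
    lap = [[Fraction(0)] * n for _ in range(n)]  # out-degree Laplacian D - A
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        lap[u][u] += 1
        lap[u][v] -= 1
    if in_deg != out_deg or 0 in out_deg:
        raise ValueError("expected in-degree == out-degree >= 1 at every node")
    trees = _det([row[1:] for row in lap[1:]])  # delete row/column of root w = 0
    return int(trees) * math.prod(math.factorial(d - 1) for d in out_deg)
```

Taking `math.log` of the result gives the distortion in nats: a simple directed cycle yields one circuit (zero distortion), while the complete directed graph on three nodes yields three.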


National Conference on Communications | 2012

Regenerating codes: A reformulated storage-bandwidth trade-off and a new construction

Govinda M. Kamath; P. Vijay Kumar

In this paper, the storage-repair-bandwidth (SRB) trade-off curve of regenerating codes is reformulated to yield a trade-off between two global parameters of practical relevance, namely information rate and repair rate. The new information-repair-rate (IRR) trade-off provides a different and insightful perspective on regenerating codes. For example, it provides new motivation for investigating constructions corresponding to the interior of the SRB trade-off. Interestingly, each point on the SRB trade-off corresponds to a curve in the IRR trade-off setup. We completely characterize functional repair under the IRR framework, while for exact repair, an achievable region is presented. In the second part of this paper, a rate-half regenerating code for the minimum storage regenerating point is constructed that draws upon the theory of invariant subspaces. While the parameters of this rate-half code are the same as those of the MISER code, the construction itself is quite different.


bioRxiv | 2017

Scaling up DNA data storage and random access retrieval

Lee Organick; Siena Dumas Ang; Yuan-Jyue Chen; Randolph Lopez; Sergey Yekhanin; Konstantin Makarychev; Miklós Z. Rácz; Govinda M. Kamath; Parikshit Gopalan; Bichlien Nguyen; Christopher N. Takahashi; Sharon Newman; Hsing-Yeh Parker; Cyrus Rashtchian; Kendall Stewart; Gagan Gupta; Robert Carlson; John Mulligan; Douglas M. Carmean; Georg Seelig; Luis Ceze; Karin Strauss

Current storage technologies can no longer keep pace with exponentially growing amounts of data [1]. Synthetic DNA offers an attractive alternative due to its potential information density of ~10¹⁸ B/mm³, 10⁷ times denser than magnetic tape, and potential durability of thousands of years [2]. Recent advances in DNA data storage have highlighted technical challenges, in particular, coding and random access, but have stored only modest amounts of data in synthetic DNA [3, 4, 5]. This paper demonstrates an end-to-end approach toward the viability of DNA data storage with large-scale random access. We encoded and stored 35 distinct files, totaling 200 MB of data, in more than 13 million DNA oligonucleotides (about 2 billion nucleotides in total) and fully recovered the data with no bit errors, representing an advance of almost an order of magnitude compared to prior work [6]. Our data curation focused on technologically advanced data types and historical relevance, including the Universal Declaration of Human Rights in over 100 languages [7], a high-definition music video of the band OK Go [8], and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault [9]. We developed a random access methodology based on selective amplification, for which we designed and validated a large library of primers, and successfully retrieved arbitrarily chosen items from a subset of our pool containing 10.3 million DNA sequences. Moreover, we developed a novel coding scheme that dramatically reduces the physical redundancy (sequencing read coverage) required for error-free decoding to a median of 5x, while maintaining levels of logical redundancy comparable to the best prior codes. We further stress-tested our coding approach by successfully decoding a file using the more error-prone nanopore-based sequencing.
We provide a detailed analysis of errors in the process of writing, storing, and reading data from synthetic DNA at a large scale, which helps characterize DNA as a storage medium and justify our coding approach. Thus, we have demonstrated a significant improvement in data volume, random access, and encoding/decoding schemes that contribute to a whole-system vision for DNA data storage.
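A quick back-of-envelope check relates the headline figures above. It assumes 1 MB = 10⁶ bytes and uses the rounded counts as given ("200 MB", "more than 13 million" oligonucleotides, "about 2 billion" nucleotides), so the results are approximate. The gap between roughly 0.8 stored bits per nucleotide and the theoretical 2 bits per nucleotide presumably reflects per-strand overheads such as primer sites, addressing, and coding redundancy.

```python
# Rough figures from the abstract; 1 MB = 10**6 bytes is our assumption.
data_bytes = 200e6
oligos = 13e6
total_nt = 2e9

nt_per_oligo = total_nt / oligos          # average strand length, about 154 nt
bits_per_nt = data_bytes * 8 / total_nt   # user data per synthesized nucleotide
overhead_vs_max = 2.0 / bits_per_nt       # ratio to the 2-bit/nt ceiling

print(f"{nt_per_oligo:.0f} nt/oligo, {bits_per_nt:.2f} bits/nt, {overhead_vs_max:.1f}x below 2 bits/nt")
```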

Collaboration



Top Co-Authors

P. Vijay Kumar
Indian Institute of Science

N. Prakash
Indian Institute of Science

V. Lalitha
Indian Institute of Science

Georg Seelig
University of Washington