Hirakendu Das

University of California, San Diego

Publication

Featured research published by Hirakendu Das.

International Symposium on Information Theory | 2010

On reconstructing a string from its substring compositions

Jayadev Acharya; Hirakendu Das; Olgica Milenkovic; Alon Orlitsky; Shengjun Pan

Motivated by protein sequencing, we consider the problem of reconstructing a string from the compositions of its substrings. We provide several results, including the following:

- General classes of strings that cannot be distinguished from their substring compositions.
- An almost complete characterization of the lengths for which reconstruction is possible.
- Bounds on the number of strings with the same substring compositions in terms of the number of divisors of the string length plus one.
- A relation to the turnpike problem and a bivariate polynomial formulation of string reconstruction.
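To make the object concrete, here is a minimal Python sketch (not taken from the paper; the function name `composition_multiset` is our own) that computes the multiset of substring compositions, recording each substring only by how many times each symbol occurs in it. Because the substrings of a reversed string are the reversed substrings, a string and its reversal always share the same compositions, so reconstruction can at best be unique up to reversal.

```python
from collections import Counter


def composition_multiset(s):
    """Multiset of the compositions of all n(n+1)/2 substrings of s.

    A composition is stored as a sorted tuple of (symbol, count) pairs,
    so the order of symbols inside each substring is discarded.
    """
    comps = Counter()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            comp = tuple(sorted(Counter(s[i:j]).items()))
            comps[comp] += 1
    return comps


# Reversal never changes the substring compositions.
print(composition_multiset("abb") == composition_multiset("bba"))  # True
# Other strings with the same symbol counts can still be told apart.
print(composition_multiset("aab") == composition_multiset("aba"))  # False
```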

International Symposium on Information Theory | 2013

Tight bounds for universal compression of large alphabets

Jayadev Acharya; Hirakendu Das; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh

Over the past decade, several papers, e.g., [1-7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. Improving on previous results, we prove tight bounds on expected- and worst-case pattern redundancy, in particular closing a decade-long gap and showing that the worst-case pattern redundancy of i.i.d. distributions is $\Theta(n^{1/3})$.
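The pattern of a sequence replaces each symbol by the index of its first appearance, so symbol identities, and hence the alphabet size, drop out. A minimal Python illustration follows; the helper name `pattern` is ours, not from the paper.

```python
def pattern(seq):
    """Replace each symbol by the order of its first appearance (1-based)."""
    first_seen = {}
    out = []
    for x in seq:
        if x not in first_seen:
            first_seen[x] = len(first_seen) + 1
        out.append(first_seen[x])
    return out


print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Sequences over entirely different alphabets, such as `"xyzx"` and `"abca"`, map to the same pattern `[1, 2, 3, 1]`.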

International Symposium on Information Theory | 2009

Multiplicity assignments for algebraic soft-decoding of Reed-Solomon codes using the method of types

Hirakendu Das; Alexander Vardy

The probability of error in the Koetter-Vardy algebraic soft-decoding algorithm for Reed-Solomon codes is determined by the multiplicity assignment scheme used. A multiplicity assignment scheme converts the reliability matrix Π, consisting of the probabilities observed at the channel output, into a multiplicity matrix M that specifies the algebraic interpolation conditions. Using the method of types, Sanov's theorem in particular, we obtain tight exponential bounds on the probability of decoding error for a given multiplicity matrix. These bounds turn out to be essentially the same as the Chernoff bound. We establish several interesting properties of the multiplicity matrix M† that minimizes the exponent of the probability of error. Based on these observations, we develop a low-complexity multiplicity assignment scheme that uses nested bisection to solve for M†. This scheme provides the same probability of error as a known scheme based on the Chernoff bound, but with much lower complexity. We also derive a simple condition on the reliability matrix Π that guarantees an exponentially small probability of error. This condition is akin to an error-correction radius and can be used to study the performance of algebraic soft-decoding.
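For readers unfamiliar with the objects involved, the sketch below shows a multiplicity matrix M derived from a reliability matrix Π, together with the standard Koetter-Vardy interpolation cost and codeword score. These definitions, and the commonly quoted sufficient condition S_M(c) > sqrt(2(k-1)C(M)) for a codeword c to appear on the decoder's list, come from the general Koetter-Vardy framework and are assumptions here, since the abstract does not spell them out. The proportional rule floor(λΠ) is a deliberately simple stand-in, not the nested-bisection scheme from the paper, and the matrix dimensions are a toy.

```python
import math


def proportional_multiplicities(pi, lam):
    """Illustrative rule m_ij = floor(lam * pi_ij).

    NOT the nested-bisection scheme from the abstract; a simple baseline.
    pi is a list of rows (one per symbol), columns are code positions.
    """
    return [[math.floor(lam * p) for p in row] for row in pi]


def cost(m):
    """Interpolation cost C(M): sum of m(m+1)/2 over all entries."""
    return sum(x * (x + 1) // 2 for row in m for x in row)


def score(m, codeword):
    """Score S_M(c): sum over positions j of M[c_j][j]."""
    return sum(m[symbol][j] for j, symbol in enumerate(codeword))


def kv_sufficient(m, codeword, k):
    """Assumed sufficient condition S_M(c) > sqrt(2(k-1) C(M))."""
    return score(m, codeword) > math.sqrt(2 * (k - 1) * cost(m))


pi = [[0.75, 0.25],  # toy reliability matrix: rows are symbols 0 and 1,
      [0.25, 0.75]]  # columns are positions; each column sums to 1
m = proportional_multiplicities(pi, lam=4)  # [[3, 1], [1, 3]]
print(cost(m), score(m, (0, 1)), kv_sufficient(m, (0, 1), k=2))  # 14 6 True
```

Larger λ tightens the approximation to Π but raises the cost quadratically, which is the trade-off that any multiplicity assignment scheme has to manage.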

International Symposium on Information Theory | 2010

Exact calculation of pattern probabilities

Jayadev Acharya; Hirakendu Das; Hosein Mohimani; Alon Orlitsky; Shengjun Pan

We describe two algorithms for calculating the probability of m-symbol length-n patterns over k-element distributions: a partition-based algorithm with complexity roughly $2^{O(m \log m)}$ and a recursive algorithm with complexity roughly $2^{O(m + \log n)}$, with the precise bounds provided in the text. The problem is related to symmetric-polynomial evaluation, and the analysis reveals a connection to the number of connected graphs.
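As a baseline for what these algorithms compute, the pattern probability can be evaluated directly by summing, over every injective map from the m pattern symbols into the k-element alphabet, the product of the mapped probabilities. The Python sketch below does exactly that; it is a naive reference with roughly k!/(k-m)! terms, not either of the algorithms described in the paper, and the function name is ours.

```python
from itertools import permutations
from math import prod


def pattern_probability(pattern, p):
    """Probability that an i.i.d. sample from p has the given pattern.

    Sums over all injective maps from the m pattern symbols to the k
    support points of p (naive brute force, exponential in m).
    """
    m = max(pattern)
    total = 0.0
    for assign in permutations(range(len(p)), m):
        total += prod(p[assign[s - 1]] for s in pattern)
    return total


p = [0.5, 0.3, 0.2]
all_length3 = [[1, 1, 1], [1, 1, 2], [1, 2, 1], [1, 2, 2], [1, 2, 3]]
# The probabilities of all patterns of a given length sum to 1.
print(round(sum(pattern_probability(q, p) for q in all_length3), 10))  # 1.0
```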

International Symposium on Information Theory | 2010

Classification using pattern probability estimators

Jayadev Acharya; Hirakendu Das; Alon Orlitsky; Shengjun Pan; Narayana P. Santhanam

We consider the problem of classification, where the data of the classes are generated i.i.d. according to unknown probability distributions. The goal is to classify test data with minimum error probability, based on the training data available for the classes. The Likelihood Ratio Test (LRT) is the optimal decision rule when the distributions are known. Hence, a popular approach for classification is to estimate the likelihoods using well-known probability estimators, e.g., the Laplace and Good-Turing estimators, and use them in an LRT. We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the available training data, which is indeed the case in most practical applications. We motivate and propose LRTs based on pattern probability estimators that are known to achieve low redundancy for universal compression of large-alphabet sources. While a complete proof of the optimality of these decision rules is still warranted, we demonstrate their performance and compare it with other well-known classifiers through various experiments on synthetic data and on real data for text classification.
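The baseline approach described above, plugging an estimated distribution into a likelihood comparison, can be sketched in a few lines of Python using the Laplace (add-one) estimator. This is the conventional baseline the abstract mentions, not the pattern-based LRTs the paper proposes; equal class priors are assumed, so picking the larger likelihood is an LRT with threshold 1, and all names are illustrative.

```python
import math
from collections import Counter


def laplace_log_likelihood(test, train, alphabet):
    """Log-likelihood of `test` under the add-one estimate fitted on `train`."""
    counts = Counter(train)
    denom = len(train) + len(alphabet)
    return sum(math.log((counts[x] + 1) / denom) for x in test)


def classify(test, train_by_class):
    """Return the class whose Laplace-estimated likelihood of `test` is largest.

    Assumes equal class priors; the alphabet is every symbol seen anywhere.
    """
    alphabet = set(test)
    for data in train_by_class.values():
        alphabet |= set(data)
    return max(
        train_by_class,
        key=lambda c: laplace_log_likelihood(test, train_by_class[c], alphabet),
    )


train = {"A": "aaaab", "B": "bbbba"}
print(classify("aab", train))  # A
```

When the alphabet is large relative to the training data, most symbols get the same add-one mass, which is the regime where the abstract argues pattern-based estimators are preferable.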

SIAM Journal on Discrete Mathematics | 2015

String Reconstruction from Substring Compositions

Jayadev Acharya; Hirakendu Das; Olgica Milenkovic; Alon Orlitsky; Shengjun Pan

Motivated by mass-spectrometry protein sequencing, we consider the problem of reconstructing a string from the multiset of its substring compositions. We show that all strings whose length is 7, one less than a prime, or one less than twice a prime can be reconstructed uniquely up to reversal. For all other lengths, we show that unique reconstruction is not always possible and provide sometimes-tight bounds on the largest number of strings with given substring compositions. The lower bounds are derived by combinatorial arguments, while the upper bounds follow from algebraic approaches that lead to precise characterizations of the sets of strings with the same substring compositions in terms of the factorization properties of bivariate polynomials. Using results on the transience of multidimensional random walks, we also provide a reconstruction algorithm that recovers random strings over alphabets of size ≥ 4 from their substring compositions in optimal near-quadratic time. The problem considered is related to the well-known turnpike problem, and its solution may hence shed light on this longstanding open problem as well.
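For small lengths, the uniqueness claim can be checked exhaustively: group all strings of length n by their composition multiset and confirm that every group is just a string and its reversal. The Python sketch below does this by brute force; it is a sanity check, not the paper's reconstruction algorithm, and the function names are hypothetical. Every length from 1 to 7 is covered by the result above (each is 7, one less than a prime, or one less than twice a prime), so the check should succeed for all of them.

```python
from collections import Counter, defaultdict
from itertools import product


def composition_key(s, alphabet):
    """Hashable multiset of the compositions of all substrings of s."""
    comps = Counter()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            sub = s[i:j]
            comps[tuple(sub.count(a) for a in alphabet)] += 1
    return frozenset(comps.items())


def reconstructable_up_to_reversal(n, alphabet="01"):
    """True if every length-n string over `alphabet` is determined by its
    substring compositions up to reversal (exhaustive, small n only)."""
    classes = defaultdict(set)
    for chars in product(alphabet, repeat=n):
        s = "".join(chars)
        classes[composition_key(s, alphabet)].add(s)
    # Each class must be exactly {s, reversed s} (a single string if s is a palindrome).
    return all(cls == {s, s[::-1]} for cls in classes.values() for s in cls)


for n in range(1, 8):
    print(n, reconstructable_up_to_reversal(n))
```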

International Symposium on Information Theory | 2014

Quadratic-backtracking algorithm for string reconstruction from substring compositions

Jayadev Acharya; Hirakendu Das; Olgica Milenkovic; Alon Orlitsky; Shengjun Pan

Motivated by the problem of deducing the structure of proteins using mass spectrometry, we study the reconstruction of a string from the multiset of its substring compositions. We specialize to string reconstruction the backtracking algorithm used for the more general turnpike problem. Employing well-known results about the transience of random walks in ≥ 3 dimensions, we show that the algorithm reconstructs random strings over alphabets of size ≥ 4 with high probability in near-optimal quadratic time.

International Symposium on Information Theory | 2011

Algebraic computation of pattern maximum likelihood

Jayadev Acharya; Hirakendu Das; Alon Orlitsky; Shengjun Pan

Pattern maximum likelihood (PML) is a technique for estimating the probability multiset of an unknown distribution. With any random sample, it associates the distribution maximizing the probability of its pattern. The required computation is a maximization of a monomial symmetric polynomial over the monotone simplex. The PML of only very few patterns has been found analytically; for other patterns, the PML has been approximated by a heuristic algorithm. Taking an algebraic approach, we determine the PML of short patterns by solving a system of multivariate polynomial equations using the method of resultants. Using this approach, we determine the PML of the pattern 1112234, the last length-7 pattern whose PML was unknown. Under two plausible but as yet unproved assumptions on the optimal alphabet size and the number of distinct probabilities, we also find the PML distribution of all previously unknown patterns of length up to 14.
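The paper computes PML exactly with resultants. As a rough numerical cross-check, and explicitly not the paper's method, one can search the monotone simplex on a coarse grid: enumerate sorted probability vectors with at most a few atoms, evaluate the pattern probability by brute force, and keep the best. The grid resolution, the support cap, and all function names below are assumptions; because only finitely many discrete distributions are tried, the result can only lower-bound the true maximum pattern probability. For the pattern 112 the search should land on the uniform distribution over two symbols, with pattern probability 1/4.

```python
from itertools import permutations
from math import prod


def pattern_probability(pattern, p):
    """Sum over injective maps from pattern symbols to support points of p."""
    m = max(pattern)
    return sum(prod(p[a[s - 1]] for s in pattern)
               for a in permutations(range(len(p)), m))


def sorted_grid(k, steps):
    """Nonincreasing k-tuples of positive multiples of 1/steps summing to 1."""
    def rec(remaining, parts_left, max_part):
        if parts_left == 0:
            if remaining == 0:
                yield ()
            return
        for first in range(min(remaining, max_part), 0, -1):
            for rest in rec(remaining - first, parts_left - 1, first):
                yield (first,) + rest
    for t in rec(steps, k, steps):
        yield tuple(x / steps for x in t)


def grid_pml(pattern, max_support=4, steps=12):
    """Best grid distribution with at most max_support atoms (a lower bound)."""
    best_prob, best_dist = 0.0, None
    for k in range(1, max_support + 1):
        for dist in sorted_grid(k, steps):
            prob = pattern_probability(pattern, dist)
            if prob > best_prob + 1e-12:  # keep the smallest support on ties
                best_prob, best_dist = prob, dist
    return best_prob, best_dist


print(grid_pml([1, 1, 2]))
```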

International Symposium on Information Theory | 2012

On the query computation and verification of functions

Hirakendu Das; Ashkan Jafarpour; Alon Orlitsky; Shengjun Pan; Ananda Theertha Suresh

In the query model of multivariate function computation, the values of the variables are queried sequentially, in an order that may depend on previously revealed values, until the function's value can be determined. The function's computation query complexity is the lowest expected number of queries required by any query order. Instead of computation, it is often easier to consider verification, where the value of the function is given and the queries aim to verify it. The lowest expected number of queries necessary is the function's verification query complexity. We show that for all symmetric functions of independent binary random variables, the computation and verification complexities coincide. This provides a simple method for finding the query complexity and the optimal query order for computing many functions. We also show that if the symmetry condition is removed, there are functions whose verification complexity is strictly lower than their computation complexity, and mention that the same holds when the independence or binary conditions are removed.
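The computation query complexity defined above can be evaluated exactly for small n by dynamic programming over partial assignments: if the revealed values already fix the function, no more queries are needed; otherwise, choose the unqueried variable that minimizes one query plus the expected remaining cost. The Python sketch below assumes independent binary variables with P(x_i = 1) = probs[i] strictly between 0 and 1; its cost is exponential in n, and the name `computation_complexity` is ours, not the paper's.

```python
from functools import lru_cache
from itertools import product


def computation_complexity(f, probs):
    """Lowest expected number of adaptive queries needed to determine f(x).

    f maps a tuple of 0/1 values to any hashable value; probs[i] is
    P(x_i = 1), assumed in (0, 1), with the x_i independent.
    """
    n = len(probs)

    def determined(state):
        # state holds 0/1 for queried variables and None for unqueried ones.
        free = [i for i, v in enumerate(state) if v is None]
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(state)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))
            if len(values) > 1:
                return False
        return True

    @lru_cache(maxsize=None)
    def cost(state):
        if determined(state):
            return 0.0
        best = float("inf")
        for i, v in enumerate(state):
            if v is None:  # try querying variable i next
                one = state[:i] + (1,) + state[i + 1:]
                zero = state[:i] + (0,) + state[i + 1:]
                c = 1 + probs[i] * cost(one) + (1 - probs[i]) * cost(zero)
                best = min(best, c)
        return best

    return cost((None,) * n)


print(computation_complexity(lambda x: int(any(x)), [0.5, 0.5]))  # 1.5
print(computation_complexity(lambda x: sum(x) % 2, [0.5] * 3))    # 3.0
```

OR and parity are both symmetric functions, so by the result above their verification complexities should match these values as well.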

International Conference on Big Data | 2013

Large scale ad latency analysis

Mihajlo Grbovic; Jon Malkin; Hirakendu Das

Late web display advertisements are problematic both for the user experience and for the monetary machinery powering the display advertising industry. If a web page is delivered to a user but the ad fails to load in time, the publisher cannot charge the advertiser for that impression. Detecting whether a specific ad will render in time could give the publisher a choice to show that ad or another one. Further, discovering the root causes of latency, possibly over time as new violators emerge, would allow the publisher to address the actionable issues. We propose a system that predicts, at serve time, which ads are likely to have high latency. Once such ads are identified, we can either ignore them, even if they win the auction, or apply a penalty to them. In addition, our system collects the daily impression logs, consisting of different types of observations measured at serve time and the associated latency in milliseconds, and analyzes the data to identify the features that are associated with late ads and likely to be causing the delay.
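The abstract does not say how the offline analysis ranks features. One simple first pass, sketched below in Python under stated assumptions, is to tally the fraction of late impressions for each (feature, value) pair in the logs. The log schema, the feature names, the 500 ms threshold, and the toy records are all hypothetical, and a marginal late rate only flags candidate causes; it is not the paper's predictive system.

```python
from collections import defaultdict


def late_rate_by_feature(logs, threshold_ms):
    """Fraction of impressions slower than threshold_ms per (feature, value).

    `logs` is an iterable of (features, latency_ms) pairs, where `features`
    is a dict of serve-time observations; this schema is hypothetical.
    """
    late = defaultdict(int)
    total = defaultdict(int)
    for features, latency_ms in logs:
        is_late = latency_ms > threshold_ms
        for name, value in features.items():
            total[(name, value)] += 1
            late[(name, value)] += int(is_late)
    return {key: late[key] / total[key] for key in total}


# Toy, made-up records for illustration only.
logs = [
    ({"creative_host": "cdn-a", "ad_size": "300x250"}, 120),
    ({"creative_host": "cdn-b", "ad_size": "300x250"}, 950),
    ({"creative_host": "cdn-b", "ad_size": "728x90"}, 880),
]
rates = late_rate_by_feature(logs, threshold_ms=500)
for key, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(key, rate)
```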
