Jayadev Acharya
Massachusetts Institute of Technology
Publications
Featured research published by Jayadev Acharya.
Pattern Recognition Letters | 2008
Siddharth Arora; Jayadev Acharya; Amit Verma; Prasanta K. Panigrahi
A novel algorithm is proposed for segmenting an image into multiple levels using its mean and variance. Starting from the extreme pixel values at both ends of the histogram plot, the algorithm is applied recursively on sub-ranges computed from the previous step, so as to find a threshold level and a new sub-range for the next step, until no significant improvement in image quality can be achieved. The method makes use of the fact that a number of distributions tend toward the Dirac delta function, peaking at the mean, in the limiting condition of vanishing variance. The procedure naturally provides for variable-size segmentation, with bigger blocks near the extreme pixel values and finer divisions around the mean or other chosen value for better visualization. Experiments on a variety of images show that the new algorithm segments the image effectively with very little computational cost.
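The recursive mean-based splitting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper stops when no significant quality improvement remains, whereas here a fixed recursion depth stands in for that criterion, and the function names are illustrative.

```python
from statistics import mean

def recursive_thresholds(pixels, lo, hi, depth=0, max_depth=3):
    """Recursively split the intensity range [lo, hi) at the mean of the
    pixels falling in it, then refine each sub-range.

    Sketch only: a fixed recursion depth replaces the paper's
    image-quality stopping rule."""
    vals = [p for p in pixels if lo <= p < hi]
    if depth >= max_depth or not vals:
        return []
    t = mean(vals)
    if not (lo < t < hi):
        return []  # sub-range is already uniform; nothing to split
    return (recursive_thresholds(pixels, lo, t, depth + 1, max_depth)
            + [t]
            + recursive_thresholds(pixels, t, hi, depth + 1, max_depth))

def segment(pixels, max_depth=3):
    """Label each pixel by the segment its intensity falls into."""
    ts = sorted(recursive_thresholds(pixels, min(pixels), max(pixels) + 1,
                                     0, max_depth))
    return [sum(t <= p for t in ts) for p in pixels]
```

Because each sub-range is split at its own mean, the thresholds naturally cluster where the pixel mass does, giving the variable-size segmentation the abstract describes.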
symposium on principles of database systems | 2015
Jayadev Acharya; Ilias Diakonikolas; Chinmay Hegde; Jerry Zheng Li; Ludwig Schmidt
Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications. In this work, we provide fast and near-optimal algorithms for approximating arbitrary one-dimensional data distributions by histograms. A k-histogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld in PODS 2012: given samples from a distribution p over {1,...,n}, compute a k-histogram that minimizes the l2-distance from p, up to an additive ε. We design an algorithm for this problem that uses the information-theoretically minimal sample size of m = O(1/ε^2), runs in sample-linear time O(m), and outputs an O(k)-histogram whose l2-distance from p is at most O(opt_k) + ε, where opt_k is the minimum l2-distance between p and any k-histogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size. We generalize our approach to obtain fast algorithms for multi-scale histogram construction, as well as approximation by piecewise polynomial distributions. We experimentally demonstrate one to two orders of magnitude improvement in empirical running time over previous state-of-the-art algorithms.
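For intuition about the objective, here is a straightforward exact baseline: an O(k·n^2) dynamic program that finds the best k-piece constant fit to an explicit distribution in squared l2 distance. This is only a sketch for contrast — it is not the paper's sample-linear merging algorithm, and all names are illustrative.

```python
def best_k_histogram(p, k):
    """Exact DP for the k-histogram minimizing squared l2 error to p.

    Baseline sketch: O(k * n^2) time, given p explicitly; the paper's
    algorithm instead works from samples in sample-linear time."""
    n = len(p)
    # Prefix sums of p and p^2 give O(1) interval costs.
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(p):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Squared error of the best constant (the mean) on p[i:j].
        tot = s[j] - s[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for pieces in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(pieces - 1, j):
                c = dp[pieces - 1][i] + cost(i, j)
                if c < dp[pieces][j]:
                    dp[pieces][j], cut[pieces][j] = c, i
    # Walk the cut points back to recover the flat value on each piece.
    q, j = [0.0] * n, n
    for pieces in range(k, 0, -1):
        i = cut[pieces][j]
        avg = (s[j] - s[i]) / (j - i)
        for t in range(i, j):
            q[t] = avg
        j = i
    return q, dp[k][n]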
international symposium on information theory | 2009
Jayadev Acharya; Alon Orlitsky; Shengjun Pan
We derive several pattern maximum likelihood (PML) results, among them showing that if a pattern has only one symbol appearing once, its PML support size is at most twice the number of distinct symbols, and that if the pattern is ternary with at most one symbol appearing once, its PML support size is three. We apply these results to extend the set of patterns whose PML distribution is known to all ternary patterns, and to all but one pattern of length up to seven.
international symposium on information theory | 2014
Jayadev Acharya; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
We consider the problems of sorting and maximum-selection of n elements using adversarial comparators. We derive a maximum-selection algorithm that uses 8n comparisons in expectation, and a sorting algorithm that uses 4n log_2 n comparisons in expectation. Both are tight up to a constant factor. Our adversarial-comparator model was motivated by the practically important problem of density estimation, where we observe samples from an unknown distribution, and try to determine which of n known distributions is closest to it. Existing algorithms run in Ω(n^2) time. Applying the adversarial comparator results, we derive a density-estimation algorithm that runs in only O(n) time.
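The comparator model can be illustrated with a simple knockout tournament. This sketch uses an assumed comparator that is correct when two values are well separated and arbitrary otherwise (the spirit of the adversarial model); the plain single-elimination tournament below uses n - 1 comparisons but does not reproduce the paper's 8n-comparison algorithm, which strengthens the guarantee by replaying matches.

```python
import random

def adversarial_cmp(a, b, tol=1.0):
    """Model comparator: returns the larger value when |a - b| > tol,
    and an arbitrary (here random) answer when the values are close."""
    if abs(a - b) > tol:
        return a if a > b else b
    return random.choice((a, b))

def knockout_max(items, cmp=adversarial_cmp):
    """Single-elimination maximum selection: n - 1 comparisons.

    Sketch only: with adversarial comparators the winner is merely
    near-maximal, which is why the paper's algorithm adds repetitions."""
    pool = list(items)
    while len(pool) > 1:
        nxt = [cmp(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:  # odd element gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]
```

In the density-estimation application, "comparing" two candidate distributions amounts to a statistical test on samples, which can err when the two candidates are nearly equidistant from the truth — exactly the behavior the adversarial comparator captures.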
information theory workshop | 2009
Jayadev Acharya; Alon Orlitsky; Shengjun Pan
We derive some general sufficient conditions for the uniformity of the pattern maximum likelihood (PML) distribution. We also provide upper bounds on the support size of a class of patterns, and mention some recent results about the PML of 1112234.
international symposium on information theory | 2014
Jayadev Acharya; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
Outlier detection is the problem of finding a few different distributions in a set of mostly identical ones. Closeness testing is the problem of deciding whether two distributions are identical or different. We relate the two problems, construct a sub-linear generalized closeness test for unequal sample lengths, and use this result to derive a sub-linear universal outlier detector. We also lower bound the sample complexity of both problems.
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Olgica Milenkovic; Alon Orlitsky; Shengjun Pan
Motivated by protein sequencing, we consider the problem of reconstructing a string from the compositions of its substrings. We provide several results, including: general classes of strings that cannot be distinguished from their substring compositions; an almost complete characterization of the lengths for which reconstruction is possible; bounds on the number of strings with the same substring compositions in terms of the number of divisors of the string length plus one; and a relation to the turnpike problem, together with a bivariate polynomial formulation of string reconstruction.
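The basic object in this problem is easy to compute. For a binary string, the composition of a substring is just its count of 0s and 1s, so the data available to the reconstructor is the multiset of these pairs over all substrings. A minimal sketch (function names are illustrative):

```python
from collections import Counter

def composition_multiset(s):
    """Multiset of substring compositions of a binary string: each
    substring contributes its pair (number of 0s, number of 1s)."""
    comps = Counter()
    for i in range(len(s)):
        zeros = ones = 0
        for ch in s[i:]:
            if ch == "0":
                zeros += 1
            else:
                ones += 1
            comps[(zeros, ones)] += 1
    return comps

def equicomposable(s, t):
    """True iff s and t cannot be distinguished from their substring
    compositions."""
    return composition_multiset(s) == composition_multiset(t)
```

One source of ambiguity is immediate: a string and its reversal always have the same composition multiset, since every substring of the reversal is the reversal of a substring, with the same symbol counts. The harder question the paper addresses is when ambiguity beyond reversal can occur.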
international symposium on information theory | 2013
Jayadev Acharya; Hirakendu Das; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
Over the past decade, several papers, e.g., [1-7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. Improving on previous results, we prove tight bounds on expected- and worst-case pattern redundancy, in particular closing a decade-long gap and showing that the worst-case pattern redundancy of i.i.d. distributions is Θ(n^(1/3)).
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Hosein Mohimani; Alon Orlitsky; Shengjun Pan
We describe two algorithms for calculating the probability of m-symbol length-n patterns over k-element distributions: a partition-based algorithm with complexity roughly 2^(O(m log m)) and a recursive algorithm with complexity roughly 2^(O(m + log n)), with the precise bounds provided in the text. The problem is related to symmetric-polynomial evaluation, and the analysis reveals a connection to the number of connected graphs.
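To fix the definition being computed: the probability of a pattern under a distribution p is the sum, over all injective assignments of the pattern's m distinct symbols to the k alphabet symbols, of the product of the assigned probabilities. The brute-force sketch below makes this explicit; it is exponential in m, which is precisely what the paper's two algorithms improve on.

```python
from itertools import permutations

def pattern_probability(pattern, p):
    """Brute-force pattern probability.

    `pattern` lists symbols by order of first appearance (1, 2, ...);
    we sum the sequence probability over all injective assignments of
    pattern symbols to the k alphabet symbols.  Exponential in m."""
    m = max(pattern)          # number of distinct pattern symbols
    total = 0.0
    for assign in permutations(range(len(p)), m):
        prob = 1.0
        for sym in pattern:
            prob *= p[assign[sym - 1]]
        total += prob
    return total
```

For example, under the uniform distribution on two symbols, the pattern (1, 1) — two equal symbols — has probability p_1^2 + p_2^2 = 1/2, and the pattern (1, 2) has the complementary probability 1/2.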
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Alon Orlitsky; Shengjun Pan; Narayana P. Santhanam
We consider the problem of classification, where the data of the classes are generated i.i.d. according to unknown probability distributions. The goal is to classify test data with minimum error probability, based on the training data available for the classes. The likelihood ratio test (LRT) is the optimal decision rule when the distributions are known. Hence, a popular approach to classification is to estimate the likelihoods using well-known probability estimators, e.g., the Laplace and Good-Turing estimators, and use them in an LRT. We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the training data available, which is indeed the case in most practical applications. We motivate and propose LRTs based on pattern probability estimators that are known to achieve low redundancy for universal compression of large-alphabet sources. While a complete proof of optimality of these decision rules remains open, we demonstrate their performance and compare it with other well-known classifiers through experiments on synthetic data and on real data for text classification.
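The plug-in LRT baseline that the paper compares against can be sketched in a few lines with the Laplace (add-one) estimator. This is the standard plug-in rule, not the paper's pattern-based estimator, which would replace the likelihood function below; names are illustrative.

```python
from collections import Counter
from math import log

def laplace_log_likelihood(test, train, alphabet_size):
    """Log-likelihood of the test sample under a Laplace (add-one)
    estimate fit to the training sample."""
    counts = Counter(train)
    n = len(train)
    return sum(log((counts[x] + 1) / (n + alphabet_size)) for x in test)

def lrt_classify(test, train_a, train_b, alphabet_size):
    """Plug-in likelihood ratio test: assign the test sample to the
    class whose estimated distribution gives it higher likelihood.

    Sketch of the baseline; the paper substitutes pattern probability
    estimators for the Laplace estimate used here."""
    la = laplace_log_likelihood(test, train_a, alphabet_size)
    lb = laplace_log_likelihood(test, train_b, alphabet_size)
    return "a" if la >= lb else "b"
```

On large alphabets with little training data, most symbol counts are zero, so the decision hinges on how the estimator smooths unseen symbols — which is where pattern-based estimators differ from simple add-one smoothing.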