Jayadev Acharya
Massachusetts Institute of Technology
Publications
Featured research published by Jayadev Acharya.
Pattern Recognition Letters | 2008
Siddharth Arora; Jayadev Acharya; Amit Verma; Prasanta K. Panigrahi
A novel algorithm is proposed for segmenting an image into multiple levels using its mean and variance. Starting from the extreme pixel values at both ends of the histogram plot, the algorithm is applied recursively on sub-ranges computed from the previous step, so as to find a threshold level and a new sub-range for the next step, until no significant improvement in image quality can be achieved. The method makes use of the fact that a number of distributions tend toward the Dirac delta function, peaking at the mean, in the limiting condition of vanishing variance. The procedure naturally provides for variable-size segmentation, with bigger blocks near the extreme pixel values and finer divisions around the mean or other chosen value for better visualization. Experiments on a variety of images show that the new algorithm segments the image effectively with very little computational cost.
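The recursive mean-based splitting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper stops when no significant quality improvement remains, whereas here a fixed recursion depth stands in for that criterion, and the function names are illustrative.

```python
from statistics import mean

def recursive_thresholds(pixels, lo, hi, depth=0, max_depth=3):
    """Recursively split the intensity range [lo, hi) at the mean of the
    pixels falling in it, then refine each sub-range.

    Sketch only: a fixed recursion depth replaces the paper's
    image-quality stopping rule."""
    vals = [p for p in pixels if lo <= p < hi]
    if depth >= max_depth or not vals:
        return []
    t = mean(vals)
    if not (lo < t < hi):
        return []  # sub-range is already uniform; nothing to split
    return (recursive_thresholds(pixels, lo, t, depth + 1, max_depth)
            + [t]
            + recursive_thresholds(pixels, t, hi, depth + 1, max_depth))

def segment(pixels, max_depth=3):
    """Label each pixel by the segment its intensity falls into."""
    ts = sorted(recursive_thresholds(pixels, min(pixels), max(pixels) + 1,
                                     0, max_depth))
    return [sum(t <= p for t in ts) for p in pixels]
```

Because each sub-range is split at its own mean, the thresholds naturally cluster where the pixel mass does, giving the variable-size segmentation the abstract describes.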
symposium on principles of database systems | 2015
Jayadev Acharya; Ilias Diakonikolas; Chinmay Hegde; Jerry Zheng Li; Ludwig Schmidt
Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications. In this work, we provide fast and near-optimal algorithms for approximating arbitrary one-dimensional data distributions by histograms. A k-histogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld in PODS 2012: given samples from a distribution p over {1,...,n}, compute a k-histogram that minimizes the l2-distance from p, up to an additive ε. We design an algorithm for this problem that uses the information-theoretically minimal sample size of m = O(1/ε^2), runs in sample-linear time O(m), and outputs an O(k)-histogram whose l2-distance from p is at most O(opt_k) + ε, where opt_k is the minimum l2-distance between p and any k-histogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size. We generalize our approach to obtain fast algorithms for multi-scale histogram construction, as well as approximation by piecewise polynomial distributions. We experimentally demonstrate one to two orders of magnitude improvement in empirical running time over previous state-of-the-art algorithms.
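For intuition about the objective, here is a straightforward exact baseline: an O(k·n^2) dynamic program that finds the best k-piece constant fit to an explicit distribution in squared l2 distance. This is only a sketch for contrast — it is not the paper's sample-linear merging algorithm, and all names are illustrative.

```python
def best_k_histogram(p, k):
    """Exact DP for the k-histogram minimizing squared l2 error to p.

    Baseline sketch: O(k * n^2) time, given p explicitly; the paper's
    algorithm instead works from samples in sample-linear time."""
    n = len(p)
    # Prefix sums of p and p^2 give O(1) interval costs.
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(p):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Squared error of the best constant (the mean) on p[i:j].
        tot = s[j] - s[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for pieces in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(pieces - 1, j):
                c = dp[pieces - 1][i] + cost(i, j)
                if c < dp[pieces][j]:
                    dp[pieces][j], cut[pieces][j] = c, i
    # Walk the cut points back to recover the flat value on each piece.
    q, j = [0.0] * n, n
    for pieces in range(k, 0, -1):
        i = cut[pieces][j]
        avg = (s[j] - s[i]) / (j - i)
        for t in range(i, j):
            q[t] = avg
        j = i
    return q, dp[k][n]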
international symposium on information theory | 2009
Jayadev Acharya; Alon Orlitsky; Shengjun Pan
We derive several pattern maximum likelihood (PML) results, among them showing that if a pattern has only one symbol appearing once, its PML support size is at most twice the number of distinct symbols, and that if the pattern is ternary with at most one symbol appearing once, its PML support size is three. We apply these results to extend the set of patterns whose PML distribution is known to all ternary patterns, and to all but one pattern of length up to seven.
international symposium on information theory | 2014
Jayadev Acharya; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
We consider the problems of sorting and maximum-selection of n elements using adversarial comparators. We derive a maximum-selection algorithm that uses 8n comparisons in expectation, and a sorting algorithm that uses 4n log_2 n comparisons in expectation. Both are tight up to a constant factor. Our adversarial-comparator model was motivated by the practically important problem of density estimation, where we observe samples from an unknown distribution, and try to determine which of n known distributions is closest to it. Existing algorithms run in Ω(n^2) time. Applying the adversarial comparator results, we derive a density-estimation algorithm that runs in only O(n) time.
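The comparator model can be illustrated with a simple knockout tournament. This sketch uses an assumed comparator that is correct when two values are well separated and arbitrary otherwise (the spirit of the adversarial model); the plain single-elimination tournament below uses n - 1 comparisons but does not reproduce the paper's 8n-comparison algorithm, which strengthens the guarantee by replaying matches.

```python
import random

def adversarial_cmp(a, b, tol=1.0):
    """Model comparator: returns the larger value when |a - b| > tol,
    and an arbitrary (here random) answer when the values are close."""
    if abs(a - b) > tol:
        return a if a > b else b
    return random.choice((a, b))

def knockout_max(items, cmp=adversarial_cmp):
    """Single-elimination maximum selection: n - 1 comparisons.

    Sketch only: with adversarial comparators the winner is merely
    near-maximal, which is why the paper's algorithm adds repetitions."""
    pool = list(items)
    while len(pool) > 1:
        nxt = [cmp(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:  # odd element gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]
```

In the density-estimation application, "comparing" two candidate distributions amounts to a statistical test on samples, which can err when the two candidates are nearly equidistant from the truth — exactly the behavior the adversarial comparator captures.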
information theory workshop | 2009
Jayadev Acharya; Alon Orlitsky; Shengjun Pan
We derive some general sufficient conditions for the uniformity of the pattern maximum likelihood (PML) distribution. We also provide upper bounds on the support size of a class of patterns, and mention some recent results about the PML of 1112234.
international symposium on information theory | 2014
Jayadev Acharya; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
Outlier detection is the problem of finding a few different distributions in a set of mostly identical ones. Closeness testing is the problem of deciding whether two distributions are identical or different. We relate the two problems, construct a sub-linear generalized closeness test for unequal sample lengths, and use this result to derive a sub-linear universal outlier detector. We also lower bound the sample complexity of both problems.
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Olgica Milenkovic; Alon Orlitsky; Shengjun Pan
Motivated by protein sequencing, we consider the problem of reconstructing a string from the compositions of its substrings. We provide several results, including: general classes of strings that cannot be distinguished from their substring compositions; an almost complete characterization of the lengths for which reconstruction is possible; bounds on the number of strings with the same substring compositions in terms of the number of divisors of the string length plus one; and a relation to the turnpike problem, together with a bivariate polynomial formulation of string reconstruction.
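The basic object in this problem is easy to compute. For a binary string, the composition of a substring is just its count of 0s and 1s, so the data available to the reconstructor is the multiset of these pairs over all substrings. A minimal sketch (function names are illustrative):

```python
from collections import Counter

def composition_multiset(s):
    """Multiset of substring compositions of a binary string: each
    substring contributes its pair (number of 0s, number of 1s)."""
    comps = Counter()
    for i in range(len(s)):
        zeros = ones = 0
        for ch in s[i:]:
            if ch == "0":
                zeros += 1
            else:
                ones += 1
            comps[(zeros, ones)] += 1
    return comps

def equicomposable(s, t):
    """True iff s and t cannot be distinguished from their substring
    compositions."""
    return composition_multiset(s) == composition_multiset(t)
```

One source of ambiguity is immediate: a string and its reversal always have the same composition multiset, since every substring of the reversal is the reversal of a substring, with the same symbol counts. The harder question the paper addresses is when ambiguity beyond reversal can occur.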
international symposium on information theory | 2013
Jayadev Acharya; Hirakendu Das; Ashkan Jafarpour; Alon Orlitsky; Ananda Theertha Suresh
Over the past decade, several papers, e.g., [1-7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. Improving on previous results, we prove tight bounds on expected- and worst-case pattern redundancy, in particular closing a decade-long gap and showing that the worst-case pattern redundancy of i.i.d. distributions is Θ(n^(1/3)).
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Hosein Mohimani; Alon Orlitsky; Shengjun Pan
We describe two algorithms for calculating the probability of m-symbol length-n patterns over k-element distributions: a partition-based algorithm with complexity roughly 2^(O(m log m)) and a recursive algorithm with complexity roughly 2^(O(m + log n)), with the precise bounds provided in the text. The problem is related to symmetric-polynomial evaluation, and the analysis reveals a connection to the number of connected graphs.
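To fix the definition being computed: the probability of a pattern under a distribution p is the sum, over all injective assignments of the pattern's m distinct symbols to the k alphabet symbols, of the product of the assigned probabilities. The brute-force sketch below makes this explicit; it is exponential in m, which is precisely what the paper's two algorithms improve on.

```python
from itertools import permutations

def pattern_probability(pattern, p):
    """Brute-force pattern probability.

    `pattern` lists symbols by order of first appearance (1, 2, ...);
    we sum the sequence probability over all injective assignments of
    pattern symbols to the k alphabet symbols.  Exponential in m."""
    m = max(pattern)          # number of distinct pattern symbols
    total = 0.0
    for assign in permutations(range(len(p)), m):
        prob = 1.0
        for sym in pattern:
            prob *= p[assign[sym - 1]]
        total += prob
    return total
```

For example, under the uniform distribution on two symbols, the pattern (1, 1) — two equal symbols — has probability p_1^2 + p_2^2 = 1/2, and the pattern (1, 2) has the complementary probability 1/2.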
international symposium on information theory | 2010
Jayadev Acharya; Hirakendu Das; Alon Orlitsky; Shengjun Pan; Narayana P. Santhanam
We consider the problem of classification, where the data of the classes are generated i.i.d. according to unknown probability distributions. The goal is to classify test data with minimum error probability, based on the training data available for the classes. The likelihood ratio test (LRT) is the optimal decision rule when the distributions are known. Hence, a popular approach to classification is to estimate the likelihoods using well-known probability estimators, e.g., the Laplace and Good-Turing estimators, and use them in an LRT. We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the training data available, which is indeed the case in most practical applications. We motivate and propose LRTs based on pattern probability estimators that are known to achieve low redundancy for universal compression of large-alphabet sources. While a complete proof of optimality of these decision rules remains open, we demonstrate their performance and compare it with other well-known classifiers through experiments on synthetic data and on real data for text classification.
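The plug-in LRT baseline that the paper compares against can be sketched in a few lines with the Laplace (add-one) estimator. This is the standard plug-in rule, not the paper's pattern-based estimator, which would replace the likelihood function below; names are illustrative.

```python
from collections import Counter
from math import log

def laplace_log_likelihood(test, train, alphabet_size):
    """Log-likelihood of the test sample under a Laplace (add-one)
    estimate fit to the training sample."""
    counts = Counter(train)
    n = len(train)
    return sum(log((counts[x] + 1) / (n + alphabet_size)) for x in test)

def lrt_classify(test, train_a, train_b, alphabet_size):
    """Plug-in likelihood ratio test: assign the test sample to the
    class whose estimated distribution gives it higher likelihood.

    Sketch of the baseline; the paper substitutes pattern probability
    estimators for the Laplace estimate used here."""
    la = laplace_log_likelihood(test, train_a, alphabet_size)
    lb = laplace_log_likelihood(test, train_b, alphabet_size)
    return "a" if la >= lb else "b"
```

On large alphabets with little training data, most symbol counts are zero, so the decision hinges on how the estimator smooths unseen symbols — which is where pattern-based estimators differ from simple add-one smoothing.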