Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Pranjal Awasthi is active.

Publication


Featured research published by Pranjal Awasthi.


symposium on principles of database systems | 2007

Decision trees for entity identification: approximation algorithms and hardness results

Venkatesan T. Chakaravarthy; Vinayaka Pandit; Sambuddha Roy; Pranjal Awasthi; Mukesh K. Mohania

We consider the problem of constructing decision trees for entity identification from a given relational table. The input is a table containing information about a set of entities over a fixed set of attributes and a probability distribution over the set of entities that specifies the likelihood of the occurrence of each entity. The goal is to construct a decision tree that identifies each entity unambiguously by testing the attribute values such that the average number of tests is minimized. This classical problem finds such diverse applications as efficient fault detection, species identification in biology, and efficient diagnosis in the field of medicine. Prior work mainly deals with the special case where the input table is binary and the probability distribution over the set of entities is uniform. We study the general problem involving arbitrary input tables and arbitrary probability distributions over the set of entities. We consider a natural greedy algorithm and prove an approximation guarantee of O(r_K · log N), where N is the number of entities and K is the maximum number of distinct values of an attribute. The value r_K is a suitably defined Ramsey number, which is at most log K. We show that it is NP-hard to approximate the problem within a factor of Ω(log N), even for binary tables (i.e., K = 2). Thus, for the case of binary tables, our approximation algorithm is optimal up to constant factors (since r_2 = 2). In addition, our analysis indicates a possible way of resolving a Ramsey-theoretic conjecture by Erdős.
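
As a rough illustration of the kind of greedy rule analyzed above, here is a small Python sketch. The function build_identification_tree and its "heaviest remaining group" split score are my own simplification, not the paper's exact algorithm or its r_K analysis: each node tests the attribute that minimizes the probability mass of the largest still-ambiguous group of entities.

```python
def build_identification_tree(rows, attrs, probs):
    """Greedy sketch: repeatedly test the attribute whose values best split
    the remaining entities, recursing until each entity is isolated.
    rows: entity -> {attribute: value}; probs: entity -> probability."""
    entities = list(rows)
    if len(entities) <= 1 or not attrs:
        return entities  # leaf: entity identified (or an indistinguishable set)

    def heaviest_group(attr):
        # Probability mass of the largest group of entities sharing a value on
        # `attr`; a smaller mass means the test is more informative.
        groups = {}
        for e in entities:
            groups.setdefault(rows[e][attr], []).append(e)
        return max(sum(probs[e] for e in grp) for grp in groups.values())

    best = min(attrs, key=heaviest_group)
    remaining = [a for a in attrs if a != best]
    branches = {}
    for e in entities:
        branches.setdefault(rows[e][best], {})[e] = rows[e]
    return (best, {val: build_identification_tree(sub, remaining, probs)
                   for val, sub in branches.items()})


# Tiny example: identify one of three entities by testing two attributes.
rows = {"e1": {"color": "red", "size": "S"},
        "e2": {"color": "red", "size": "L"},
        "e3": {"color": "blue", "size": "S"}}
probs = {"e1": 0.5, "e2": 0.25, "e3": 0.25}
print(build_identification_tree(rows, ["color", "size"], probs))
```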


Information Processing Letters | 2012

Center-based clustering under perturbation stability

Pranjal Awasthi; Avrim Blum; Or Sheffet

Clustering under most popular objective functions is NP-hard, even to approximate well, and so unlikely to be efficiently solvable in the worst case. Recently, Bilu and Linial (2010) [11] suggested an approach aimed at bypassing this computational barrier by using properties of instances one might hope to hold in practice. In particular, they argue that instances in practice should be stable to small perturbations in the metric space, and give an efficient algorithm for clustering instances of the Max-Cut problem that are stable to perturbations of size O(√n). In addition, they conjecture that instances stable to as little as O(1) perturbations should be solvable in polynomial time. In this paper we prove that this conjecture is true for any center-based clustering objective (such as k-median, k-means, and k-center). Specifically, we show we can efficiently find the optimal clustering assuming only stability to factor-3 perturbations of the underlying metric in spaces without Steiner points, and stability to factor (2 + √3) perturbations for general metrics. In particular, we show for such instances that the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering. We also present NP-hardness results under a weaker but related condition.
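
The last claim in the abstract, that single-linkage combined with dynamic programming recovers the optimal clustering on stable instances, can be sketched in a few dozen lines. The Python toy below is only illustrative: the naive O(n^3) linkage routine, the k-median cluster cost, and all helper names are my own choices, and none of the paper's stability analysis is reproduced.

```python
import numpy as np
from itertools import combinations

def single_linkage_tree(X):
    """Naive single-linkage: repeatedly merge the two closest clusters.
    Returns a binary tree whose leaves are point indices."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    members = {i: (i,) for i in range(len(X))}   # cluster id -> member indices
    trees = {i: i for i in range(len(X))}        # cluster id -> (sub)tree
    next_id = len(X)

    def linkage(a, b):  # single-linkage distance = closest pair across clusters
        return min(d[i, j] for i in members[a] for j in members[b])

    while len(members) > 1:
        a, b = min(combinations(members, 2), key=lambda p: linkage(*p))
        members[next_id] = members.pop(a) + members.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1]

def best_k_clustering(tree, X, k):
    """Dynamic program over the linkage tree: cost(node, j) is the cheapest way
    to cover the node's points with j tree-respecting clusters (k-median cost)."""
    def leaves(t):
        return [t] if isinstance(t, int) else leaves(t[0]) + leaves(t[1])

    def one_cluster_cost(pts):
        P = X[pts]  # cost of a single cluster: best member acting as its center
        return min(np.linalg.norm(P - P[c], axis=1).sum() for c in range(len(P)))

    def solve(t, j):
        pts = leaves(t)
        if j == 1:
            return one_cluster_cost(pts), [pts]
        if isinstance(t, int):
            return float("inf"), []
        best = (float("inf"), [])
        for jl in range(1, j):
            cl, pl = solve(t[0], jl)
            cr, pr = solve(t[1], j - jl)
            if cl + cr < best[0]:
                best = (cl + cr, pl + pr)
        return best

    return solve(tree, k)

# Three well-separated pairs of points: the DP should return exactly those pairs.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.0, 0.2]])
cost, clusters = best_k_clustering(single_linkage_tree(X), X, k=3)
print(cost, clusters)
```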


conference on innovations in theoretical computer science | 2015

Relax, No Need to Round: Integrality of Clustering Formulations

Pranjal Awasthi; Afonso S. Bandeira; Moses Charikar; Ravishankar Krishnaswamy; Soledad Villar; Rachel Ward

We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: k-means and k-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free and not tailored to specific assumptions on the input. More precisely, we consider the distributional setting where there are k clusters in R^m and data from each cluster consists of n points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these k clusters as the optimal integral solution? For the k-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation ε > 0 between the balls. In other words, the pairwise center separation is δ > 2 + ε. Under the same distributional model, the k-means LP relaxation fails to recover such clusters at separation as large as δ = 4. Yet, if we enforce PSD constraints on the k-means LP, we get exact cluster recovery at separation as low as δ > min{2 + √(2k/m), 2 + √2 + 2/m} + ε. In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the k-means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, k-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest.
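
To make the k-median relaxation concrete, here is a small sketch of the standard LP the abstract refers to, written with cvxpy (the helper name k_median_lp and the use of cvxpy are my own; the paper studies when the optimum of this LP is already integral, so that exact recovery needs no rounding step).

```python
import numpy as np
import cvxpy as cp  # assumes cvxpy is installed; any LP solver interface would do

def k_median_lp(X, k):
    """Standard k-median LP relaxation: fractional assignment x[i, j] of point j
    to candidate center i, with k fractional centers opened in total."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    x = cp.Variable((n, n), nonneg=True)
    y = cp.Variable(n, nonneg=True)
    constraints = [cp.sum(x, axis=0) == 1,        # every point is fully assigned
                   x <= cp.vstack([y] * n).T,     # assign only to opened centers
                   cp.sum(y) == k]                # k centers opened in total
    prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(D, x))), constraints)
    prob.solve()
    return y.value, prob.value

# Two well-separated unit-radius clusters: the LP optimum should be integral,
# i.e. y close to 0/1, which is exactly the "no need to round" phenomenon.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(-1, 1, (15, 2)), rng.uniform(-1, 1, (15, 2)) + [6.0, 0.0]])
y_val, obj = k_median_lp(X, k=2)
print(np.round(y_val, 2))
```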


international workshop on approximation, randomization, and combinatorial optimization. algorithms and techniques | 2012

Improved Spectral-Norm Bounds for Clustering

Pranjal Awasthi; Or Sheffet

Aiming to unify known results about clustering mixtures of distributions under separation conditions, Kumar and Kannan [1] introduced a deterministic condition for clustering datasets. They showed that this single deterministic condition encompasses many previously studied clustering assumptions. More specifically, their proximity condition requires that, in the target k-clustering, the projection of a point x onto the line joining its cluster center μ and some other center μ′ is a large additive factor closer to μ than to μ′. This additive factor can be roughly described as k times the spectral norm of the matrix representing the differences between the given (known) dataset and the means of the (unknown) target clustering. Clearly, the proximity condition implies center separation: the distance between any two centers must be as large as the above-mentioned bound.
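
The quantities in the proximity condition are easy to compute numerically. The sketch below is my own simplification for illustration only: it measures, for each point, how far its projection onto the line between its own center and another center lies past the midpoint, in units of the spectral norm ||A − C||; the paper's actual condition additionally scales with k and the cluster sizes.

```python
import numpy as np

def proximity_margins(A, labels):
    """Illustrative check in the spirit of the proximity condition described
    above. For each point x with cluster mean mu and each other mean mu',
    measure how far x's projection onto the mu-mu' line lies past the midpoint,
    toward mu, in units of the spectral norm of (A - C)."""
    means = {c: A[labels == c].mean(axis=0) for c in np.unique(labels)}
    C = np.vstack([means[c] for c in labels])   # each row replaced by its cluster mean
    spectral = np.linalg.norm(A - C, 2)         # spectral norm of the deviation matrix
    margins = []
    for x, c in zip(A, labels):
        mu = means[c]
        for c2, mu2 in means.items():
            if c2 == c:
                continue
            u = (mu - mu2) / np.linalg.norm(mu - mu2)      # direction from mu' to mu
            margins.append(((x - (mu + mu2) / 2) @ u) / spectral)
    return np.array(margins)

# Toy example: two well-separated blobs give a comfortably positive margin.
rng = np.random.default_rng(1)
A = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(8.0, 1.0, (50, 5))])
labels = np.array([0] * 50 + [1] * 50)
print(proximity_margins(A, labels).min())
```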


symposium on the theory of computing | 2014

The power of localization for efficiently learning linear separators with noise

Pranjal Awasthi; Maria-Florina Balcan; Philip M. Long

We introduce a new approach for designing computationally efficient and noise-tolerant algorithms for learning linear separators. We consider the malicious noise model of Valiant [41, 32] and the adversarial label noise model of Kearns, Schapire, and Sellie [34]. For malicious noise, where the adversary can corrupt an η fraction of both the label part and the feature part, we provide a polynomial-time algorithm for learning linear separators in R^d under the uniform distribution with nearly information-theoretically optimal noise tolerance of η = Ω(ε), improving on the Ω(ε/d^(1/4)) noise tolerance of [31] and the Ω(ε^2/log(d/ε)) of [35]. For the adversarial label noise model, where the distribution over the feature vectors is unchanged and the overall probability of a noisy label is constrained to be at most η, we give a polynomial-time algorithm for learning linear separators in R^d under the uniform distribution that can also handle a noise rate of η = Ω(ε). This improves over the results of [31], which either required runtime super-exponential in 1/ε (ours is polynomial in 1/ε) or tolerated less noise. In the case that the distribution is isotropic log-concave, we present a polynomial-time algorithm for the malicious noise model that tolerates Ω(ε/log^2(1/ε)) noise, and a polynomial-time algorithm for the adversarial label noise model that also handles Ω(ε/log^2(1/ε)) noise. Both of these also improve on results from [35]. In particular, in the case of malicious noise, unlike previous results, our noise tolerance has no dependence on the dimension d of the space. Our algorithms are also efficient in the active learning setting, where learning algorithms only receive the classifications of examples when they ask for them. We show that, in this model, our algorithms achieve a label complexity whose dependence on the error parameter ε is polylogarithmic (and thus exponentially better than that of any passive algorithm). This provides the first polynomial-time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise.
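
The "localization" in the title refers to restricting attention, and label queries, to examples near the current hypothesis's boundary. The Python toy below illustrates that loop under strong simplifications of my own (uniform points on the sphere, random label flips instead of an adversary, and a plain averaging update in place of the paper's hinge-loss optimization), so it is a sketch of the idea rather than the paper's algorithm.

```python
import numpy as np

def margin_based_localization(X, oracle, rounds=6, band0=1.0, shrink=0.5):
    """Toy sketch of margin-based localization: after a crude initial guess,
    labels are requested only for points inside a shrinking band around the
    current separator, and the separator is re-estimated from those points."""
    first = np.arange(20)
    w = np.mean(X[first] * oracle(first)[:, None], axis=0)  # crude initial guess
    w /= np.linalg.norm(w)
    band = band0
    for _ in range(rounds):
        near = np.where(np.abs(X @ w) <= band)[0]   # points near the current boundary
        if len(near) == 0:
            break
        y = oracle(near)                            # labels are requested only here
        w_new = np.mean(X[near] * y[:, None], axis=0)
        if np.linalg.norm(w_new) > 0:
            w = w_new / np.linalg.norm(w_new)
        band *= shrink                              # localize: shrink the band
    return w

# Usage: uniform points on the sphere, true separator w*, 5% random label flips.
rng = np.random.default_rng(0)
d, n = 10, 5000
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
w_star = np.eye(d)[0]
labels = np.sign(X @ w_star)
labels[rng.random(n) < 0.05] *= -1
w_hat = margin_based_localization(X, lambda idx: labels[idx])
print(float(w_hat @ w_star))  # close to 1 when the separator is (approximately) recovered
```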


Algorithmica | 2016

Testing Lipschitz Functions on Hypergrid Domains

Pranjal Awasthi; Madhav Jha; Marco Molinaro; Sofya Raskhodnikova

A function f(x_1, …, x_d), where each input is an integer from 1 to …


Mathematical Programming | 2018

On the adaptivity gap in two-stage robust linear optimization under uncertain packing constraints

Pranjal Awasthi; Vineet Goyal; Brian Y. Lu


algorithmic game theory | 2010

On Nash-equilibria of approximation-stable games

Pranjal Awasthi; Maria-Florina Balcan; Avrim Blum; Or Sheffet; Santosh Vempala



Journal of the ACM | 2017

The Power of Localization for Efficiently Learning Linear Separators with Noise

Pranjal Awasthi; Maria-Florina Balcan; Philip M. Long


conference on innovations in theoretical computer science | 2016

Spectral Embedding of k-Cliques, Graph Partitioning and k-Means

Pranjal Awasthi; Moses Charikar; Ravishankar Krishnaswamy; Ali Kemal Sinop


Collaboration


Dive into Pranjal Awasthi's collaboration.

Top Co-Authors

Avrim Blum, Carnegie Mellon University
Or Sheffet, Carnegie Mellon University
Madhav Jha, Pennsylvania State University
Sofya Raskhodnikova, Pennsylvania State University
Marco Molinaro, Pontifical Catholic University of Rio de Janeiro
Nika Haghtalab, Carnegie Mellon University