Soledad Villar | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Soledad Villar is active.

Explore More

Publication

Featured researches published by Soledad Villar.

conference on innovations in theoretical computer science | 2015

Relax, No Need to Round: Integrality of Clustering Formulations

Pranjal Awasthi; Afonso S. Bandeira; Moses Charikar; Ravishankar Krishnaswamy; Soledad Villar; Rachel Ward

We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: k-means and k-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free, not tailored to specific assumptions over the input. More precisely, we consider the distributional setting where there are k clusters in Rm and data from each cluster consists of n points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these k clusters as the optimal integral solution? For the k-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation ε > O between the balls. In other words, the pairwise center separation is δ > 2+ε. Under the same distributional model, the k-means LP relaxation fails to recover such clusters at separation as large as δ = 4. Yet, if we enforce PSD constraints on the k-means LP, we get exact cluster recovery at separation as low as δ > min{2 + √2k/m}, 2+√2 + 2/m} + ε. In contrast, common heuristics such as Lloyds algorithm (a.k.a. the k means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, k-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest.

Mathematical Programming | 2017

Probably certifiably correct k-means clustering

Takayuki Iguchi; Dustin G. Mixon; Jesse Peterson; Soledad Villar

Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means Peng and Wei (SIAM J Optim 18(1):186–205, 2007) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186–205, 2007) that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.

information theory workshop | 2016

Clustering subgaussian mixtures with k-means

Dustin G. Mixon; Soledad Villar; Rachel Ward

We introduce a model-free, parameter-free relax-and-round algorithm for k-means clustering, based on a semidefinite programming relaxation (SDP) due to Peng and Wei [1]. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We analyze the performance of this algorithm in the setting where the data is drawn from a subgaussian mixture model. We also study the fundamental limits of estimating subgaussian centers with k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution. In particular, our guarantee has no dependence on the number of points, and for equidistant clusters with O(k) separation, our guarantee is optimal up to a factor of k.

international conference on sampling theory and applications | 2017

Manifold optimization for k-means clustering

Timothy Carson; Dustin G. Mixon; Soledad Villar

We introduce a manifold optimization relaxation for k-means clustering that generalizes spectral clustering. We show how to implement it as gradient descent in a compact manifold. We also present numerical simulations of the algorithm using Manopt [5]. An extended version of this article, with further theory and numerical simulations will be available as [8].

arXiv: Computation | 2017

Projected power iteration for network alignment

Efe Onaran; Soledad Villar

The network alignment problem asks for the best correspondence between two given graphs, so that the largest possible number of edges are matched. This problem appears in many scientific problems (like the study of protein-protein interactions) and it is very closely related to the quadratic assignment problem which has graph isomorphism, traveling salesman and minimum bisection problems as particular cases. The graph matching problem is NP-hard in general. However, under some restrictive models for the graphs, algorithms can approximate the alignment efficiently. In that spirit the recent work by Feizi and collaborators introduce EigenAlign, a fast spectral method with convergence guarantees for Erdős-Renyí graphs. In this work we propose the algorithm Projected Power Alignment, which is a projected power iteration version of EigenAlign. We numerically show it improves the recovery rates of EigenAlign and we describe the theory that may be used to provide performance guarantees for Projected Power Alignment.

arXiv: Information Theory | 2015