Daniel L. Pimentel-Alarcón
University of Wisconsin-Madison
Publications
Featured research published by Daniel L. Pimentel-Alarcón.
IEEE Journal of Selected Topics in Signal Processing | 2016
Daniel L. Pimentel-Alarcón; Nigel Boston; Robert D. Nowak
Low-rank matrix completion (LRMC) problems arise in a wide variety of applications. Previous theory mainly provides conditions for completion under missing-at-random samplings. This paper studies deterministic conditions for completion. An incomplete d × N matrix is finitely rank-r completable if there are at most finitely many rank-r matrices that agree with all its observed entries. Finite completability is the tipping point in LRMC, as a few additional samples of a finitely completable matrix guarantee its unique completability. The main contribution of this paper is a characterization of finitely completable observation sets. We use this characterization to derive sufficient deterministic sampling conditions for unique completability. We also show that under uniform random sampling schemes, these conditions are satisfied with high probability if O(max{r, log d}) entries per column are observed.
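The sampling regime in the last sentence can be exercised numerically. The sketch below is not the paper's method (the paper establishes identifiability conditions, not an algorithm): it samples a constant multiple of max{r, log d} entries per column of a synthetic rank-r matrix and runs plain alternating least squares, a standard completion heuristic, purely to illustrate the regime. The constant 3 and all dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, r = 50, 80, 2

# Synthetic rank-r matrix.
X = rng.standard_normal((d, r)) @ rng.standard_normal((r, N))

# Sample O(max{r, log d}) entries per column (the constant 3 is arbitrary).
m = 3 * max(r, int(np.ceil(np.log(d))))
mask = np.zeros((d, N), dtype=bool)
for j in range(N):
    mask[rng.choice(d, size=m, replace=False), j] = True

def objective(U, V):
    # Squared error on the observed entries only.
    R = (U @ V.T - X) * mask
    return np.sum(R ** 2)

# Alternating least squares on the observed entries: X ~ U V^T.
U = rng.standard_normal((d, r))
V = rng.standard_normal((N, r))
obj0 = objective(U, V)
for _ in range(50):
    for j in range(N):
        w = mask[:, j]
        if w.any():
            V[j] = np.linalg.lstsq(U[w], X[w, j], rcond=None)[0]
    for i in range(d):
        w = mask[i, :]
        if w.any():
            U[i] = np.linalg.lstsq(V[w], X[i, w], rcond=None)[0]

print("observed-entry objective:", obj0, "->", objective(U, V))
```

Each block update solves an exact least-squares subproblem, so the observed-entry objective is non-increasing; whether the completion is unique is exactly what the paper's conditions address.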
IEEE Statistical Signal Processing Workshop | 2016
Daniel L. Pimentel-Alarcón; Laura Balzano; Roummel F. Marcia; Robert D. Nowak; Rebecca Willett
This paper explores algorithms for subspace clustering with missing data. In many high-dimensional data analysis settings, data points lie in or near a union of subspaces. Subspace clustering is the process of estimating these subspaces and assigning each data point to one of them. However, in many modern applications the data are severely corrupted by missing values. This paper describes two novel methods for subspace clustering with missing data: (a) group-sparse subspace clustering (GSSC), which is based on group-sparsity and alternating minimization, and (b) mixture subspace clustering (MSC), which models each data point as a convex combination of its projections onto all subspaces in the union. Both of these algorithms are shown to converge to a local minimum, and experimental results show that they outperform the previous state-of-the-art, with GSSC yielding the highest overall clustering accuracy.
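GSSC and MSC themselves are not reproduced here; as a hedged illustration of the underlying task, the sketch below runs a plain k-subspaces iteration (assign each point to the nearest estimated subspace, refit each subspace by SVD) on complete synthetic data drawn from a union of subspaces. The use of complete data and all dimensions are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, K, n = 20, 3, 2, 60  # ambient dim, subspace dim, #subspaces, points each

# Points drawn from a union of K random r-dimensional subspaces.
bases = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(K)]
X = np.hstack([B @ rng.standard_normal((r, n)) for B in bases])

def residual(B, x):
    # Distance from x to the subspace spanned by the orthonormal columns of B.
    return np.linalg.norm(x - B @ (B.T @ x))

# k-subspaces: alternate between assigning points and refitting bases via SVD.
est = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(K)]
for _ in range(20):
    labels = np.array([np.argmin([residual(B, x) for B in est]) for x in X.T])
    for k in range(K):
        pts = X[:, labels == k]
        if pts.shape[1] >= r:
            est[k] = np.linalg.svd(pts)[0][:, :r]

err = sum(residual(est[labels[j]], X[:, j]) for j in range(X.shape[1]))
print("total residual after clustering:", err)
```

Like the alternating-minimization methods in the paper, this iteration is only guaranteed to reach a local minimum, which is why initialization and the missing-data extensions matter.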
International Symposium on Information Theory | 2015
Daniel L. Pimentel-Alarcón; Nigel Boston; Robert D. Nowak
Consider an r-dimensional subspace of ℝ^d, r < d, and suppose that we are only given projections of this subspace onto small subsets of the canonical coordinates. The paper establishes necessary and sufficient deterministic conditions on the subsets for subspace identifiability. The results also shed new light on low-rank matrix completion.
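A small numerical illustration of why the sizes of the coordinate subsets matter (an assumed toy setup, not the paper's actual identifiability condition): the projection of a generic r-dimensional subspace onto r or fewer canonical coordinates fills the whole projected space and therefore carries no information, while a projection onto r + 1 coordinates is a proper subspace, i.e., one constraint.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 10, 3

# Basis of a generic r-dimensional subspace of R^d.
U = rng.standard_normal((d, r))

# Project onto small subsets of canonical coordinates.
small = [0, 1, 2]          # |subset| = r
large = [0, 1, 2, 3]       # |subset| = r + 1

# On r coordinates the projection spans all of R^r: uninformative.
print(np.linalg.matrix_rank(U[small]))   # rank r in R^3, i.e., everything
# On r + 1 coordinates it is a hyperplane in R^4: one constraint gained.
print(np.linalg.matrix_rank(U[large]))   # still rank r, now a proper subspace
```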
International Symposium on Information Theory | 2016
Daniel L. Pimentel-Alarcón; Robert D. Nowak
In many practical applications, one is given a subset Ω of the entries in a d × N data matrix X, and aims to infer all the missing entries. Existing theory in low-rank matrix completion (LRMC) provides conditions on X (e.g., bounded coherence or genericity) and Ω (e.g., uniform random sampling or deterministic combinatorial conditions) to guarantee that if X is rank-r, then X is the only rank-r matrix that agrees with the observed entries, and hence X can be uniquely recovered by some method (e.g., nuclear norm or alternating minimization). In many situations, though, one does not know beforehand the rank of X, and depending on X and Ω, there may be rank-r matrices that agree with the observed entries, even if X is not rank-r. Hence one can be deceived into thinking that X is rank-r when it really is not. In this paper we give conditions on X (genericity) and a deterministic condition on Ω to guarantee that if there is a rank-r matrix that agrees with the observed entries, then X is indeed rank-r. While our condition on Ω is combinatorial, we provide a deterministic efficient algorithm to verify whether the condition is satisfied. Furthermore, this condition is satisfied with high probability under uniform random sampling schemes with only O(max{r, log d}) samples per column. This strengthens existing results in LRMC, allowing one to drop the assumption that X is known a priori to be low-rank.
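The deception described above is easy to reproduce. The toy construction below (illustrative, not from the paper) observes a single entry per column of a full-rank matrix; a rank-1 matrix then agrees with every observed entry, so the observations alone cannot certify the rank. The paper's conditions on Ω are precisely what rule out such sampling patterns.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 6, 8

# A generic full-rank matrix: certainly not rank 1.
X = rng.standard_normal((d, N))

# Badly chosen sampling: a single observed entry per column.
rows = rng.integers(0, d, size=N)
observed = X[rows, np.arange(N)]

# Yet a rank-1 matrix agrees with every observed entry:
u = np.ones(d)
v = observed          # since u[rows] == 1, (u v^T)[rows[j], j] = observed[j]
M = np.outer(u, v)

print(np.allclose(M[rows, np.arange(N)], observed))  # deceptive rank-1 fit
print(np.linalg.matrix_rank(X))                      # the true rank is d
```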
Allerton Conference on Communication, Control, and Computing | 2015
Daniel L. Pimentel-Alarcón; Nigel Boston; Robert D. Nowak
Electronic Journal of Statistics | 2017
Daniel L. Pimentel-Alarcón; Robert D. Nowak
This paper presents r2pca, a random consensus method for robust principal component analysis. r2pca takes RANSAC's principle of using as little data as possible one step further. It iteratively selects small subsets of the data to identify pieces of the principal components, and then stitches them together. We show that if the principal components are in general position and the errors are sufficiently sparse, r2pca will exactly recover the principal components with probability 1, without assumptions on coherence or the distribution of the sparse errors, and even under adversarial settings. r2pca enjoys many advantages: it works well under noise, its computational complexity scales linearly in the ambient dimension, it is easily parallelizable, and due to its low sample complexity, it can be used in settings where data is so large it cannot even be stored in memory. We complement our theoretical findings with synthetic and real data experiments showing that r2pca outperforms state-of-the-art methods in a broad range of settings.
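r2pca itself is not reproduced here; the sketch below illustrates the random-consensus principle it builds on, in a deliberately simplified model where corruption affects whole columns rather than sparse entries: fit candidate subspaces from tiny random column subsets and keep the candidate that explains the most columns. The sizes, trial count, and tolerance are arbitrary choices for noiseless data.

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, n_in, n_out = 15, 2, 40, 6

B = np.linalg.qr(rng.standard_normal((d, r)))[0]   # true principal subspace
inliers = B @ rng.standard_normal((r, n_in))
outliers = rng.standard_normal((d, n_out))         # corrupted columns
X = np.hstack([inliers, outliers])

def consensus(Q, X, tol=1e-8):
    # Which columns lie (numerically) in span(Q)?  tol assumes noiseless data.
    R = X - Q @ (Q.T @ X)
    return np.linalg.norm(R, axis=0) < tol

# Random consensus: fit from tiny random column subsets,
# keep the candidate subspace explaining the most columns.
best_Q, best_count = None, -1
for _ in range(200):
    idx = rng.choice(X.shape[1], size=r, replace=False)
    Q = np.linalg.qr(X[:, idx])[0][:, :r]
    count = consensus(Q, X).sum()
    if count > best_count:
        best_Q, best_count = Q, count

# Distance between the recovered and true subspaces (0 on exact recovery).
gap = np.linalg.norm(B - best_Q @ (best_Q.T @ B))
print(best_count, gap)
```

Any trial whose r sampled columns are all inliers recovers the subspace exactly, which is the low-sample-complexity effect the abstract highlights.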
Allerton Conference on Communication, Control, and Computing | 2016
Daniel L. Pimentel-Alarcón; Laura Balzano; Robert D. Nowak
This paper is about an interesting phenomenon: two r-dimensional subspaces, even if they are orthogonal to one another, can appear identical if they are only observed on a subset of coordinates. Understanding this phenomenon is of particular importance for many modern applications of subspace clustering where one would like to subsample in order to improve computational efficiency. Examples include real-time video surveillance and datasets so large that they cannot even be stored in memory. In this paper we introduce a new metric between subspaces, which we call partial coordinate discrepancy. This metric captures a notion of similarity between subsampled subspaces that is not captured by other distance measures between subspaces. With this, we are able to show that subspace clustering is theoretically possible without coherence assumptions using only r + 1 rows of the dataset at hand. This gives precise information-theoretic necessary and sufficient conditions for sketched subspace clustering. This can greatly improve computational efficiency without compromising performance. We complement our theoretical analysis with synthetic and real data experiments.
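The phenomenon admits a tiny explicit example (constructed here for illustration; it does not appear in the paper): two 2-dimensional subspaces of ℝ^4 that are mutually orthogonal yet have identical restrictions to the first two coordinates.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Two 2-dimensional subspaces of R^4, given by orthonormal bases...
U = s * np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
V = s * np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
print(U.T @ V)   # zero matrix: the subspaces are mutually orthogonal

# ...yet observed only on coordinates {0, 1} they are indistinguishable:
rows = [0, 1]
print(U[rows])   # identical restrictions
print(V[rows])
```

Restricting to a third coordinate (r + 1 rows) breaks the ambiguity, consistent with the r + 1 rows needed in the paper's conditions.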
International Symposium on Information Theory | 2017
Daniel L. Pimentel-Alarcón; Aritra Biswas; Claudia Solís-Lemus
This paper studies the following question: where should an adversary place an outlier of a given magnitude in order to maximize the error of the subspace estimated by PCA? We give the exact location of this worst possible outlier, and the exact expression of the maximum possible error. Equivalently, we determine the information-theoretic bounds on how much an outlier can tilt a subspace in its direction. This in turn provides universal (worst-case) error bounds for PCA under arbitrary noisy settings. Our results also have several implications on adaptive PCA, online PCA, and rank-one updates. We illustrate our results with a subspace tracking experiment.
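The question can be probed numerically. The sweep below (an illustrative experiment, not the paper's closed-form answer) places a single outlier of fixed magnitude at varying directions relative to a 1-dimensional inlier subspace and measures how far the leading principal component tilts; the empirically worst direction lies strictly between "along the subspace" and "orthogonal to it".

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

# Inliers lie exactly on the first coordinate axis (the true 1-D subspace).
X = np.outer(np.array([1.0, 0.0]), rng.standard_normal(n))

def top_pc(M):
    # Leading left-singular vector of the data matrix.
    return np.linalg.svd(M)[0][:, 0]

def tilt(theta, mag=5.0):
    # Angle between the estimated PC and e1, with one outlier at angle theta.
    outlier = mag * np.array([np.cos(theta), np.sin(theta)])
    v = top_pc(np.hstack([X, outlier[:, None]]))
    return np.arccos(min(1.0, abs(v[0])))

# Sweep the outlier's direction and locate the empirically worst one.
thetas = np.linspace(0, np.pi / 2, 91)
tilts = [tilt(t) for t in thetas]
worst = thetas[int(np.argmax(tilts))]
print("worst direction:", worst, "max tilt:", max(tilts))
```

An outlier along the subspace causes no tilt, and an orthogonal one of bounded magnitude only competes with the inlier energy; the worst case sits in between, which is the kind of exact trade-off the paper characterizes.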
International Conference on Sampling Theory and Applications | 2017
Daniel L. Pimentel-Alarcón; Laura Balzano; Roummel F. Marcia; Robert D. Nowak; Rebecca Willett
In this paper we show that observations in a mixture can be modeled using a union of subspaces, and hence mixture regression can be posed as a subspace clustering problem. This allows one to perform mixture regression even in the presence of missing data. We illustrate this using a state-of-the-art subspace clustering algorithm for incomplete data to perform mixed linear regression on gene functional data. Our approach outperforms existing methods on this task.
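The reduction is simple to verify numerically: a noiseless pair (x, y) with y = βₖᵀx satisfies (βₖ, −1)ᵀ(x; y) = 0, so the stacked points of component k lie in a hyperplane through the origin, i.e., a subspace of ℝ^(p+1). The sketch below (synthetic data, arbitrary dimensions) checks this for two components.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 3, 30   # covariate dimension, samples per mixture component

# Two regression components y = beta_k . x (noiseless, for illustration).
betas = [rng.standard_normal(p) for _ in range(2)]
Z = []
for beta in betas:
    Xc = rng.standard_normal((p, n))
    yc = beta @ Xc
    Z.append(np.vstack([Xc, yc]))     # stacked points (x; y) in R^{p+1}

# Each stacked point lies in the p-dim subspace {z : (beta, -1) . z = 0},
# so the mixture of regressions becomes a union of subspaces.
for beta, Zk in zip(betas, Z):
    normal = np.append(beta, -1.0)
    print(np.max(np.abs(normal @ Zk)))   # ~0: every point satisfies it
```

Clustering the stacked points by subspace then recovers the component memberships, which is what lets subspace clustering algorithms for incomplete data handle missing covariates.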
Allerton Conference on Communication, Control, and Computing | 2016
Daniel L. Pimentel-Alarcón
In this paper we present a simple and efficient method to compute the canonical polyadic decomposition (CPD) of generic low-rank tensors using elementary linear algebra. The key insight is that all the columns in a low-rank tensor lie in a low-dimensional subspace, and that the coefficients of the columns in each slice with respect to the right basis are scaled copies of one another. The basis, together with the coefficients of a few carefully selected columns, determines the CPD. The computational complexity of our method scales linearly in the order and the rank of the tensor, and at most quadratically in its largest dimension. Furthermore, our approach can be easily adapted to noisy settings. We complement our theoretical analysis with experiments that support our findings.
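The key insight can be checked directly on a synthetic rank-r tensor (an illustration of the stated structure, not the paper's full CPD algorithm): every column of every slice lies in the r-dimensional column space of the first factor, and the slices' coefficient matrices in that basis are scaled copies of one another.

```python
import numpy as np

rng = np.random.default_rng(7)
d1, d2, d3, r = 8, 6, 5, 2

# Rank-r tensor T = sum_k a_k (x) b_k (x) c_k.
A = rng.standard_normal((d1, r))
B = rng.standard_normal((d2, r))
C = rng.standard_normal((d3, r))
T = np.einsum('ik,jk,lk->ijl', A, B, C)

# Every column of every slice lies in the r-dim column space of A...
unfolding = T.reshape(d1, d2 * d3)
print(np.linalg.matrix_rank(unfolding))   # r

# ...and since T[:, :, l] = A @ diag(C[l]) @ B.T, the coefficient matrices
# of the slices in the basis A are scaled copies of one another:
coeffs0 = np.linalg.lstsq(A, T[:, :, 0], rcond=None)[0]
coeffs1 = np.linalg.lstsq(A, T[:, :, 1], rcond=None)[0]
ratios = coeffs1 / coeffs0               # each row is constant: C[1] / C[0]
print(ratios)
```

Reading the scalings off a few columns recovers the factor C, which is the elementary-linear-algebra route to the CPD that the abstract describes.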