Publications


Featured research published by Chong You.


Computer Vision and Pattern Recognition | 2016

Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit

Chong You; Daniel P. Robinson; René Vidal

Subspace clustering methods based on ℓ1, ℓ2 or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success. However, the choice of the regularizer can greatly impact both theory and practice. For instance, ℓ1 regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad conditions (e.g., arbitrary subspaces and corrupted data). However, it requires solving a large-scale convex optimization problem. On the other hand, ℓ2 and nuclear norm regularization provide efficient closed-form solutions, but require very strong assumptions to guarantee a subspace-preserving affinity, e.g., independent subspaces and uncorrupted data. In this paper we study a subspace clustering method based on orthogonal matching pursuit. We show that the method is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions. Experiments on synthetic data verify our theoretical analysis, and applications in handwritten digit and face clustering show that our approach achieves the best trade-off between accuracy and efficiency. Moreover, our approach is the first to handle 100,000 data points.
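The self-expressive step this paper builds on can be sketched in a few lines. The code below is an illustrative reconstruction, not the authors' implementation: each point is greedily approximated by at most k other points via orthogonal matching pursuit, and the coefficient matrix is symmetrized into an affinity. All function names, parameters, and the toy data are our own.

```python
import numpy as np

def omp_coeffs(X, i, k):
    """Greedy OMP: express column X[:, i] using at most k other columns."""
    residual = X[:, i].copy()
    support = []
    c = np.zeros(X.shape[1])
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        corr[i] = -np.inf            # never select the point itself
        corr[support] = -np.inf      # never reselect a chosen column
        support.append(int(np.argmax(corr)))
        # least-squares refit on the current support
        sol, *_ = np.linalg.lstsq(X[:, support], X[:, i], rcond=None)
        residual = X[:, i] - X[:, support] @ sol
    c[support] = sol
    return c

def ssc_omp_affinity(X, k=3):
    """Symmetrized affinity |C| + |C|^T from columnwise OMP coefficients."""
    n = X.shape[1]
    C = np.column_stack([omp_coeffs(X, i, k) for i in range(n)])
    return np.abs(C) + np.abs(C).T

# toy data: two orthogonal 1-D subspaces (lines) in R^3, three points each
X = np.array([[1., 2., 3., 0., 0., 0.],
              [0., 0., 0., 1., 2., 3.],
              [0., 0., 0., 0., 0., 0.]])
A = ssc_omp_affinity(X, k=2)
```

On this toy data the affinity is subspace-preserving: entries linking points from the two different lines are (numerically) zero, while within-line entries are nonzero.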


Computer Vision and Pattern Recognition | 2016

Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering

Chong You; Chun-Guang Li; Daniel P. Robinson; René Vidal

State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with ℓ1, ℓ2 or nuclear norms. ℓ1 regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. ℓ2 and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed ℓ1, ℓ2 and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the ℓ1 and ℓ2 norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to ℓ2 regularization) and subspace-preserving (due to ℓ1 regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.
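The paper's contribution is an oracle-based active set solver; as a naive point of comparison for the same per-point objective, here is a minimal proximal-gradient (ISTA) sketch of elastic net self-expression. The solver, its parameter names, and the toy data are illustrative assumptions, not the paper's method.

```python
import numpy as np

def elastic_net_selfexpr(X, i, lam=0.1, gamma=0.9, steps=500):
    """ISTA for one column's elastic net self-expression:
        min_c 0.5*||x_i - X c||^2 + lam*(gamma*||c||_1
                                         + 0.5*(1 - gamma)*||c||^2),
    subject to c_i = 0 (a point may not represent itself)."""
    n = X.shape[1]
    x = X[:, i]
    c = np.zeros(n)
    # Lipschitz constant of the smooth part (spectral norm squared + ridge)
    L = np.linalg.norm(X, 2) ** 2 + lam * (1 - gamma)
    for _ in range(steps):
        grad = X.T @ (X @ c - x) + lam * (1 - gamma) * c
        z = c - grad / L
        # soft-thresholding = proximal operator of the l1 term
        c = np.sign(z) * np.maximum(np.abs(z) - lam * gamma / L, 0.0)
        c[i] = 0.0
    return c

# toy data: two orthogonal lines in R^3; express point 0 by the others
X = np.array([[1., 2., 3., 0., 0., 0.],
              [0., 0., 0., 1., 2., 3.],
              [0., 0., 0., 0., 0., 0.]])
c = elastic_net_selfexpr(X, 0)
```

Because the second line is orthogonal to point 0, its coefficients stay at zero (the subspace-preserving property), while at least one within-line coefficient is clearly nonzero (connectivity).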


IEEE Transactions on Image Processing | 2017

Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework

Chun-Guang Li; Chong You; René Vidal

Subspace clustering refers to the problem of segmenting data drawn from a union of subspaces. State-of-the-art approaches for solving this problem follow a two-stage approach. In the first step, an affinity matrix is learned from the data using sparse or low-rank minimization techniques. In the second step, the segmentation is found by applying spectral clustering to this affinity. While this approach has led to state-of-the-art results in many applications, it is suboptimal, because it does not exploit the fact that the affinity and the segmentation depend on each other. In this paper, we propose a joint optimization framework, Structured Sparse Subspace Clustering (S3C), for learning both the affinity and the segmentation. The proposed S3C framework is based on expressing each data point as a structured sparse linear combination of all other data points, where the structure is induced by a norm that depends on the unknown segmentation. Moreover, we extend the proposed S3C framework into Constrained S3C (CS3C), in which available partial side-information is incorporated when learning the affinity. We show that both the structured sparse representation and the segmentation can be found via a combination of an alternating direction method of multipliers with spectral clustering. Experiments on a synthetic data set, the Extended Yale B face data set, the Hopkins 155 motion segmentation database, and three cancer data sets demonstrate the effectiveness of our approach.


Computer Vision and Pattern Recognition | 2017

Provable Self-Representation Based Outlier Detection in a Union of Subspaces

Chong You; Daniel P. Robinson; René Vidal

Many computer vision tasks involve processing large amounts of data contaminated by outliers, which need to be detected and rejected. While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces. This paper proposes a new outlier detection method that combines tools from sparse representation with random walks on a graph. By exploiting the property that data points can be expressed as sparse linear combinations of each other, we obtain an asymmetric affinity matrix among data points, which we use to construct a weighted directed graph. By defining a suitable Markov chain from this graph, we establish a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks. We provide a theoretical analysis that justifies the correctness of our method under geometric and connectivity assumptions. Experimental results on image databases demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods.
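The inlier/outlier mechanism can be illustrated with a toy Markov chain: over many steps, probability mass drains out of inessential states (outliers, which other points do not represent) and concentrates on essential ones (inliers). This is a simplified sketch with a hand-built representation matrix, not the paper's full pipeline; the matrix and names are our own.

```python
import numpy as np

def random_walk_outliers(C, t=100):
    """Given a nonnegative (possibly asymmetric) self-representation
    matrix C, run a t-step random walk from the uniform distribution;
    states retaining little probability mass are flagged as outliers."""
    P = C / C.sum(axis=1, keepdims=True)   # row-stochastic transitions
    pi = np.full(C.shape[0], 1.0 / C.shape[0])
    for _ in range(t):
        pi = pi @ P
    return pi

# toy graph: states 0-2 represent one another (a closed set of inliers);
# state 3 points into them, but nothing points back (inessential state)
C = np.array([
    [0.0,   1.0, 1.0, 0.0],
    [1.0,   0.0, 1.0, 0.0],
    [1.0,   1.0, 0.0, 0.0],
    [1.0,   1.0, 1.0, 0.001],   # tiny self-loop keeps the row nonzero
])
pi = random_walk_outliers(C)
```

After the walk, virtually all probability sits on the three inliers (about 1/3 each), and the outlier's mass is negligible, so a simple threshold on pi separates the two groups.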


Computational Optimization and Applications | 2018

A nonconvex formulation for low rank subspace clustering: algorithms and convergence analysis

Hao Jiang; Daniel P. Robinson; René Vidal; Chong You

We consider the problem of subspace clustering with data that is potentially corrupted by both dense noise and sparse gross errors. In particular, we study a recently proposed low rank subspace clustering approach based on a nonconvex modeling formulation. This formulation includes a nonconvex spectral function in the objective function that makes the optimization task challenging, e.g., it is unknown whether the alternating direction method of multipliers (ADMM) framework proposed to solve the nonconvex model formulation is provably convergent. In this paper, we establish that the spectral function is differentiable and give a formula for computing the derivative. Moreover, we show that the derivative of the spectral function is Lipschitz continuous and provide an explicit value for the Lipschitz constant. These facts are then used to provide a lower bound for how the penalty parameter in the ADMM method should be chosen. As long as the penalty parameter is chosen according to this bound, we show that the ADMM algorithm computes iterates that have a limit point satisfying first-order optimality conditions. We also present a second strategy for solving the nonconvex problem that is based on proximal gradient calculations. The convergence and performance of the algorithms is verified through experiments on real data from face and digit clustering and motion segmentation.


Archive | 2018

A Scalable Exemplar-Based Subspace Clustering Algorithm for Class-Imbalanced Data

Chong You; Chi Li; Daniel P. Robinson; René Vidal

Subspace clustering methods based on expressing each data point as a linear combination of a few other data points (e.g., sparse subspace clustering) have become a popular tool for unsupervised learning due to their empirical success and theoretical guarantees. However, their performance can be affected by imbalanced data distributions and large-scale datasets. This paper presents an exemplar-based subspace clustering method to tackle the problem of imbalanced and large-scale datasets. The proposed method searches for a subset of the data that best represents all data points as measured by the ℓ1 norm of the representation coefficients. To solve our model efficiently, we introduce a farthest first search algorithm which iteratively selects the least well-represented point as an exemplar. When data comes from a union of subspaces, we prove that the computed subset contains enough exemplars from each subspace for expressing all data points even if the data are imbalanced. Our experiments demonstrate that the proposed method outperforms state-of-the-art subspace clustering methods in two large-scale image datasets that are imbalanced. We also demonstrate the effectiveness of our method on unsupervised data subset selection for a face image classification task.
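The farthest first search idea can be sketched as follows. Note that the paper scores representation quality with an ℓ1-regularized cost, whereas this illustrative version uses a plain least-squares projection residual to keep the sketch short; all names and the toy data are our own.

```python
import numpy as np

def farthest_first_exemplars(X, k):
    """Greedy exemplar selection: start from the largest-norm column and
    repeatedly add the column that is worst represented (largest
    least-squares residual) by the current exemplar set."""
    exemplars = [int(np.argmax(np.linalg.norm(X, axis=0)))]
    while len(exemplars) < k:
        E = X[:, exemplars]
        # residual of each point after projecting onto span(exemplars)
        coef, *_ = np.linalg.lstsq(E, X, rcond=None)
        residuals = np.linalg.norm(X - E @ coef, axis=0)
        exemplars.append(int(np.argmax(residuals)))
    return exemplars

# imbalanced toy data: five points on one line, one point on another
X = np.array([[1., 2., 3., 4., 5., 0.],
              [0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 2.]])
ex = farthest_first_exemplars(X, 2)
```

Even though the second subspace contributes only one point, it is picked as the second exemplar, since it is the point the first exemplar represents worst. This mirrors the paper's claim that the selected subset covers every subspace despite class imbalance.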


Asilomar Conference on Signals, Systems and Computers | 2016

A divide-and-conquer framework for large-scale subspace clustering

Chong You; Claire Donnat; Daniel P. Robinson; René Vidal

Given data that lies in a union of low-dimensional subspaces, the problem of subspace clustering aims to learn — in an unsupervised manner — the membership of the data to their respective subspaces. State-of-the-art subspace clustering methods typically adopt a two-step procedure. In the first step, an affinity measure among data points is constructed, usually by exploiting some form of data self-representation. In the second step, spectral clustering is applied to the affinity measure to find the membership of the data to their respective subspaces. While such methods are broadly applicable to mid-size datasets with 10,000 data points in 10,000 variables, they cannot be directly applied to large-scale datasets. This paper proposes a divide-and-conquer framework for large-scale subspace clustering. The data is first divided into chunks, and subspace clustering is applied to each chunk. After removing potential outliers from each cluster, a new cross-representation measure for the similarity between subspaces is used to merge clusters from different chunks that correspond to the same subspace. A self-representation method is then used to assign outliers to clusters. We evaluate the proposed strategy on a synthetic large-scale dataset with 1,000,000 data points, as well as on the MNIST database, which contains 70,000 images of handwritten digits. The numerical results highlight the scalability of our approach.
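The merging step hinges on a similarity measure between clusters found in different chunks. The sketch below is a hypothetical stand-in for the paper's cross-representation measure, not its actual formula: two clusters score high when each is well reconstructed by regressing its points onto the other's.

```python
import numpy as np

def cross_representation_similarity(A, B):
    """Similarity between two clusters (columns of A and B): average
    relative reconstruction quality when each cluster's points are
    regressed onto the other cluster's points."""
    def fit_quality(Y, D):
        coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
        res = np.linalg.norm(Y - D @ coef, axis=0)
        return 1.0 - float(np.mean(res / np.linalg.norm(Y, axis=0)))
    return 0.5 * (fit_quality(A, B) + fit_quality(B, A))

# chunk clusters drawn from the same line vs. an orthogonal line
A = np.array([[1., 2.], [0., 0.], [0., 0.]])   # cluster along e1
B = np.array([[3.], [0.], [0.]])               # also along e1
C = np.array([[0., 0.], [1., 2.], [0., 0.]])   # cluster along e2
s_same = cross_representation_similarity(A, B)
s_diff = cross_representation_similarity(A, C)
```

Clusters from the same subspace score near 1 and are merged; clusters from different (here orthogonal) subspaces score near 0 and are kept apart.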


International Conference on Machine Learning | 2015

Geometric Conditions for Subspace-Sparse Recovery

Chong You; René Vidal


arXiv: Computer Vision and Pattern Recognition | 2015

Sparse Subspace Clustering by Orthogonal Matching Pursuit

Chong You; René Vidal


arXiv: Machine Learning | 2015

Subspace-Sparse Representation

Chong You; René Vidal

Collaboration


Dive into Chong You's collaborations.

Top Co-Authors

René Vidal
Johns Hopkins University

Chun-Guang Li
Beijing University of Posts and Telecommunications

Claire Donnat
Johns Hopkins University

Hao Jiang
Johns Hopkins University