Handong Zhao
Northeastern University
Publication
Featured research published by Handong Zhao.
IEEE International Conference on Automatic Face and Gesture Recognition | 2015
Handong Zhao; Zhengming Ding; Yun Fu
Subspace segmentation is one of the most active topics in computer vision and machine learning. In general, data (e.g., face images) lie in a union of multiple linear subspaces, so the key is to find a block-diagonal affinity matrix, which leads to segmenting the data into the correct clusters. Recently, graph-construction-based segmentation methods have attracted considerable attention. Following this line, we propose a novel approach that constructs a Sparse Graph with Block-wise constraint for face representation, named SGB. Inspired by recent studies of least squares regression coefficients, SGB first generates a compact block-diagonal coefficient matrix. Meanwhile, a graph regularizer induces a sparse graph, which focuses on local structure and benefits multi-subspace segmentation. By introducing different graph regularizers, our graph becomes more balanced: a b-matching constraint suits balanced data, while a k-nearest-neighbor regularizer preserves more manifold information for unbalanced data. To solve our model, we derive a joint optimization strategy that learns the block-wise and sparse graph simultaneously. To demonstrate the effectiveness of our method, we consider two application scenarios, i.e., face clustering and kinship verification. Extensive results on Extended YaleB, ORL, and the kinship dataset Family101 demonstrate that our graph consistently outperforms several state-of-the-art graphs. In particular, our method raises the performance bar by around 14% in the kinship verification application.
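The graph-construction recipe sketched in this abstract, regression coefficients for block-diagonal structure plus a k-nearest-neighbor regularizer for sparsity, can be caricatured in a few lines. The ridge-regularized self-representation and the kNN masking below are illustrative simplifications, not the paper's actual SGB formulation:

```python
import numpy as np

def knn_sparse_graph(X, k=4, reg=0.1):
    """Sparse affinity graph from a least-squares self-representation.

    X: (d, n) data matrix, one sample per column. Regression coefficients
    give a dense affinity; a k-nearest-neighbour mask then sparsifies it
    so only local structure survives.
    """
    d, n = X.shape
    # Ridge-regularised self-representation: X ~ X @ C, diagonal zeroed
    # so no sample trivially represents itself.
    G = X.T @ X
    C = np.linalg.solve(G + reg * np.eye(n), G)
    np.fill_diagonal(C, 0.0)
    W = np.abs(C)
    # Keep only the k largest coefficients per column (kNN-style sparsity).
    mask = np.zeros_like(W)
    for j in range(n):
        mask[np.argsort(W[:, j])[-k:], j] = 1.0
    W = W * mask
    return 0.5 * (W + W.T)  # symmetric affinity, ready for spectral clustering
```

Feeding the resulting affinity to any spectral clustering routine completes the segmentation step.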
IEEE Transactions on Image Processing | 2016
Jun Li; Yu Kong; Handong Zhao; Jian Yang; Yun Fu
Rooted in the basic hypothesis that a data matrix is strictly drawn from some independent subspaces, the low-rank representation (LRR) model and its variations have been successfully applied in various image classification tasks. However, this hypothesis is too strict for the LRR model, as it cannot always be guaranteed in real images. Moreover, the hypothesis also prevents the sub-dictionaries of different subspaces from collaboratively representing an image. Fortunately, in supervised image classification, a low-rank signal can be extracted from the independent label subspaces (ILS) instead of the independent image subspaces (IIS). Therefore, this paper proposes a projective low-rank representation (PLR) model that directly trains a projective function to approximate the LRR derived from the labels. To the best of our knowledge, PLR is the first attempt to use the ILS hypothesis to relax the rigorous IIS hypothesis in LRR models. We further prove a low-rank effect: the representations learned by PLR have high intraclass similarities and large interclass differences, which are beneficial to classification tasks. The effectiveness of our proposed approach is validated by experimental results on three databases.
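As a toy illustration of the ILS idea, training a projective function whose outputs follow label structure rather than image-subspace structure, one can fit a ridge projection onto label-derived target codes. The one-hot target and all names below are stand-ins, not the paper's actual PLR objective:

```python
import numpy as np

def plr_sketch(X, y, reg=1e-3):
    """Learn a projection whose outputs mirror label structure.

    X: (d, n) training data, y: (n,) integer labels. The target code for
    each sample is its one-hot label vector, so targets are identical
    within a class (rank = number of classes) -- an illustrative stand-in
    for the LRR-derived target in the paper.
    """
    d, n = X.shape
    classes = np.unique(y)
    Q = (y[:, None] == classes[None]).astype(float).T   # (c, n) target codes
    # Ridge regression: P minimises ||P X - Q||^2 + reg ||P||^2.
    P = Q @ X.T @ np.linalg.inv(X @ X.T + reg * np.eye(d))
    return P
```

Projected codes of same-class samples come out nearly identical while different classes map to near-orthogonal codes, loosely mirroring the proved low-rank effect (high intraclass similarity, large interclass difference).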
IEEE Transactions on Circuits and Systems for Video Technology | 2018
Handong Zhao; Zhengming Ding; Yun Fu
The graph-based subspace segmentation technique has garnered considerable attention in the visual data representation problem. In general, data (e.g., tracks of moving objects) are drawn from multiple linear subspaces; thus, how to build a block-diagonal affinity matrix is the critical problem. In this paper, we propose a novel graph-based method, Ensemble Subspace Segmentation under Blockwise constraints (ESSB), which unifies least squares regression and a locality preserving graph regularizer in an ensemble learning framework. Specifically, compact encoding using least squares regression coefficients helps achieve a block-diagonal representation matrix among all samples. Meanwhile, the locality preserving regularizer tends to capture the intrinsic local structure, which further enhances the block-diagonal property. Both blockwise components, i.e., the least squares regression and the locality preserving regularizer, work jointly and are formulated in the ensemble learning framework, making ESSB more robust and efficient, especially when handling high-dimensional data. Finally, an efficient optimization solution based on the inexact augmented Lagrange multiplier is derived, with a theoretical time complexity analysis. To demonstrate the effectiveness of the proposed method, we consider three different applications: face clustering, object clustering, and motion segmentation. Extensive results for both accuracy and normalized mutual information on four benchmarks, i.e., YaleB, ORL, COIL, and Hopkins155, are reported, together with evaluations of computational cost, demonstrating the superiority of our proposed method in both accuracy and efficiency compared with 12 baseline algorithms.
International Joint Conference on Artificial Intelligence | 2017
Yun Fu; Jun Li; Hongfu Liu; Handong Zhao
Low-rank subspace clustering (LRSC) has been considered the state-of-the-art method on small datasets. LRSC constructs a desired similarity graph by low-rank representation (LRR) and employs spectral clustering to segment the data samples. However, effectively applying LRSC to clustering big data is a challenge because both LRR and spectral clustering suffer from high computational cost. To address this challenge, we create a projective low-rank subspace clustering (PLrSC) scheme for the large-scale clustering problem. First, a small dataset is randomly sampled from the big dataset. Second, our proposed predictive low-rank decomposition (PLD) is applied to train a deep encoder on the small dataset, and the deep encoder is used to quickly compute the low-rank representations of all data samples. Third, fast spectral clustering is employed to segment the representations. As a non-trivial contribution, we theoretically prove that the deep encoder can universally approximate the exact (or bounded) recovery of the row space. Experiments verify that our scheme outperforms the related methods on large-scale datasets in a small amount of time. We achieve a state-of-the-art clustering accuracy of 95.8% on MNIST using scattering convolution features.
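The sample-train-encode pipeline described here can be sketched in a few lines; the "encoder" below is just a linear map onto the landmark set's top singular directions, a deliberately crude stand-in for the paper's PLD-trained deep encoder, and every name is illustrative:

```python
import numpy as np

def plrsc_sketch(X, n_landmarks=50, rank=5, seed=0):
    """Sample a landmark set, fit an encoder on it, encode everything.

    X: (n, d) data. Returns (n, rank) representations ready for a fast
    clustering step.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # 1. Randomly sample a small landmark subset of the big dataset.
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    S = X[idx]
    # 2. "Train" an encoder on the landmarks only: a linear map onto the
    #    top-`rank` right singular directions of the landmark matrix.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    encoder = Vt[:rank].T            # (d, rank)
    # 3. Fast-encode every sample with the learned map.
    return X @ encoder               # (n, rank) representations
```

A fast clustering method (e.g., large-scale spectral clustering on the returned representations) would follow as the final step; the point of the scheme is that the expensive fitting touches only the small landmark set.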
ACM Multimedia | 2017
Joseph P. Robinson; Ming Shao; Handong Zhao; Yue Wu; Timothy Gillis; Yun Fu
Recognizing Families In the Wild (RFIW) is organized as a Data Challenge Workshop in conjunction with ACM MM 2017. The workshop is scheduled for the afternoon of October 27th. RFIW is the first large-scale kinship recognition challenge and comprises two tracks: kinship verification and family classification. In total, 12 final submissions were made. This big data challenge was made possible by our FIW dataset, which is by far the largest image collection of its kind. Potential next steps for FIW are abundant.
Proceedings of the 2017 Workshop on Recognizing Families In the Wild | 2017
Joseph P. Robinson; Ming Shao; Handong Zhao; Yue Wu; Timothy Gillis; Yun Fu
Recognizing Families In the Wild (RFIW) is a large-scale, multi-track automatic kinship recognition evaluation, supporting both kinship verification and family classification on scales much larger than ever before. It was organized as a Data Challenge Workshop hosted in conjunction with ACM Multimedia 2017 and was made possible by the largest image collection that supports kin-based vision tasks. In this manuscript, we summarize the evaluation protocols, the progress made, technical background and performance ratings of the algorithms used, and promising directions for both researchers and engineers to take next in this line of work.
IEEE Transactions on Image Processing | 2018
Chengcheng Jia; Ming Shao; Sheng Li; Handong Zhao; Yun Fu
Spatially or temporally corrupted action videos are impractical for recognition by vision or learning models. Such corruption usually happens when streaming data are captured by unintended moving cameras, which introduce occlusion or camera vibration and accordingly cause arbitrary loss of spatiotemporal information. In reality, it is intractable to deal with both spatial and temporal corruption at the same time. In this paper, we propose a coupled stacked denoising tensor auto-encoder (CSDTAE) model, which approaches this corruption problem in a divide-and-conquer fashion by coupling the spatial and temporal schemes. In particular, each scheme is an SDTAE designed to handle either spatial or temporal corruption. An SDTAE is composed of several blocks, each of which is a denoising tensor auto-encoder (DTAE); CSDTAE is therefore built from several DTAE blocks to solve the spatiotemporal corruption problem simultaneously. In one DTAE, the video features are represented as a high-order tensor to preserve the spatiotemporal structure of the data, and the temporal and spatial information are processed separately in different hidden layers via tensor unfolding. In summary, DTAE explores the spatial and temporal structure of the tensor representation, and SDTAE handles different corruption ratios progressively to extract more discriminative features. CSDTAE couples the temporal and spatial corruptions of the same data through a step-by-step procedure based on canonical correlation analysis, which integrates the two sub-problems into one. The key point is solving the spatiotemporal corruption in one model by treating it as noise in either the spatial or the temporal direction. Extensive experiments on three action data sets demonstrate the effectiveness of our model, especially when large volumes of corruption are present in the video.
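The mode-wise processing that DTAE relies on boils down to tensor unfolding: flattening a video tensor along its spatial or temporal mode so that each mode can be handled in a separate hidden layer. A minimal illustration of the unfolding operation itself (the auto-encoder layers are omitted):

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: move `mode` to the front and flatten the
    rest, so each row is indexed by that mode and each column is a fibre
    along the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# A toy "video" tensor: height x width x frames.
video = np.arange(2 * 3 * 4).reshape(2, 3, 4)
spatial = unfold(video, 0)    # rows index height -> shape (2, 12)
temporal = unfold(video, 2)   # rows index frames -> shape (4, 6)
```

Unfolding along the temporal mode exposes each frame as a row, so temporal corruption appears as noisy rows; unfolding along a spatial mode does the same for spatial corruption, which is what lets the two be denoised in separate layers.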
IEEE Transactions on Image Processing | 2018
Handong Zhao; Hongfu Liu; Zhengming Ding; Yun Fu
Identifying different types of data outliers with abnormal behaviors in a multi-view data setting is challenging due to the complicated data distributions across different views. Conventional approaches achieve this by learning a new latent feature representation with a pairwise constraint on the data from different views. In this paper, we argue that existing methods are expensive to generalize from two-view data to three-view (or more) data, in terms of both the number of introduced variables and detection performance. To address this, we propose a novel multi-view outlier detection method with consensus regularization on the latent representations. Specifically, we explicitly characterize each kind of outlier by the intrinsic cluster assignment labels and sample-specific errors. Moreover, we provide a thorough discussion of the proposed consensus regularization and the pairwise regularization. Correspondingly, an optimization solution based on the augmented Lagrangian multiplier method is proposed and derived in detail. In the experiments, we evaluate our method on five well-known machine learning data sets with different outlier settings. Further, to show its effectiveness in real-world computer vision scenarios, we tailor our proposed model to saliency detection and face reconstruction applications. The extensive results of both the standard multi-view outlier detection task and the extended computer vision tasks demonstrate the effectiveness of our proposed method.
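The consensus intuition, that a class outlier is a sample whose cluster membership disagrees across views, can be sketched without the latent-representation machinery. The tiny k-means and the co-association comparison below are illustrative stand-ins for the consensus-regularized model, not the paper's actual formulation:

```python
import numpy as np

def kmeans_labels(X, k, iters=20):
    """Tiny k-means (illustrative): returns cluster labels for rows of X."""
    # Deterministic farthest-first initial centroids.
    C = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(C)[None]) ** 2).sum(-1).min(axis=1)
        C.append(X[np.argmax(d2)])
    C = np.array(C, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return labels

def class_outlier_scores(views, k=2):
    """Score class outliers by cross-view disagreement of clusterings.

    `views` is a list of (n, d_v) arrays describing the same n objects.
    A sample whose cluster membership flips between views gets a high
    score. Comparing same-cluster indicator matrices (co-association)
    sidesteps label-permutation alignment between views.
    """
    labels = [kmeans_labels(V, k) for V in views]
    n = len(labels[0])
    score = np.zeros(n)
    for a in range(len(labels)):
        for b in range(a + 1, len(labels)):
            agree = ((labels[a][:, None] == labels[a][None])
                     == (labels[b][:, None] == labels[b][None]))
            score += 1.0 - agree.mean(axis=1)
    return score
```

With two clean views in which a single sample switches clusters, that sample's disagreement score dominates, which is exactly the class-outlier signature the abstract describes.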
International Joint Conference on Artificial Intelligence | 2017
Jun Li; Handong Zhao; Zhiqiang Tao; Yun Fu
Large-Scale Subspace Clustering (LSSC) is an interesting and important problem in the big data era. However, most existing methods (i.e., sparse or low-rank subspace clustering) cannot be directly used to solve LSSC because they suffer from high time complexity: quadratic or cubic in n, the number of data points. To overcome this limitation, we propose Fast Regression Coding (FRC) to optimize regression codes and simultaneously train a nonlinear function to approximate the codes. Using FRC, we develop an efficient Regression Coding Clustering (RCC) framework to solve the LSSC problem. It consists of sampling, FRC, and clustering. RCC randomly samples a small number of data points, quickly calculates the codes of all data points using the nonlinear function learned from FRC, and employs a large-scale spectral clustering method to cluster the codes. In addition, we provide a theoretical guarantee that the nonlinear function has a first-order approximation ability and a grouping effect. The theorem shows that the codes can easily be used to construct a dividable similarity graph. Compared with the state-of-the-art LSSC methods, our model achieves better clustering results on large-scale datasets.
IEEE International Conference on Data Mining | 2015
Handong Zhao; Zhengming Ding; Ming Shao; Yun Fu
Graph-based semi-supervised learning methods have been influential in the data mining and machine learning fields. The key is to construct an effective graph that captures the intrinsic data structure, which in turn benefits the propagation of labels to the unlabeled data over the graph. Existing methods have shown the effectiveness of a graph regularization term in measuring the similarities among samples, which further uncovers the data structure. However, all existing graph-based methods operate at the sample level, i.e., they calculate similarity based on sample-level representation coefficients, inevitably overlooking the underlying part-level structure within each sample. Inspired by the strong interpretability of the Non-negative Matrix Factorization (NMF) method, we design a more robust and discriminative graph by integrating low-rank factorization and a graph regularizer into a unified framework. Specifically, a novel low-rank factorization through Semi-Non-negative Matrix Factorization (SNMF) is proposed to extract the semantic part-level representation. Moreover, instead of incorporating graph regularization at the sample level, we propose a sparse graph regularization term built on the decomposed part-level representation. This practice results in a more accurate similarity measurement among samples, generating a more discriminative graph for semi-supervised learning. As a non-trivial contribution, we also provide an optimization solution for the proposed method. Comprehensive experimental evaluations show that our proposed method achieves superior performance compared with state-of-the-art semi-supervised classification baselines in both transductive and inductive scenarios.
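A minimal Semi-NMF, factoring the data into a mixed-sign basis and a non-negative part-level representation via the standard multiplicative updates of Ding et al., gives a feel for the part-level codes the graph is then built on. This is a generic SNMF sketch under that reading, not the paper's full model:

```python
import numpy as np

def seminmf(X, r, iters=100, seed=0):
    """Semi-NMF sketch: X ~ F @ G.T with G >= 0, F unconstrained.

    X: (d, n) data. Returns a mixed-sign basis F (d, r) and a
    non-negative part-level representation G (n, r).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    G = rng.random((n, r)) + 0.1
    pos = lambda A: (np.abs(A) + A) / 2.0   # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2.0   # elementwise negative part
    for _ in range(iters):
        # F has a closed-form least-squares update for fixed G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # Multiplicative update keeps G non-negative.
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + 1e-12))
    return F, G
```

Each row of G is a sample's non-negative part-level code; building the sparse graph on these rows, rather than on the raw samples, is what the abstract's part-level regularization amounts to.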