Lifang He
Shenzhen University
Publication
Featured research published by Lifang He.
IEEE Transactions on Image Processing | 2013
Zhifeng Hao; Lifang He; Bingqian Chen; Xiaowei Yang
There has been growing interest in developing more effective learning machines for tensor classification. At present, most of the existing learning machines, such as the support tensor machine (STM), involve nonconvex optimization problems and need to resort to iterative techniques. Such techniques are time-consuming and may suffer from local minima. In order to overcome these two shortcomings, in this paper, we present a novel linear support higher-order tensor machine (SHTM) which integrates the merits of the linear C-support vector machine (C-SVM) and tensor rank-one decomposition. Theoretically, SHTM is an extension of the linear C-SVM to tensor patterns. When the input patterns are vectors, SHTM degenerates into the standard C-SVM. A set of experiments is conducted on nine second-order face recognition datasets and three third-order gait recognition datasets to illustrate the performance of the proposed SHTM. Statistical tests show that, compared with STM and C-SVM with the RBF kernel, SHTM provides a significant performance gain in terms of test accuracy and training speed, especially in the case of higher-order tensors.
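The abstract above combines a linear SVM with tensor rank-one decomposition. One reason such a decomposition is convenient is a standard identity: the inner product of two tensors written as sums of rank-one terms can be computed from inner products of their factor vectors, without forming the full tensors. The sketch below only illustrates that identity on second-order tensors; the function names and the toy data are assumptions made for this example, and it is not the SHTM training procedure from the paper.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def factored_inner(x_terms, y_terms):
    """Inner product of two tensors given as sums of rank-one terms.

    Each tensor is a list of terms, and each term is a tuple of factor
    vectors (one per mode) standing for their outer product.
    """
    total = 0.0
    for x_term, y_term in product(x_terms, y_terms):
        contribution = 1.0
        for x_factor, y_factor in zip(x_term, y_term):
            contribution *= dot(x_factor, y_factor)
        total += contribution
    return total


def to_dense_matrix(terms, rows, cols):
    """Expand a sum of second-order rank-one terms into a dense matrix."""
    dense = [[0.0] * cols for _ in range(rows)]
    for a, b in terms:
        for i in range(rows):
            for j in range(cols):
                dense[i][j] += a[i] * b[j]
    return dense


# Toy data (hypothetical): X has two rank-one terms, Y has one.
X = [([1.0, 2.0], [0.5, -1.0, 3.0]), ([0.0, 1.0], [2.0, 2.0, 0.0])]
Y = [([3.0, -1.0], [1.0, 0.0, 1.0])]

dense_X = to_dense_matrix(X, 2, 3)
dense_Y = to_dense_matrix(Y, 2, 3)
dense_result = sum(dense_X[i][j] * dense_Y[i][j]
                   for i in range(2) for j in range(3))
factored_result = factored_inner(X, Y)
```

Both routes give the same value, but the factored route only touches the short factor vectors, which is why the cost advantage grows with tensor order.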
SIAM International Conference on Data Mining | 2014
Lifang He; Xiangnan Kong; Philip S. Yu; Ann B. Ragin; Zhifeng Hao; Xiaowei Yang
With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices; however, structural information within the tensors is then lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We propose a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We apply our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases (i.e., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performance, particularly with small sample sizes.
European Conference on Machine Learning | 2015
Weixiang Shao; Lifang He; Philip S. Yu
With the advance of technology, data often have multiple modalities or come from multiple sources. Multi-view clustering provides a natural way of generating clusters from such data. Although multi-view clustering has been successfully applied in many applications, most of the previous methods assumed the completeness of each view (i.e., each instance appears in all views). However, in real-world applications, it is often the case that a number of views are available for learning but none of them is complete. The incompleteness of all the views and the number of available views make it difficult to integrate all the incomplete views and get a better clustering solution. In this paper, we propose MIC (Multi-Incomplete-view Clustering), an algorithm based on weighted nonnegative matrix factorization with L2,1 regularization. The proposed MIC works by learning the latent feature matrices for all the views and generating a consensus matrix so that the difference between each view and the consensus is minimized. MIC has several advantages compared with other existing methods. First, MIC incorporates weighted nonnegative matrix factorization, which handles the missing instances in each incomplete view. Second, MIC uses a co-regularized approach, which pushes the learned latent feature matrices of all the views towards a common consensus. By regularizing the disagreement between the latent feature matrices and the consensus, MIC can be easily extended to more than two incomplete views. Third, MIC incorporates L2,1 regularization into the weighted nonnegative matrix factorization, which makes it robust to noise and outliers. Fourth, an iterative optimization framework is used in MIC, which is scalable and proved to converge. Experiments on real datasets demonstrate the advantages of MIC.
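To make the role of the weights concrete, here is a minimal single-view sketch of weighted nonnegative matrix factorization with the standard multiplicative updates, where instances missing from the view receive weight 0 and therefore do not influence the fit. It is a simplified illustration under stated assumptions: it omits the L2,1 regularization, the co-regularization and the consensus matrix described in the abstract, and all names (`weighted_nmf`, `masked_loss`, the toy matrices) are hypothetical rather than taken from the paper.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def masked_loss(x, w, u, v):
    """Weighted squared error: sum_ij w_ij * (x_ij - (u v^T)_ij)^2."""
    approx = matmul(u, transpose(v))
    return sum(w[i][j] * (x[i][j] - approx[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def scale(factor, numer, denom, eps=1e-9):
    """Multiplicative update: factor * numer / (denom + eps), element-wise."""
    return [[f * n / (d + eps) for f, n, d in zip(rf, rn, rd)]
            for rf, rn, rd in zip(factor, numer, denom)]


def weighted_nmf(x, w, rank, iters=200, seed=0):
    """Factor x (instances x features) as u v^T under element weights w."""
    rng = random.Random(seed)
    u = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(x))]
    v = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(x[0]))]
    wx = hadamard(w, x)
    history = [masked_loss(x, w, u, v)]
    for _ in range(iters):
        wa = hadamard(w, matmul(u, transpose(v)))
        u = scale(u, matmul(wx, v), matmul(wa, v))
        wa = hadamard(w, matmul(u, transpose(v)))
        v = scale(v, matmul(transpose(wx), u), matmul(transpose(wa), u))
        history.append(masked_loss(x, w, u, v))
    return u, v, history


# Toy view (hypothetical): 4 instances, 3 features; instance 2 is missing,
# so its row is a placeholder and its weights are 0.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 0.0], [3.0, 6.0, 9.0]]
W = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

U, V, history = weighted_nmf(X, W, rank=2)
```

In this sketch the latent row of the missing instance collapses to zero because nothing in the view constrains it; in a multi-view setting such rows would instead be informed by the other views through the consensus term.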
International Conference on Data Mining | 2014
Bokai Cao; Lifang He; Xiangnan Kong; Philip S. Yu; Zhifeng Hao; Ann B. Ragin
In the era of big data, we can easily access information from multiple views which may be obtained from different sources or feature subsets. Generally, different views provide complementary information for learning tasks. Thus, multi-view learning can facilitate the learning process and is prevalent in a wide range of application domains. For example, in medical science, measurements from a series of medical examinations are documented for each subject, including clinical, imaging, immunologic, serologic and cognitive measures which are obtained from multiple sources. Specifically, for brain diagnosis, we can have different quantitative analyses which can be seen as different feature subsets of a subject. It is desirable to combine all these features in an effective way for disease diagnosis. However, some measurements from less relevant medical examinations can introduce irrelevant information which can even be exaggerated after view combinations. Feature selection should therefore be incorporated in the process of multi-view learning. In this paper, we explore the tensor product to bring different views together in a joint space, and present a dual method of tensor-based multi-view feature selection (DUAL-TMFS) based on the idea of support vector machine recursive feature elimination. Experiments conducted on datasets derived from neurological disorder demonstrate that the features selected by our proposed method yield better classification performance and are relevant to disease diagnosis.
Advances in Geographic Information Systems | 2015
Senzhang Wang; Lifang He; Leon Stenneth; Philip S. Yu; Zhoujun Li
Conventional traffic congestion estimation approaches require the deployment of traffic sensors or large-scale probe vehicles. The high cost of deploying and maintaining this equipment largely limits its spatial-temporal coverage. This paper proposes an alternative solution with lower cost and wider spatial coverage by exploring traffic related information from Twitter. By regarding each Twitter user as a traffic monitoring sensor, various real-time traffic information can be collected freely from each corner of the city. However, there are two major challenges for this problem. Firstly, the congestion related information extracted directly from real-time tweets is very sparse due both to the low resolution of the geographic locations mentioned in the tweets and the inherent sparsity of Twitter data. Secondly, the traffic event information coming from Twitter can be multi-typed, including congestion, accident, road construction, etc. It is non-trivial to model the potential impacts of diverse traffic events on traffic congestion. We propose to enrich the sparse real-time tweets from two directions: 1) mining the spatial and temporal correlations of the road segments in congestion from historical data, and 2) applying auxiliary information including social events and road features. We finally propose a coupled matrix and tensor factorization model to effectively integrate rich information for Citywide Traffic Congestion Estimation (CTCE). Extensive evaluations on Twitter data and 500 million public passenger bus GPS data points on nearly 700 miles of roads in Chicago demonstrate the efficiency and effectiveness of the proposed approach.
Knowledge Discovery and Data Mining | 2016
Lifang He; Chun Ta Lu; Jiaqi Ma; Jianping Cao; Linlin Shen; Philip S. Yu
Detecting communities (or modular structures) and structural hole (SH) spanners, the nodes bridging different communities in a network, are two essential tasks in the realm of network analytics. Due to the topological nature of communities and SH spanners, these two tasks are naturally tangled with each other, while there has been little synergy between them. In this paper, we propose a novel harmonic modularity method to tackle both tasks simultaneously. Specifically, we apply a harmonic function to measure the smoothness of community structure and to obtain the community indicator. We then investigate the sparsity level of the interactions between communities, with particular emphasis on the nodes connecting to multiple communities, to discriminate the indicator of SH spanners and assist the community guidance. Extensive experiments on real-world networks demonstrate that our proposed method outperforms several state-of-the-art methods in the community detection task and also in the SH spanner identification task (even the methods that require the supervised community information). Furthermore, by removing the SH spanners spotted by our method, we show that the quality of other community detection methods can be further improved.
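As background for the term "harmonic function" used above: on a graph, a function is harmonic at a node when its value there equals the average of its neighbours' values. The sketch below computes such a function by repeated averaging with two nodes held fixed, on a toy graph of two triangles joined by a bridging node. It only illustrates the harmonic property and how a bridging node ends up between the two groups; it is not the harmonic modularity method of the paper, and the graph, the fixed values and the function name are assumptions made for this example.

```python
def harmonic_values(adjacency, fixed, iters=500):
    """Approximate a harmonic function by repeated neighbour averaging.

    adjacency: dict mapping each node to a list of its neighbours.
    fixed: dict mapping boundary nodes to values that never change.
    Every other node is repeatedly set to the mean of its neighbours.
    """
    values = {node: fixed.get(node, 0.5) for node in adjacency}
    for _ in range(iters):
        updated = dict(values)
        for node, neighbours in adjacency.items():
            if node not in fixed:
                updated[node] = (sum(values[n] for n in neighbours)
                                 / len(neighbours))
        values = updated
    return values


# Toy graph (hypothetical): triangle {0, 1, 2}, triangle {4, 5, 6},
# and node 3 bridging node 2 and node 4.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6],
    5: [4, 6],
    6: [4, 5],
}

# Pin one node in each triangle to opposite values.
f = harmonic_values(graph, fixed={0: 0.0, 6: 1.0})
```

Nodes inside each triangle settle near the value pinned in their own group, while the bridging node 3 settles halfway between the two, which is the kind of smoothness signal a community indicator can build on.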
Mobile Data Management | 2016
Senzhang Wang; Lifang He; Leon Stenneth; Philip S. Yu; Zhoujun Li; Zhiqiu Huang
This paper studies the novel problem of more accurately estimating urban traffic congestion by integrating sparse probe data and traffic related information collected from social media. Limited by the lack of reliability and low sampling frequency of GPS probes, probe data are usually not sufficient for fully estimating traffic conditions of a large arterial network. To address the data sparsity challenge, we extensively collect and model traffic related data from multiple data sources. Besides the GPS probe data, we also collect traffic related tweets that report various traffic events such as congestion, accident, and road construction from both traffic authority accounts and general user accounts on Twitter. To further explore other factors that might affect traffic conditions, we also extract auxiliary information including road congestion correlations, social events, road features, as well as points of interest (POIs). To integrate the different types of data coming from different sources, we finally propose a coupled matrix and tensor factorization model to more accurately complete the very sparse traffic congestion matrix by collaboratively factorizing it with other matrices and tensors formed by other data. We evaluate the proposed model on the arterial network of downtown Chicago with 1257 road segments. The results demonstrate the effectiveness and efficiency of the proposed model by comparison with previous approaches.
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2015
Weixiang Shao; Lifang He; Philip S. Yu
With advances in data collection technologies, multiple data sources are assuming increasing prominence in many applications. Clustering from multiple data sources has emerged as a topic of critical significance in the data mining and machine learning community. Different data sources provide different levels of necessarily detailed knowledge. Thus, combining multiple data sources is pivotal to facilitate the clustering process. However, in reality, the data usually exhibit heterogeneity and incompleteness. The key challenge is how to effectively integrate information from multiple heterogeneous sources in the presence of missing data. Conventional methods mainly focus on clustering heterogeneous data with full information in all sources or at least one source without missing values. In this paper, we propose a more general framework T-MIC (Tensor based Multi-source Incomplete data Clustering) to integrate multiple incomplete data sources. Specifically, we first use the kernel matrices to form an initial tensor across all the multiple sources. Then we formulate a joint tensor factorization process with the sparsity constraint and use it to iteratively push the initial tensor towards a quality-driven exploration of the latent factors by taking into account missing data uncertainty. Finally, these factors serve as features for clustering. Extensive experiments on both synthetic and real datasets demonstrate that our proposed approach can effectively boost clustering performance, even with large amounts of missing data.
International Conference on Data Mining | 2016
Weixiang Shao; Lifang He; Chun Ta Lu; Xiaokai Wei; Philip S. Yu
In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization with sparse learning. It further incorporates the graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using the buffering technique, the proposed OMVFS can reduce the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture the concept drifts in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the off-line methods.
8th International Conference on Brain Informatics and Health, BIH 2015 | 2015
Xiaobing Han; Yanfei Zhong; Lifang He; Philip S. Yu; Liangpei Zhang
With the ongoing development of neuroimaging technology, neuroimaging classification has become a popular and challenging topic. The high dimension and small sample size characteristics pose many challenges to neuroimaging classification. The traditional neuroimaging classification solutions are tensor-based models, which may not fully consider the structural information and cannot mine the essential features of the input data. Considering the complicated properties of the neuroimaging data, this paper proposes a deep learning based algorithm, the hierarchical convolutional sparse auto-encoder (HCSAE), which considers all dimensional information together. The HCSAE treats different convolutional sparse auto-encoders (CSAEs) in an unsupervised hierarchical mode, where the CSAE extracts the essential features of the input by the sparse auto-encoder (SAE) and encodes the inputs in a convolutional manner, which helps to extract efficient and robust features and conserve abundant detail information for the neuroimaging classification. The proposed algorithm was verified on three human brain fMRI classification datasets, and showed great potential compared with the traditional classification algorithms.