Hongliang Fei | Researchain

Publication

Featured research published by Hongliang Fei.

Knowledge and Information Systems | 2013

Structured feature selection and task relationship inference for multi-task learning

Hongliang Fei; Jun Huan

Multi-task learning (MTL) aims to enhance the generalization performance of supervised regression or classification by learning multiple related tasks simultaneously. In this paper, we aim to extend the current MTL techniques to high dimensional data sets with structured input and structured output (SISO), where the SI means the input features are structured and the SO means the tasks are structured. We investigate a completely ignored problem in MTL with SISO data: the interplay of structured feature selection and task relationship modeling. We hypothesize that combining the structure information of features and task relationship inference enables us to build more accurate MTL models. Based on the hypothesis, we have designed an efficient learning algorithm, in which we utilize a task covariance matrix related to the model parameters to capture the task relationship. In addition, we design a regularization formulation for incorporating the structured input features in MTL. We have developed an efficient iterative optimization algorithm to solve the corresponding optimization problem. Our algorithm is based on the accelerated first order gradient method in conjunction with the projected gradient scheme. Using two real-world data sets, we demonstrate the utility of the proposed learning methods.

Conference on Information and Knowledge Management | 2008

Structure feature selection for graph classification

Hongliang Fei; Jun Huan

With the development of highly efficient graph data collection technology in many application fields, classification of graph data has emerged as an important topic in the data mining and machine learning community. Towards building highly accurate classification models for graph data, here we present an efficient graph feature selection method. In our method, we use frequent subgraphs as features for graph classification. Unlike existing methods, we consider the spatial distribution of the subgraph features in the graph data and select those that have a consistent spatial location. We have applied our feature selection methods to several cheminformatics benchmarks. Our method demonstrates a significant improvement in prediction compared with the state-of-the-art methods.

Knowledge Discovery and Data Mining | 2011

Anomaly localization for network data streams with graph joint sparse PCA

Ruoyi Jiang; Hongliang Fei; Jun Huan

Determining anomalies in data streams that are collected and transformed from various types of networks has recently attracted significant research interest. Principal Component Analysis (PCA) has been extensively applied to detecting anomalies in network data streams. However, none of the existing PCA-based approaches addresses the problem of identifying the sources that contribute most to the observed anomaly, or anomaly localization. In this paper, we propose novel sparse PCA methods to perform anomaly detection and localization for network data streams. Our key observation is that we can localize anomalies by identifying a sparse low-dimensional space that captures the abnormal events in data streams. To better capture the sources of anomalies, we incorporate the structure information of the network stream data in our anomaly localization framework. We have performed comprehensive experimental studies of the proposed methods, and have compared our methods with the state-of-the-art using three real-world data sets from different application domains. Our experimental studies demonstrate the utility of the proposed methods.
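
As background for the abstract above, the classical PCA subspace idea it builds on can be sketched roughly as follows. This is only an illustration of plain PCA residual scoring, not the joint sparse PCA method the authors propose; the function names `top_component` and `residual_score`, and the choice of a single principal direction, are assumptions made for the example.

```python
import math

def top_component(X, iters=200):
    """Leading principal direction of X (rows = time points, columns = streams).

    Uses power iteration on the sample covariance. The all-ones start vector
    is a simplification; a random start is more robust in general.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        u = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        if norm == 0.0:
            break  # degenerate covariance; keep the current direction
        v = [ui / norm for ui in u]
    return means, v

def residual_score(x, means, v):
    """Squared norm of the part of x that the direction v does not explain."""
    xc = [xi - mi for xi, mi in zip(x, means)]
    proj = sum(a * b for a, b in zip(xc, v))
    resid = [a - proj * b for a, b in zip(xc, v)]
    return sum(r * r for r in resid)

# Hypothetical toy data: two streams that normally move together.
history = [[-2.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [2.0, 2.0]]
means, v = top_component(history)
normal_score = residual_score([3.0, 3.0], means, v)
odd_score = residual_score([1.0, -1.0], means, v)
```

A measurement that follows the dominant direction scores close to zero, while one that departs from it scores higher. A dense principal direction mixes all streams, so this baseline says little about which stream caused the anomaly, which is the localization problem the abstract describes.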

IEEE Transactions on Knowledge and Data Engineering | 2013

A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams

Ruoyi Jiang; Hongliang Fei; Jun Huan

Determining anomalies in data streams that are collected and transformed from various types of networks has recently attracted significant research interest. Principal component analysis (PCA) has been extensively applied to detecting anomalies in network data streams. However, none of the existing PCA-based approaches addresses the problem of identifying the sources that contribute most to the observed anomaly, or anomaly localization. In this paper, we propose novel sparse PCA methods to perform anomaly detection and localization for network data streams. Our key observation is that we can localize anomalies by identifying a sparse low-dimensional space that captures the abnormal events in data streams. To better capture the sources of anomalies, we incorporate the structure information of the network stream data in our anomaly localization framework. Furthermore, we extend our joint sparse PCA framework with a multidimensional Karhunen-Loève expansion that considers both the spatial and temporal domains of data streams to stabilize localization performance. We have performed comprehensive experimental studies of the proposed methods and have compared our methods with the state-of-the-art using three real-world data sets from different application domains. Our experimental studies demonstrate the utility of the proposed methods.

ACM Transactions on Knowledge Discovery From Data | 2014

Structured Sparse Boosting for Graph Classification

Hongliang Fei; Jun Huan

Boosting is a highly effective algorithm that produces a linear combination of weak classifiers (a.k.a. base learners) to obtain high-quality classification models. In this article, we propose a generalized logit boost algorithm in which base learners have structural relationships in the functional space. Although such relationships are generic, our work is particularly motivated by the emerging topic of pattern-based classification for semistructured data including graphs. Toward an efficient incorporation of the structure information, we have designed a general model in which we use an undirected graph to capture the relationship of subgraph-based base learners. In our method, we apply both L1 and Laplacian-based L2 regularization to logit boosting to achieve model sparsity and smoothness in the functional space spanned by the base learners. We have derived efficient optimization algorithms based on coordinate descent for the new boosting formulation and theoretically prove that it exhibits a natural grouping effect for spatially nearby or overlapping base learners and that the resulting estimator is consistent. Additionally, motivated by the connection between logit boosting and logistic regression, we extend our structured sparse regularization framework to logistic regression for vectorial data in which features are structured. Through a comprehensive experimental study and comparison with the state of the art, we have demonstrated the effectiveness of the proposed learning method.

International Conference on Data Mining | 2011

Structured Feature Selection and Task Relationship Inference for Multi-task Learning

Hongliang Fei; Jun Huan

Multi-task Learning (MTL) aims to enhance the generalization performance of supervised regression or classification by learning multiple related tasks simultaneously. In this paper, we aim to extend the current MTL techniques to high dimensional data sets with structured input and structured output (SISO), where the SI means the input features are structured and the SO means the tasks are structured. We investigate a completely ignored problem in MTL with SISO data: the interaction of structured feature selection and task relationship modeling. We hypothesize that combining the structure information of features and task relationship inference enables us to build more accurate MTL models. Based on the hypothesis, we have designed an efficient learning algorithm, in which we utilize a task covariance matrix related to the model parameters to capture the task relationship. In addition, we design a regularization formulation for incorporating the structure of features in MTL. We have developed an efficient iterative optimization algorithm to solve the corresponding optimization problem. Our algorithm is based on the accelerated first order gradient method in conjunction with the projected gradient scheme. Experimental results on two real-world data sets demonstrate the utility of the proposed learning methods.

Conference on Information and Knowledge Management | 2010

Regularization and feature selection for networked features

Hongliang Fei; Brian Quanz; Jun Huan

In the standard formalization of supervised learning problems, a datum is represented as a vector of features without prior knowledge about relationships among features. However, for many real-world problems, we have such prior knowledge about structural relationships among features. For instance, in microarray analysis, where the genes are features, the genes form biological pathways. Such prior knowledge should be incorporated to build a more accurate and interpretable model, especially in applications with high dimensionality and low sample sizes. Towards an efficient incorporation of the structural relationships, we have designed a classification model where we use an undirected graph to capture the relationship of features. In our method, we combine both L1 norm and Laplacian-based L2 norm regularization with logistic regression. In this approach, we enforce model sparsity and smoothness among features to identify a small subset of grouped features. We have derived efficient optimization algorithms based on coordinate descent for the new formulation. Through a comprehensive experimental study, we have demonstrated the effectiveness of the proposed learning methods.
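
The combination of an L1 penalty and a Laplacian-based L2 penalty on logistic regression described above can be illustrated with a small objective function. This is a minimal sketch under stated assumptions, not the authors' implementation: the unnormalized Laplacian, the averaging of the loss, the label convention y in {-1, +1}, and the names `graph_laplacian`, `penalized_logistic_loss`, `lam1`, and `lam2` are all choices made for this example. The coordinate descent solver mentioned in the abstract is not shown.

```python
import math

def graph_laplacian(n_features, edges):
    """Unnormalized Laplacian L = D - A of an undirected feature graph."""
    L = [[0.0] * n_features for _ in range(n_features)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def penalized_logistic_loss(w, X, y, L, lam1, lam2):
    """Average logistic loss + lam1 * ||w||_1 + lam2 * w^T L w.

    Labels y are assumed to be -1 or +1. The quadratic term equals the sum
    of (w_i - w_j)^2 over graph edges, so it favors similar weights on
    connected features; the L1 term favors sparsity.
    """
    d = len(w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += math.log1p(math.exp(-margin))
    loss /= len(X)
    l1 = sum(abs(wj) for wj in w)
    smooth = sum(w[i] * L[i][j] * w[j] for i in range(d) for j in range(d))
    return loss + lam1 * l1 + lam2 * smooth

# Hypothetical example: three features, with features 0 and 1 linked.
L = graph_laplacian(3, [(0, 1)])
X, y = [[0.0, 0.0, 0.0]], [1]
agree = penalized_logistic_loss([1.0, 1.0, 0.0], X, y, L, 0.0, 1.0)
disagree = penalized_logistic_loss([1.0, -1.0, 0.0], X, y, L, 0.0, 1.0)
```

In this toy case the two weight vectors fit the data equally well, but the one that assigns opposite weights to linked features pays an extra smoothness penalty, which is the mechanism that encourages grouped feature selection.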

Conference on Information and Knowledge Management | 2009

L2 norm regularized feature kernel regression for graph data

Hongliang Fei; Jun Huan

Features in many real-world applications such as cheminformatics, bioinformatics and information retrieval have complex internal structure. For example, frequent patterns mined from graph data are graphs. Such graph features have different numbers of nodes and edges and usually overlap with each other. In conventional data mining and machine learning applications, the internal structure of features is usually ignored. In this paper we consider a supervised learning problem where the features of the data set have intrinsic complexity, and we further assume that the feature intrinsic complexity may be measured by a kernel function. We hypothesize that by regularizing model parameters using the information of feature complexity, we can construct a simple yet high-quality model that captures the intrinsic structure of the data. To test this hypothesis, we focus on a regression task and have designed an algorithm that incorporates the feature complexity in the learning process, using a kernel-matrix-weighted L2 norm for regularization, to obtain improved regression performance over conventional learning methods that do not consider the additional information of the features. We have tested our algorithm using 5 different real-world data sets and have demonstrated the effectiveness of our method.
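
One way to read "kernel matrix weighted L2 norm" regularization is as a ridge-style penalty w^T K w, where K is a kernel matrix computed between features. The sketch below follows that reading as an assumption; the abstract does not state the exact form of the penalty (for example, whether K or its inverse is used), and the names `solve`, `kernel_weighted_ridge`, and `lam` are hypothetical. K is assumed symmetric and positive semidefinite.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kernel_weighted_ridge(X, y, K, lam):
    """Minimize ||X w - y||^2 + lam * w^T K w for symmetric K.

    Setting the gradient to zero gives (X^T X + lam * K) w = X^T y.
    """
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + lam * K[i][j]
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)

# Hypothetical example: the second feature is penalized three times as hard.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 4.0]
w_plain = kernel_weighted_ridge(X, y, [[1.0, 0.0], [0.0, 1.0]], 1.0)
w_weighted = kernel_weighted_ridge(X, y, [[1.0, 0.0], [0.0, 3.0]], 1.0)
```

With K set to the identity matrix this reduces to ordinary ridge regression; a non-identity K shrinks some coefficients, or combinations of coefficients, more strongly than others.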

International Conference on Computer Communications and Networks | 2009

Anomaly Detection with Sensor Data for Distributed Security

Brian Quanz; Hongliang Fei; Jun Huan; Joseph B. Evans; Victor S. Frost; Gary J. Minden; Daniel D. Deavours; Leon S. Searl; Daniel DePardo; Martin Kuehnhausen; Daniel T. Fokum; Matt Zeets; Angela N. Oguna

There has been increasing interest in incorporating sensing systems into objects or the environment for monitoring purposes. In this work we compare approaches to performing fully distributed anomaly detection as a means of detecting security threats for objects equipped with sensing and communication abilities. Given the desirability of increased visibility into cargo in the transport chain and the goal of improving security, we consider equipping cargo with sensing and communication capabilities, as a means of ensuring its security, to be a key application. We have gathered real sensor test data from a rail trial and used the collected data to test the feasibility of the anomaly detection approach. The results demonstrate the effectiveness of our approach.

Bioinformatics and Biomedicine | 2010

Computational prediction of toxicity

Meenakshi Mishra; Hongliang Fei; Jun Huan

As the number of new chemicals developed and in use keeps growing every year, obtaining the toxicity profile of each chemical becomes a daunting challenge. To fill this information gap, the EPA suggested that certain in vitro assays and computational methods, which predict toxicity-related information in much less time and at much lower cost than traditional in vivo methods, may be used. In this paper, we apply computational techniques to results from certain in vitro assays performed on 309 chemicals (whose toxicity profiles are readily available), along with the molecular descriptors and other computed physical-chemical properties of the chemicals, to predict the toxicity caused by a chemical at a particular endpoint. The dataset is available online from the EPA TOXCAST group. We show that Random Forest and Naïve Bayes perform well on this dataset. We also show that using small and related trees in the random forest helps to further improve performance.
