Hai H. Wang | Researchain

Publication

Featured research published by Hai H. Wang.

Knowledge and Information Systems | 2006

Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

Ji Zhang; Hai H. Wang

In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data: finding the subspaces (subsets of features) in which given points are outliers, called their outlying subspaces. Since state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is to measure the OD of a point as the sum of the distances between the point and its k nearest neighbors. Two heuristic pruning strategies are proposed to enable fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms alternative search strategies such as naive top-down, bottom-up, and random search, and that existing outlier detection methods cannot perform this new task effectively.
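The outlying degree used by HighDOD can be illustrated with a small sketch. It assumes Euclidean distance and a hypothetical fixed threshold for deciding when a subspace counts as outlying. The function names are illustrative, and the exhaustive enumeration shown is the naive bottom-up search that HighDOD's pruning strategies are designed to avoid, not HighDOD itself.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]

def outlying_degree(idx: int, data: Sequence[Point], subspace: Sequence[int], k: int) -> float:
    """OD of data[idx]: sum of Euclidean distances to its k nearest
    neighbours, computed only on the features listed in `subspace`."""
    p = data[idx]
    dists = sorted(
        math.sqrt(sum((p[f] - q[f]) ** 2 for f in subspace))
        for j, q in enumerate(data)
        if j != idx  # a point is not its own neighbour
    )
    return sum(dists[:k])

def naive_outlying_subspaces(idx: int, data: Sequence[Point], k: int, threshold: float,
                             max_dim: Optional[int] = None) -> List[Tuple[int, ...]]:
    """Hypothetical exhaustive baseline: every subspace, smallest first,
    in which the OD of data[idx] reaches `threshold`."""
    d = len(data[idx])
    max_dim = d if max_dim is None else max_dim
    found = []
    for size in range(1, max_dim + 1):
        for sub in combinations(range(d), size):
            if outlying_degree(idx, data, sub, k) >= threshold:
                found.append(sub)
    return found
```

For the points (0, 0), (0, 1), (1, 0), (1, 1) and (0.5, 10), the last point is far from the rest only along the second feature. With k = 2 and a threshold of 5, the search reports (1,) and (0, 1) for that point but not (0,), where its OD is only 1.0. The number of candidate subspaces grows as 2^d, which is why pruning matters.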

Information Sciences | 2015

Detecting anomalies from big network traffic data using an adaptive detection approach

Ji Zhang; Hongzhou Li; Qigang Gao; Hai H. Wang; Yonglong Luo

The unprecedented explosion of real-life big data sets has sparked considerable research interest in data mining in recent years. Many of these big data sets are generated in network environments and are characterized by dauntingly large size and high dimensionality, which pose great challenges for detecting useful knowledge and patterns, such as network traffic anomalies. In this paper, we study the problem of anomaly detection in big network connection data sets and propose an outlier detection technique, called Adaptive Stream Projected Outlier deTector (A-SPOT), to detect anomalies from large data sets using a novel adaptive subspace analysis approach. A case study of A-SPOT is conducted by deploying it on the 1999 KDD Cup anomaly detection application. Innovative approaches for training data generation, anomaly classification, and false positive reduction are also proposed to better tailor A-SPOT to the case study. Experimental results demonstrate that A-SPOT is effective and efficient in detecting anomalies from network data sets and outperforms existing detection methods.

Knowledge and Information Systems | 2010

Mining incomplete survey data through classification

Hai H. Wang; Shouhong Wang

Data mining with incomplete survey data is an immature subject area. When mining a database with incomplete data, the patterns of missing data, as well as the potential implications of these missing data, constitute valuable knowledge. This paper presents the conceptual foundations of data mining with incomplete data through classification relevant to a specific decision-making problem. The proposed technique generally assumes that incomplete data and complete data may come from different sub-populations. Its major objective is to detect interesting patterns of data-missing behavior that are relevant to a specific decision, rather than to estimate individual missing values. In this technique, a set of complete data is used to acquire a near-optimal classifier. This classifier provides prediction reference information for analyzing the incomplete data, which reveals the data-missing behavior concealed in the missing data. Using a real-world survey data set, the paper demonstrates the usefulness of this technique.
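The idea of training on complete records and then reading missing-data behavior off the classifier's predictions can be sketched as follows. The nearest-centroid classifier is a stand-in chosen for brevity, not the classifier used in the paper. `None` is assumed to mark a missing survey answer, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing survey answer

def fit_centroids(rows: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Train on complete records only: one mean vector per decision class."""
    sums: Dict[str, List[float]] = {}
    counts = Counter(labels)
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids: Dict[str, List[float]], row: Row) -> str:
    """Nearest centroid, measured only on the answers that are present."""
    observed = [i for i, v in enumerate(row) if v is not None]
    return min(centroids, key=lambda y: sum((row[i] - centroids[y][i]) ** 2 for i in observed))

def missing_behaviour_profile(centroids: Dict[str, List[float]],
                              incomplete: Sequence[Row]) -> Dict[Tuple[int, ...], Counter]:
    """For each pattern of missing answers, tally the predicted decision classes."""
    profile: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for row in incomplete:
        pattern = tuple(i for i, v in enumerate(row) if v is None)
        profile[pattern][predict(centroids, row)] += 1
    return profile
```

The profile groups incomplete records by which answers are missing. A missing-answer pattern whose predicted classes differ sharply from those of the complete records would suggest decision-relevant missing behavior of the kind the abstract describes.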

International Conference on Data Mining | 2006

A Novel Method for Detecting Outlying Subspaces in High-dimensional Databases Using Genetic Algorithm

Ji Zhang; Qigang Gao; Hai H. Wang

Detecting outlying subspaces is a relatively new research problem in outlierness analysis for high-dimensional data. An outlying subspace for a given data point p is a subspace in which p is an outlier. Outlying subspace detection facilitates a better characterization of the detected outliers. It also enables outlier mining in high-dimensional data to be performed more accurately and efficiently. In this paper, we propose a new method that uses the genetic algorithm paradigm to search for outlying subspaces efficiently. We develop a technique for efficiently computing lower and upper bounds on the distance between a given point and its kth nearest neighbor in each possible subspace. These bounds are used to speed up the fitness evaluation of the genetic algorithm designed for outlying subspace detection. We also propose a random sampling technique to further reduce the computation of the genetic algorithm, and we specify the optimal sample size needed to ensure the accuracy of the result. A set of experiments on both synthetic and real-life datasets shows that the proposed method is efficient and effective for outlying subspace detection.
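A compact genetic search over subspaces might look like the following sketch. Each individual is a bit mask over the features. Fitness is the distance from the point to its kth nearest neighbor in that subspace, divided by the square root of the subspace's dimensionality. This normalization, the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), and all parameter values are assumptions for illustration. The sketch omits the paper's distance bounds and sampling, and uses only a fitness cache to avoid re-evaluating a mask.

```python
import math
import random
from functools import lru_cache
from typing import List, Sequence, Tuple

Mask = Tuple[int, ...]

def kth_nn_distance(p: Sequence[float], others: Sequence[Sequence[float]],
                    dims: Sequence[int], k: int) -> float:
    """Euclidean distance from p to its kth nearest neighbour in `others`,
    measured only on the features in `dims`."""
    dists = sorted(math.sqrt(sum((p[f] - q[f]) ** 2 for f in dims)) for q in others)
    return dists[k - 1]

def ga_outlying_subspace(p: Sequence[float], others: Sequence[Sequence[float]], k: int = 2,
                         pop_size: int = 10, generations: int = 20, p_mut: float = 0.2,
                         seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    d = len(p)
    rng = random.Random(seed)

    def repair(mask: Mask) -> Mask:
        # an empty subspace is meaningless, so switch on one random feature
        if any(mask):
            return mask
        i = rng.randrange(d)
        return tuple(1 if j == i else 0 for j in range(d))

    @lru_cache(maxsize=None)  # each distinct mask is evaluated only once
    def fitness(mask: Mask) -> float:
        dims = [i for i, bit in enumerate(mask) if bit]
        return kth_nn_distance(p, others, dims, k) / math.sqrt(len(dims))

    def tournament(pop: List[Mask]) -> Mask:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [repair(tuple(rng.randint(0, 1) for _ in range(d))) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))
            child = tuple(1 - bit if rng.random() < p_mut else bit for bit in child)
            nxt.append(repair(child))
        pop = nxt
    best = max(pop, key=fitness)
    return tuple(i for i, bit in enumerate(best) if bit), fitness(best)
```

Some normalization is needed because, under Euclidean distance, adding features can never shrink a distance. An unnormalized kth-nearest-neighbor distance would therefore always favor the full space.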

International Conference on Data Engineering | 2008

SPOT: A System for Detecting Projected Outliers From High-dimensional Data Streams

Ji Zhang; Qigang Gao; Hai H. Wang

In this paper, we present a new technique, called Stream Projected Outlier deTector (SPOT), to deal with the outlier detection problem in high-dimensional data streams. SPOT is unique in a number of respects. First, SPOT employs a novel window-based time model and decaying cell summaries to capture statistics from the data stream. Second, a sparse subspace template (SST), a set of top sparse subspaces obtained by unsupervised and/or supervised learning processes, is constructed in SPOT to detect projected outliers effectively. A multi-objective genetic algorithm (MOGA) is employed as an effective search method in unsupervised learning for finding outlying subspaces from training data. Finally, the SST is able to carry out online self-evolution to cope with the dynamics of data streams. This paper details the motivation for and technical challenges of detecting outliers in high-dimensional data streams, presents an overview of SPOT, and outlines the plans for its system demonstration.
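Decaying cell summaries can be approximated by a grid over one subspace whose counts fade exponentially with age. The half-life decay, the cell width, and the density-threshold rule are assumptions for illustration rather than SPOT's exact time model. Decay is applied lazily, only when a cell is read or updated, so idle cells cost nothing.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Cell = Tuple[int, ...]

class DecayingCellSummary:
    """Grid-cell counts over one subspace; each count halves every
    `half_life` time units (an assumed exponential time model)."""

    def __init__(self, subspace: Sequence[int], cell_width: float, half_life: float):
        self.subspace = tuple(subspace)
        self.cell_width = cell_width
        self.rate = math.log(2) / half_life
        self.counts: Dict[Cell, float] = {}  # count as of the cell's last update
        self.stamp: Dict[Cell, float] = {}   # time of that last update

    def cell_of(self, x: Sequence[float]) -> Cell:
        return tuple(math.floor(x[f] / self.cell_width) for f in self.subspace)

    def _decayed(self, cell: Cell, t: float) -> float:
        # lazy decay: age the stored count only when the cell is touched
        if cell not in self.counts:
            return 0.0
        return self.counts[cell] * math.exp(-self.rate * (t - self.stamp[cell]))

    def density(self, x: Sequence[float], t: float) -> float:
        return self._decayed(self.cell_of(x), t)

    def add(self, x: Sequence[float], t: float) -> None:
        cell = self.cell_of(x)
        self.counts[cell] = self._decayed(cell, t) + 1.0
        self.stamp[cell] = t

def is_projected_outlier(summaries: Iterable[DecayingCellSummary],
                         x: Sequence[float], t: float, threshold: float) -> bool:
    """Hypothetical rule: flag x if its cell is sparse in any tracked subspace."""
    return any(s.density(x, t) < threshold for s in summaries)
```

A point would normally be tested against every summary before it is passed to `add`, so that it does not count toward its own cell's density.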

Database and Expert Systems Applications | 2009

Detecting Projected Outliers in High-Dimensional Data Streams

Ji Zhang; Qigang Gao; Hai H. Wang; Qing Liu; Kai Xu

In this paper, we study the problem of projected outlier detection in high-dimensional data streams and propose a new technique, called Stream Projected Outlier deTector (SPOT), to identify outliers embedded in subspaces. A Sparse Subspace Template (SST), a set of subspaces obtained by unsupervised and/or supervised learning processes, is constructed in SPOT to detect projected outliers effectively. A Multi-Objective Genetic Algorithm (MOGA) is employed as an effective search method for finding outlying subspaces from training data to construct the SST. The SST is able to carry out online self-evolution in the detection stage to cope with the dynamics of data streams. Experimental results demonstrate the efficiency and effectiveness of SPOT in detecting outliers in high-dimensional data streams.
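A multi-objective GA ranks candidate subspaces by Pareto dominance rather than by a single score. The following sketch shows only that dominance filter, which is the first front of a non-dominated sort. Both objectives are hypothetical and minimized: a sparsity score for the point's cell and the subspace's dimensionality. The example numbers are invented and are not the objectives SPOT uses.

```python
from typing import Dict, List, Tuple

Subspace = Tuple[int, ...]
Objectives = Tuple[float, ...]  # every objective is to be minimised

def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse on every objective, strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores: Dict[Subspace, Objectives]) -> List[Subspace]:
    """Candidates not dominated by any other candidate, in insertion order."""
    return [s for s, obj in scores.items()
            if not any(dominates(other, obj) for t, other in scores.items() if t != s)]
```

Consider the candidates (0,): (0.30, 1), (1,): (0.05, 1), (0, 1): (0.02, 2), (0, 2): (0.10, 2) and (0, 1, 2): (0.02, 3). Here (1,) dominates both (0,) and (0, 2), and (0, 1) dominates (0, 1, 2), so the front is (1,) and (0, 1). A MOGA keeps such trade-offs between sparsity and dimensionality instead of collapsing them into one weighted score.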

Canadian Conference on Artificial Intelligence | 2011

Anomaly-based network intrusion detection using outlier subspace analysis: a case study

David Kershaw; Qigang Gao; Hai H. Wang

This paper employs SPOT (Stream Projected Outlier deTector) as a prototype system for anomaly-based intrusion detection and evaluates its performance against other major methods. SPOT is capable of processing high-dimensional data streams and detecting novel attacks that exhibit abnormal behavior, which makes it a good candidate for network intrusion detection. This paper demonstrates that SPOT is effective at distinguishing between normal and abnormal processes in a UNIX system call dataset.

Computational Intelligence and Security | 2005

Grid-ODF: detecting outliers effectively and efficiently in large multi-dimensional databases

Wei Wang; Ji Zhang; Hai H. Wang

In this paper, we propose a novel outlier mining algorithm, called Grid-ODF, that takes into account both the local and global perspectives of outliers for effective detection. The notion of Outlying Degree Factor (ODF), which reflects both density and distance, is introduced to rank outliers. A grid structure that partitions the data space is employed so that Grid-ODF can be implemented efficiently. Experimental results show that Grid-ODF outperforms existing outlier detection algorithms such as LOF and KNN-distance in terms of both effectiveness and efficiency.
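The efficiency idea of partitioning the data space with a grid can be sketched in three steps: bucket points into cells, treat points in sparsely populated cells as candidates, and rank only those candidates. The ODF formula is not given here, so the ranking step uses plain kth-nearest-neighbor distance as a stand-in. The cell width, the `min_count` cut-off, and the function names are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def grid_cells(points: Sequence[Point], width: float) -> Dict[Tuple[int, ...], List[int]]:
    """Map each grid cell (hypercube of side `width`) to the point indices it holds."""
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(v / width) for v in p)].append(i)
    return cells

def sparse_candidates(points: Sequence[Point], width: float, min_count: int) -> List[int]:
    """Indices of points lying in cells that hold fewer than `min_count` points."""
    return sorted(i for members in grid_cells(points, width).values()
                  if len(members) < min_count for i in members)

def rank_candidates(points: Sequence[Point], candidates: Sequence[int], k: int) -> List[int]:
    """Stand-in score (not ODF): kth-nearest-neighbour distance, largest first."""
    def knn_dist(i: int) -> float:
        d = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        return d[k - 1]
    return sorted(candidates, key=knn_dist, reverse=True)
```

Only the candidates pay the quadratic cost of neighbor search. The trade-off is that a point lying just across a cell boundary from a dense cell can be pruned or flagged depending on the chosen width.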

Soft Computing | 2011

Detecting anomalies from high-dimensional wireless network data streams: a case study

Ji Zhang; Qigang Gao; Hai H. Wang; Hua Wang

In this paper, we study the problem of anomaly detection in wireless network streams. We have developed a new technique, called Stream Projected Outlier deTector (SPOT), to deal with anomaly detection in multi-dimensional or high-dimensional data streams. We conduct a detailed case study of SPOT by deploying it for anomaly detection on a real-life wireless network data stream. Since this data stream is unlabeled, a validation method is proposed to generate ground-truth results for performance evaluation. Extensive experiments demonstrate that SPOT is effective in detecting anomalies from wireless network data streams and outperforms existing anomaly detection methods.

International Conference on Tools with Artificial Intelligence | 2006

Discover Gene Specific Local Co-regulations Using Progressive Genetic Algorithm

Ji Zhang; Qigang Gao; Hai H. Wang

The problem of gene-specific co-regulation discovery is, for a particular gene of interest, to identify its closely co-regulated genes and the associated subsets of experimental conditions in which such co-regulations occur. The co-regulations are local in the sense that they occur in some subsets of the full set of experimental conditions. In this paper, we propose an innovative method for finding gene-specific co-regulations using a genetic algorithm (GA). Two novel ad hoc GAs, the single-stage and two-stage progressive GAs, are proposed. They are called progressive because the initial population for the GA at a window position inherits the top-ranked individuals obtained at the preceding window position, which enables them to achieve better accuracy than the non-progressive algorithm. Experimental results with real-life gene expression data demonstrate the efficiency and effectiveness of our technique in discovering gene-specific co-regulations.
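The progressive seeding step can be sketched with a GA that slides a window across the experimental conditions. Each window's population starts from the best individuals of the previous window. An individual is a bit mask over genes. The fitness rewards genes whose mean absolute difference from the target gene within the window is below a tolerance. This fitness and all parameters are hypothetical, and only a single-stage variant is shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]

def window_fitness(mask: Mask, expr: Sequence[Sequence[float]], target: int,
                   window: Tuple[int, int], tol: float) -> float:
    """Hypothetical fitness: each selected gene adds tol minus its mean absolute
    difference from the target gene over the window's conditions."""
    lo, hi = window
    score = 0.0
    for g, bit in enumerate(mask):
        if bit and g != target:
            diff = sum(abs(expr[g][c] - expr[target][c]) for c in range(lo, hi)) / (hi - lo)
            score += tol - diff
    return score

def evolve(pop: List[Mask], fit: Callable[[Mask], float], rng: random.Random,
           generations: int, p_mut: float) -> List[Mask]:
    """Truncation selection, uniform crossover, bit-flip mutation, elitism."""
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        parents = ranked[: max(2, len(pop) // 2)]
        nxt = [ranked[0]]  # keep the best individual unchanged
        while len(nxt) < len(pop):
            a, b = rng.sample(parents, 2)
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))
            nxt.append(tuple(1 - v if rng.random() < p_mut else v for v in child))
        pop = nxt
    return sorted(pop, key=fit, reverse=True)

def progressive_ga(expr: Sequence[Sequence[float]], target: int, width: int,
                   tol: float = 0.5, pop_size: int = 12, generations: int = 30,
                   carry: int = 4, p_mut: float = 0.1,
                   seed: int = 0) -> Dict[Tuple[int, int], List[int]]:
    rng = random.Random(seed)
    n_genes, n_conds = len(expr), len(expr[0])
    inherited: List[Mask] = []  # top individuals from the previous window
    results: Dict[Tuple[int, int], List[int]] = {}
    for start in range(n_conds - width + 1):
        window = (start, start + width)

        def fit(m: Mask, w: Tuple[int, int] = window) -> float:
            return window_fitness(m, expr, target, w, tol)

        fresh = [tuple(rng.randint(0, 1) for _ in range(n_genes))
                 for _ in range(pop_size - len(inherited))]
        ranked = evolve(inherited + fresh, fit, rng, generations, p_mut)
        inherited = ranked[:carry]  # progressive step: seed the next window
        results[window] = [g for g, bit in enumerate(ranked[0]) if bit and g != target]
    return results
```

Carrying `carry` individuals forward lets each window start near the previous window's answer, which matches the intuition the abstract gives for the accuracy gain. The remaining population is drawn at random to preserve diversity.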
