Donghong Han
Northeastern University
Publication
Featured research published by Donghong Han.
Cognitive Computation | 2015
Keyan Cao; Guoren Wang; Donghong Han; Jingwei Ning; Xin Zhang
Classification over data streams is an important task in data mining. The challenges become even larger when uncertain data are considered. Two important challenges in the classification of uncertain data streams are concept drift and data uncertainty. This paper studies the problem using the extreme learning machine (ELM). We first propose a weighted ensemble classifier based on ELM (WEC-ELM), which dynamically adjusts the classifiers and the weights of uncertain training data to solve the problem of concept drift. Furthermore, an uncertainty classifier based on ELM (UC-ELM) is designed for the classification of uncertain data streams; it considers not only the tuple value but also its uncertainty, improving both efficiency and accuracy. Finally, the performance of our methods is verified through a large number of simulation experiments. The experimental results show that our methods effectively classify uncertain data streams, handle concept drift, and reduce execution time.
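The idea of re-weighting ensemble members as the stream drifts can be sketched as follows. This is a minimal illustration under stated assumptions, not the WEC-ELM algorithm itself: the base learner is a simple 1-D threshold stump standing in for an ELM, each member's weight is assumed to be its accuracy on the newest labelled chunk, and the names ThresholdStump, WeightedEnsemble, and max_members are hypothetical.

```python
class ThresholdStump:
    """Stand-in base learner (not an ELM): a 1-D threshold between class means."""

    def __init__(self, chunk):
        # chunk is a list of (x, label) pairs with label in {0, 1}
        pos = [x for x, y in chunk if y == 1]
        neg = [x for x, y in chunk if y == 0]
        if pos and neg:
            pos_mean = sum(pos) / len(pos)
            neg_mean = sum(neg) / len(neg)
            self.threshold = (pos_mean + neg_mean) / 2
            # direction is +1 when the positive class lies above the threshold
            self.direction = 1 if pos_mean >= neg_mean else -1
        else:
            # Degenerate chunk containing a single class: arbitrary default
            self.threshold, self.direction = 0.0, 1

    def predict(self, x):
        return 1 if self.direction * (x - self.threshold) >= 0 else 0


def accuracy(clf, chunk):
    """Fraction of the chunk that clf labels correctly."""
    return sum(clf.predict(x) == y for x, y in chunk) / len(chunk)


class WeightedEnsemble:
    def __init__(self, max_members=3):
        self.max_members = max_members
        self.members = []  # [classifier, weight] pairs

    def update(self, chunk):
        # Assumed weighting rule: accuracy on the newest labelled chunk,
        # so members trained on an outdated concept lose influence.
        for member in self.members:
            member[1] = accuracy(member[0], chunk)
        new_clf = ThresholdStump(chunk)
        self.members.append([new_clf, accuracy(new_clf, chunk)])
        # Keep only the highest-weighted members.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]

    def predict(self, x):
        # Weighted majority vote over the current members.
        votes = {0: 0.0, 1: 0.0}
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return 1 if votes[1] > votes[0] else 0
```

When the labels of a later chunk flip relative to an earlier one, the older member's weight falls to zero and the newly trained member dominates the vote.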
Information Sciences | 2007
Guoren Wang; Xiangmin Zhou; Bin Wang; Baiyou Qiao; Donghong Han
In this paper, we propose a novel hyperplane-based indexing method to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the key-dimension-based index. Compared with the key dimension concept, the hyperplane is more effective in data filtering. High space utilization is achieved by dynamically reallocating data between twin nodes. In addition, a post-processing step is used after index building to ensure effective filtration. Extensive experiments on two types of real data sets are conducted, and the results show significantly improved filtering efficiency. Because it relies on hyperplanes, the proposed indexing method is suitable only for Euclidean spaces.
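The filtering role of a hyperplane in a Euclidean range query can be sketched as follows. This is a generic geometric illustration, not the paper's index structure: points are split by the sign of their distance to a hyperplane, and a whole side is skipped when the query ball cannot reach it. The function names and the flat two-way partition are assumptions made for the example.

```python
import math


def signed_distance(point, normal, offset):
    """Signed Euclidean distance from point to the hyperplane normal . x = offset."""
    norm = math.sqrt(sum(n * n for n in normal))
    return (sum(n * p for n, p in zip(normal, point)) - offset) / norm


def partition(points, normal, offset):
    """Split points into the negative side and the non-negative side."""
    left = [p for p in points if signed_distance(p, normal, offset) < 0]
    right = [p for p in points if signed_distance(p, normal, offset) >= 0]
    return left, right


def range_query(points, normal, offset, query, radius):
    """Return (matches, sides_scanned) for a ball query of the given radius."""
    left, right = partition(points, normal, offset)
    d = signed_distance(query, normal, offset)
    candidates, sides_scanned = [], 0
    if d - radius < 0:       # the ball reaches into the negative side
        candidates += left
        sides_scanned += 1
    if d + radius >= 0:      # the ball reaches into the non-negative side
        candidates += right
        sides_scanned += 1
    matches = [p for p in candidates if math.dist(p, query) <= radius]
    return matches, sides_scanned
```

The pruning test relies on the fact that, in Euclidean space, no point on the far side of a hyperplane can be closer to the query than the query's distance to that hyperplane, which is consistent with the abstract's remark that the method applies only to Euclidean spaces.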
Cognitive Computation | 2015
Donghong Han; Yachao Hu; Shuangshuang Ai; Guoren Wang
The problem of graph classification has attracted much attention in recent years. Existing work on graph classification has dealt only with precise and deterministic graph objects. However, the linkages between nodes in many real-world applications are inherently uncertain. In this paper, we focus on the classification of graph objects with uncertainty. The proposed method consists of three steps. First, we put forward a framework for classifying uncertain graph objects. Second, we extend the traditional algorithm for extracting frequent subgraphs to handle uncertain graph data. Third, we construct a classifier based on the Extreme Learning Machine (ELM), which offers fast learning speed. Extensive experiments on uncertain graph objects show that our method achieves better efficiency and effectiveness than other methods.
Neurocomputing | 2016
Keyan Cao; Guoren Wang; Donghong Han; Mei Bai; Shuoru Li
In recent years, as more uncertain data are generated, increasing attention has been paid to the mining of uncertain data. In this paper, we study the problem of classifying uncertain data using the Extreme Learning Machine (ELM). We first propose the UU-ELM algorithm for classifying uncertain data that are uniformly distributed. Furthermore, the NU-ELM algorithm is proposed for classifying uncertain data that are non-uniformly distributed. By calculating bounds on the probability, the efficiency of the algorithm can be improved. Finally, the performance of our methods is verified through a large number of simulated experiments. The experimental results show that our methods effectively solve the problem of uncertain data classification and reduce execution time.
Web-Age Information Management | 2006
Donghong Han; Chuan Xiao; Rui Zhou; Guoren Wang; Huan Huo; Xiaoyun Hui
We present a novel load shedding technique over sliding window joins. We first construct a dual window architectural model including join-windows and aux-windows. With the statistics built on aux-windows, an effective load shedding strategy is developed to produce maximum subset join outputs. For streams with high arrival rates, we propose an approach incorporating front-shedding and rear-shedding, and then address the problem of how to coordinate these two shedding processes through a series of calculations. Based on extensive experimentation with synthetic data and real-life data, we show that our load shedding strategy delivers superior join output performance and dominates the existing strategies.
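A statistics-driven shedding decision of this general kind can be sketched as follows. This is a simplified illustration, not the dual window model or the front-shedding and rear-shedding scheme described above: a single bounded window keeps frequency statistics about join keys seen on the partner stream and, when over capacity, sheds the tuple whose key has been seen least often. The class name SheddingWindow and its methods are hypothetical.

```python
from collections import Counter


class SheddingWindow:
    """Bounded join window that sheds the tuple least likely to find a partner."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []                  # join keys currently held, oldest first
        self.partner_stats = Counter()  # key frequencies seen on the partner stream

    def observe_partner(self, key):
        """Record that the partner stream produced a tuple with this key."""
        self.partner_stats[key] += 1

    def insert(self, key):
        """Add a tuple; return the shed key if one was dropped, else None."""
        self.keys.append(key)
        if len(self.keys) <= self.capacity:
            return None
        # Shed the key with the fewest observed partners (oldest wins ties).
        victim = min(self.keys, key=lambda k: self.partner_stats[k])
        self.keys.remove(victim)
        return victim

    def probe(self, key):
        """Number of join results a partner tuple with this key would produce."""
        return sum(1 for k in self.keys if k == key)
```

Keeping tuples whose keys are frequent on the partner stream is one heuristic for maximizing the size of the join output subset under a memory bound.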
International World Wide Web Conferences | 2008
Guoren Wang; Huan Huo; Donghong Han; Xiaoyun Hui
With the extensive use of XML in applications over the Web, efficient query processing over streaming XML has become a core challenge due to one-pass processing and limited resources. Taking advantage of the Hole-Filler model for XML fragments, this paper proposes a hybrid structure (FQ-Index) for both the queries and the fragments, and proposes an XML fragment processing algorithm to evaluate forward XPath queries over streamed XML fragments. Two optimization rules, dependence pruning and prefix pruning, are also developed. The dependence pruning scheme prunes off the dependent operations caused by fragmentation and transforms queries on XML tags into queries on XML fragments, while the prefix pruning scheme prunes off the “redundant” prefix along the path according to the tag structure. The effectiveness of the techniques developed is illustrated with a detailed set of experiments.
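The Hole-Filler idea, in which a document is shipped as fragments (fillers) that contain placeholders (holes) referring to other fragments, can be illustrated with a small sketch. The representation below is an assumption made for the example: each filler is a list of markup strings and ("hole", id) tuples, and the function name reassemble is hypothetical. The paper evaluates queries directly over fragments; this sketch only shows how holes and fillers relate.

```python
def reassemble(fillers, root_id=0):
    """Expand the filler root_id by replacing each hole with its filler.

    fillers maps a filler id to a list of parts; a part is either a markup
    string or a ("hole", filler_id) tuple. Raises KeyError if a referenced
    filler has not arrived yet.
    """
    parts = []
    for part in fillers[root_id]:
        if isinstance(part, tuple) and part[0] == "hole":
            # Recursively substitute the fragment that fills this hole.
            parts.append(reassemble(fillers, part[1]))
        else:
            parts.append(part)
    return "".join(parts)


# Fragments may arrive in any order; ids link holes to their fillers.
example_fillers = {
    2: ["<price>", "83", "</price>"],
    0: ["<stock>", ("hole", 1), ("hole", 2), "</stock>"],
    1: ["<name>IBM</name>"],
}
```

Calling reassemble(example_fillers) rebuilds the full element from the three fragments regardless of arrival order.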
World Wide Web | 2015
Donghong Han; Siqi Liu; Yachao Hu; Bin Wang; Yongjiao Sun
It is common for different people to share the same name. When this occurs in bibliography databases, it worsens the performance of information retrieval and data management. In this paper, we address the problem of name disambiguation and propose two different strategies: one classifier for each name (OCEN) and one classifier for all names (OCAN). Both strategies are based on the extreme learning machine (ELM), which shows similar or better generalization performance and faster learning speed than support vector machines (SVM) and least squares support vector machines (LS-SVM). We conduct experiments to compare the performance of ELM, SVM, and LS-SVM in the two strategies.
Journal of Computer Science and Technology | 2015
Donghong Han; Xin Zhang; Guoren Wang
Conventional classification algorithms are not well suited for the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online and one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM. The weights of the base classifiers are updated dynamically according to classification results. WE-DELM improves both the efficiency in learning the model and the accuracy in performing classification. Experimental results show that WE-DELM achieves better performance on different evaluation criteria, including efficiency, accuracy, and speedup.
Journal of Computer Science and Technology | 2014
Keyan Cao; Guoren Wang; Donghong Han; Guohui Ding; Ai-Xia Wang; Lingxu Shi
Outlier detection on data streams is an important task in data mining. The challenges become even larger when considering uncertain data. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of the uncertain elements by pruning to improve efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. It estimates the outlier probability and can effectively reduce the amount of calculation. The cost of the incremental PCUOD algorithm can satisfy the demands of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work on outlier detection over uncertain data streams that can handle parameter variable queries simultaneously. Our methods are verified using both real data and synthetic data. The results show that they are able to reduce the required storage and running time.
Asia-Pacific Web Conference | 2013
Keyan Cao; Donghong Han; Guoren Wang; Yachao Hu; Ye Yuan
Outlier detection plays an important role in fraud detection, sensor networks, computer network management, and many other areas. As the streaming nature and uncertainty of data become increasingly apparent, outlier detection on uncertain data streams has become a new research topic. First, we propose a new outlier concept on uncertain data streams based on possible worlds. Then an outlier detection method on uncertain data streams is proposed to meet the demands of limited storage and real-time processing. Next, a dynamic storage structure is designed for outlier detection on uncertain data streams over a sliding window, to meet the demands of limited storage and real-time response. Furthermore, an efficient range query method based on the SM-tree (Statistics M-tree) is proposed to reduce redundant calculation. Finally, the performance of our method is verified through a large number of simulation experiments. The experimental results show that our method effectively solves the problem of outlier detection on uncertain data streams and significantly reduces execution time and storage space.
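A possible-worlds outlier test can be sketched as follows. The definition used here is an assumption for illustration and may differ from the paper's: each object has a 1-D value and an independent existence probability, and an object is flagged when the probability that fewer than k of its neighbours within radius r exist (conditional on the object itself existing) reaches a threshold tau. The function names are hypothetical, and the sketch does not use a sliding window or the SM-tree.

```python
def prob_fewer_than_k(probs, k):
    """P(fewer than k of the independent events occur), by dynamic programming."""
    if k <= 0:
        return 0.0
    # dist[j] = probability that exactly j events have occurred so far (j < k).
    # Probability mass that reaches k is dropped, since it can never return.
    dist = [1.0] + [0.0] * (k - 1)
    for p in probs:
        new = [0.0] * k
        for j in range(k):
            new[j] = dist[j] * (1 - p)
            if j > 0:
                new[j] += dist[j - 1] * p
        dist = new
    return sum(dist)


def uncertain_outliers(objects, r, k, tau):
    """Return values of objects whose outlier probability is at least tau.

    objects is a list of (value, existence_probability) pairs.
    """
    result = []
    for i, (value, _) in enumerate(objects):
        # Existence probabilities of the other objects within distance r.
        neighbour_probs = [
            p for j, (other, p) in enumerate(objects)
            if j != i and abs(other - value) <= r
        ]
        if prob_fewer_than_k(neighbour_probs, k) >= tau:
            result.append(value)
    return result
```

The dynamic program sums over all possible worlds implicitly, avoiding enumeration of the exponentially many combinations of existing and missing neighbours.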