Hang Yang | Researchain

Publication

Featured research published by Hang Yang.

Data Warehousing and Knowledge Discovery | 2011

Moderated VFDT in stream mining using adaptive tie threshold and incremental pruning

Hang Yang; Simon Fong

Very Fast Decision Tree (VFDT) is one of the most popular decision tree algorithms in data stream mining. The tree-building process uses the Hoeffding bound to decide when a leaf has accumulated sufficient data statistics to split. The original version of VFDT requires a user-defined tie threshold, by which a split is forced in order to break ties and control the tree size. It is an open problem that, as continuous data stream in, the tree size grows tremendously with noise and the classifier's accuracy drops. In this paper, we propose a Moderated VFDT (M-VFDT), which uses an adaptive tie threshold, computed incrementally, to control node splitting. The tree-building process is as fast as that of the original VFDT. The accuracy of M-VFDT improves significantly even in the presence of noise in the data stream. To solve the explosion of tree size, which is still an inherent problem in VFDT, we propose two lightweight pre-pruning mechanisms for stream mining (post-pruning is not appropriate here because of the streaming operation). Experiments are conducted to verify the merits of our new methods. M-VFDT with a pruning mechanism performs better than the original VFDT at all times. Our contribution is a new model that efficiently achieves an optimal balance between a compact decision tree and good accuracy in data stream mining.
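The split test described in this abstract can be illustrated with a minimal sketch of the standard VFDT rule: compute the Hoeffding bound, then split either when the best attribute clearly beats the runner-up or when the bound falls below the tie threshold. The function names, default values, and the fixed `tie_threshold` are assumptions for illustration only. The adaptive threshold of M-VFDT is not specified in the abstract and is not reproduced here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R) is the range of the split heuristic, e.g. log2(#classes)
    for information gain; delta is the allowed error probability; n is the
    number of examples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 value_range: float = 1.0, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Standard VFDT split test with a fixed (user-defined) tie threshold.

    All default values are illustrative assumptions, not values from the paper.
    """
    eps = hoeffding_bound(value_range, delta, n)
    # Case 1: the best attribute is better than the runner-up by more than eps.
    if best_gain - second_gain > eps:
        return True
    # Case 2: the two candidates are too close to separate, but eps is already
    # below the tie threshold, so the tie is broken by forcing a split.
    return eps < tie_threshold


if __name__ == "__main__":
    print(should_split(0.30, 0.05, n=200))   # clear winner
    print(should_split(0.30, 0.25, n=200))   # too close, keep collecting data
    print(should_split(0.30, 0.29, n=5000))  # tie broken by the threshold
```

A smaller tie threshold delays forced splits and keeps the tree smaller, while a larger one splits sooner. This is the sensitivity that motivates making the threshold adaptive.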

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining | 2012

Incrementally optimized decision tree for noisy big data

Hang Yang; Simon Fong

How to extract meaningful information from big data has been a popular open problem. Decision trees, which offer a high degree of knowledge interpretability, have been favored in many real-world applications. However, noisy values commonly exist in high-speed data streams, e.g. real-time online data feeds that are prone to interference. When processing big data, it is hard to implement pre-processing and sampling in full batches. To address this problem, this paper proposes a new incremental decision tree algorithm called the incrementally optimized very fast decision tree (iOVFDT). The experiment compares the proposed algorithm with existing methods in a noisy data stream environment. The results show that iOVFDT outperforms them, with higher accuracy and a smaller model size.

International Journal of Distributed Sensor Networks | 2012

A Very Fast Decision Tree Algorithm for Real-Time Data Mining of Imperfect Data Streams in a Distributed Wireless Sensor Network

Hang Yang; Simon Fong; Guangmin Sun; Raymond K. Wong

Wireless sensor networks (WSNs) are a rapidly emerging technology with a great potential in many ubiquitous applications. Although these sensors can be inexpensive, they are often relatively unreliable when deployed in harsh environments characterized by a vast amount of noisy and uncertain data, such as urban traffic control, earthquake zones, and battlefields. The data gathered by distributed sensors—which serve as the eyes and ears of the system—are delivered to a decision center or a gateway sensor node that interprets situational information from the data streams. Although many other machine learning techniques have been extensively studied, real-time data mining of high-speed and nonstationary data streams represents one of the most promising WSN solutions. This paper proposes a novel stream mining algorithm with a programmable mechanism for handling missing data. Experimental results from both synthetic and real-life data show that the new model is superior to standard algorithms.

Journal of Systems and Software | 2015

Countering the concept-drift problems in big data by an incrementally optimized stream mining model

Hang Yang; Simon Fong

The paper investigates the performance of incremental decision trees for concept drift, including VFDT, ADWIN, and iOVFDT. The computer simulation results show that iOVFDT has higher accuracy and less memory consumption than the original VFDT. The proposed method is useful and significant in the field of big data mining, especially when the concept-drift problem arises. Mining the potential value hidden in big data has been a popular research topic around the world. In an unbounded big data scenario, the underlying distribution of newly arrived data may differ from that of older data. This phenomenon, known as the concept-drift problem, is common in big data mining. In the past decade, decision tree induction methods have used multi-tree learning, which detects drift using alternative trees, as a solution. However, multi-tree algorithms consume more computing resources than single-tree ones. This paper proposes a single tree with an optimized node-splitting mechanism that detects drift in a test-then-train tree-building process. In the experiment, we compare the performance of the new method with some state-of-the-art single-tree and multi-tree algorithms. The results show that the new algorithm achieves good accuracy with a more compact model and lower memory use than the others.
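The test-then-train process mentioned in this abstract can be sketched as a generic prequential loop: each arriving example is first used to test the current model and only then used to update it. The `MajorityClassLearner` below is a hypothetical stand-in for an incremental decision tree, and none of these names come from the paper.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Optional, Tuple


class MajorityClassLearner:
    """Hypothetical placeholder model: predicts the most frequent label so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: Any) -> Optional[Hashable]:
        if not self.counts:
            return None  # no prediction before the first training example
        return self.counts.most_common(1)[0][0]

    def learn(self, x: Any, y: Hashable) -> None:
        self.counts[y] += 1


def prequential_accuracy(stream: Iterable[Tuple[Any, Hashable]],
                         learner: MajorityClassLearner) -> float:
    """Test-then-train evaluation: test on each example, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test first, on a not-yet-seen example
            correct += 1
        learner.learn(x, y)          # then train on the same example
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
    print(prequential_accuracy(stream, MajorityClassLearner()))
```

Because every example is tested before it is learned, the running accuracy reflects performance on unseen data without holding out a separate test set, which suits unbounded streams whose distribution may drift.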

Mathematical Problems in Engineering | 2013

Incremental Optimization Mechanism for Constructing a Decision Tree in Data Stream Mining

Hang Yang; Simon Fong

Imperfect data streams lead to tree size explosion and detrimental accuracy problems. Overfitting and imbalanced class distributions reduce the performance of the original decision-tree algorithm for stream mining. In this paper, we propose an incremental optimization mechanism to solve these problems. The mechanism, called Optimized Very Fast Decision Tree (OVFDT), possesses an optimized node-splitting control mechanism. Accuracy, tree size, and learning time are the significant factors influencing the algorithm’s performance. Naturally, a bigger tree takes longer to compute. OVFDT is a pioneering model equipped with an incremental optimization mechanism that seeks a balance between accuracy and tree size for data stream mining. It operates incrementally by a test-then-train approach. Three types of functional tree leaves improve the accuracy with which the tree model makes a prediction for a new data stream in the testing phase. The optimized node-splitting mechanism controls the tree model growth in the training phase. The experiment shows that OVFDT obtains an optimal tree structure in both numeric and nominal datasets.

Computational Science and Engineering | 2013

Improving the Accuracy of Incremental Decision Tree Learning Algorithm via Loss Function

Hang Yang; Simon Fong

Hoeffding's bound (HB) has been widely used for node splitting in incremental decision tree algorithms. Many decision-tree algorithms adopt a sliding-window technique to detect concept drift when mining changing data streams. This paper presents a novel node-splitting approach that replaces the traditional HB with a new measure. The new measure is derived from a loss function applied in a cache-based classifier within a sliding window during incremental decision tree learning. Replacing HB with this new bound is proposed for growing a Hoeffding decision tree that adapts to concept drifts detected in the data stream, thus improving the accuracy of prediction. The experimental results show that this new method has the potential to achieve better performance with fine tuning of the sliding window size.

New Mathematics and Natural Computation | 2013

Improving Adaptability Of Decision Tree For Mining Big Data

Hang Yang; Simon Fong

Big data has become a popular research topic since the data explosion of the past decade. An efficient analytical methodology provides a way of discovering the potential value of big data. Sampling techniques are no longer suitable, because only the full data tell the truth. To this end, the data mining algorithm must be robust to imperfect data, which may lead to tree size explosion and detrimental accuracy problems. In this paper, we propose an incremental optimization mechanism to solve these problems. The mechanism, called Optimized Very Fast Decision Tree (OVFDT), possesses an optimized node-splitting control mechanism. Accuracy, tree size, and learning time are the significant factors that determine the algorithm's performance. Naturally, a bigger tree takes longer to compute. OVFDT is a pioneering model equipped with an incremental optimization mechanism that seeks a balance between accuracy and tree size for data stream mining. OVFDT operates incrementally by a test-then-train approach. Two new methods of functional tree leaves are proposed to improve the accuracy with which the tree model makes a prediction for a new data stream in the testing phase. The optimized node-splitting mechanism controls the tree model growth in the training phase. The experiment supports our claim that OVFDT obtains an optimal tree structure in both numeric and nominal datasets.

International Journal of Sensor Networks | 2016

Atmospheric pattern recognition of human activities on ubiquitous sensor network using data stream mining algorithms

Hang Yang; Simon Fong; Kyungeun Cho; Junbo Wang

Ubiquitous sensor networks have gained tremendous popularity, with practical applications such as the detection of natural disasters. These applications collect real-time atmospheric measurements from sensors installed in the field. In this paper we argue that traditional data mining methods fall short of accurately analysing the activity patterns in the sensor data stream. We evaluate the successor of these algorithms, known as data stream mining, using the example of an indoor ubiquitous sensor network. The sensors measure various atmospheric values that are presumably influenced by different human activities. The experiment on this empirical data stream shows superior results. The contribution of this paper is a comparative study of traditional and data stream mining algorithms in a scenario where different atmospheric patterns are to be recognised from streaming sensor data.

International Journal of Distributed Sensor Networks | 2013

Optimizing Classification Decision Trees by Using Weighted Naïve Bayes Predictors to Reduce the Imbalanced Class Problem in Wireless Sensor Network

Hang Yang; Simon Fong; Raymond K. Wong; Guangmin Sun

Standard classification algorithms are often inaccurate when used in a wireless sensor network (WSN), where the observed data occur in imbalanced classes. The imbalanced data classification problem occurs when the number of samples in one class, usually the class of interest, is much lower than the number in the other classes. Many classification models have been studied in the data-mining research community. However, they all assume that the input data are stationary and bounded in size, so that resampling techniques and postadjustment by measuring the classification cost can be applied. In this paper, we devise a new scheme that extends a popular stream classification algorithm to the analysis of WSNs for reducing the adverse effects of the imbalanced class in the data. This new scheme is resource light at the algorithm level and does not require any data preprocessing. It uses weighted naïve Bayes predictors at the decision tree leaves to effectively reduce the impact of imbalanced classes. Experiments show that our modified algorithm outperforms the original stream classification algorithm.

Data Warehousing and Knowledge Discovery | 2012

Multi-objective optimization for incremental decision tree learning

Hang Yang; Simon Fong; Yain-Whar Si

Decision tree learning can be roughly classified into two categories: static and incremental induction. Static tree induction applies greedy search in the splitting test to obtain a globally optimal model. Incremental tree induction constructs a decision model by analyzing data in short segments; during each segment a locally optimal tree structure is formed. Very Fast Decision Tree [4] is a typical incremental tree induction method based on the principle of the Hoeffding bound for the node-splitting test. However, it does not work well with noisy data. In this paper, we propose a new incremental tree induction model called incrementally Optimized Very Fast Decision Tree (iOVFDT), which uses a multi-objective incremental optimization method. iOVFDT also integrates four classifiers at the leaf level. The proposed incremental tree induction model is tested with a large volume of data streams contaminated with noise. Under such noisy data, we investigate how iOVFDT, which represents the incremental induction method working with local optima, compares to C4.5, which loads the whole dataset to build a globally optimal decision tree. Our experimental results show that iOVFDT achieves similar, though slightly lower, accuracy, but its decision tree size and induction time are much smaller than those of C4.5.
