Russel Pears

Auckland University of Technology

Publication

Featured research published by Russel Pears.

Machine Learning | 2014

Detecting concept change in dynamic data streams

Russel Pears; Sripirakas Sakthithasan; Yun Sing Koh

In this research we present a novel approach to the concept change detection problem. Change detection is a fundamental issue in data stream mining, as the classification models generated need to be updated when significant changes in the underlying data distribution occur. A number of change detection approaches have been proposed, but they all suffer from limitations with respect to one or more key performance factors, such as high computational complexity, poor sensitivity to gradual change, or the opposite problem of a high false positive rate. Our approach uses reservoir sampling to build a sequential change detection model that offers statistically sound guarantees on false positive and false negative rates but has much lower computational complexity than the ADWIN concept drift detector. Extensive experimentation on a wide variety of datasets reveals that the scheme also has a smaller false detection rate while maintaining a true detection rate competitive with ADWIN.
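The abstract above names reservoir sampling as the building block of the detector. As a rough illustration only, the following is the textbook reservoir sampling procedure (Algorithm R), not the authors' change detector; the function name `reservoir_sample` and its parameters are assumptions introduced for this sketch.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length.

    This is generic Algorithm R; it is NOT the change detection model
    described in the paper.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000), k=10, rng=random.Random(0)))
```

The sample is maintained in a single pass with memory proportional to `k`, which is why the technique suits streams whose length is not known in advance.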

Information Interaction in Context | 2006

Contextual relevance feedback in web information retrieval

Dilip Kumar Limbu; Andy M. Connor; Russel Pears; Stephen G. MacDonell

In this paper, we present an alternative approach to the problem of contextual relevance feedback in web-based information retrieval. Our approach utilises a rich contextual model that exploits a user's implicit and explicit data. Each user's implicit data are gathered from their Internet search histories on their local machine. The user's explicit data are captured from a lexical database, a shared contextual knowledge base and domain-specific concepts using data mining techniques and a relevance feedback approach. These data are later used by our approach to modify queries to more accurately reflect the user's interests, as well as to continually build the user's contextual profile and a shared contextual knowledge base. Finally, the approach retrieves personalised or contextual search results from the search engine using the modified/expanded query. Preliminary experiments indicate that our approach has the potential not only to aid contextual relevance feedback but also to contribute towards the long-term goal of intelligent relevance feedback in web-based information retrieval.

Information Sciences | 2013

Weighted association rule mining via a graph based connectivity model

Russel Pears; Yun Sing Koh; Gillian Dobbie; Wai K. Yeap

Association rule mining is an important data mining task that discovers relationships among items in a transaction database. Classical association rule mining approaches make the implicit assumption that an item's importance is determined by its support. In contrast, Weighted Association Rule Mining (WARM) attempts to provide a notion of importance, or weight, to individual items that is not based solely on item support. Previous approaches to Weighted Association Rule Mining assign item weights in a subjective manner, based on a user's specialized knowledge of the underlying domain. Such approaches are infeasible when millions of items are present in a dataset, or when domain knowledge is unavailable. Furthermore, even when such domain information is available, a weight assignment based on subjective information constrains the knowledge discovered to fit the weights assigned, thus inhibiting the discovery of new trends in the data. In this research we automate the process of weight assignment by formulating a linear model that captures relationships between items. This approach extends prior research based on the Valency model. We extend the Valency model by expanding the field of interaction beyond immediate neighborhoods and show that this leads to significant improvements in performance on a number of different metrics.
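For readers unfamiliar with the classical measure that the abstract contrasts with item weights, here is a minimal sketch of support and confidence over a toy transaction list. It illustrates only the standard definitions; it does not implement the Valency model or the paper's weighting scheme, and the function names and sample data are assumptions made for illustration.

```python
from typing import Collection, FrozenSet, Sequence


def support(itemset: Collection[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(
    antecedent: Collection[str],
    consequent: Collection[str],
    transactions: Sequence[FrozenSet[str]],
) -> float:
    """Confidence of the rule antecedent -> consequent (0.0 if the antecedent never occurs)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    baskets = [
        frozenset({"bread", "milk"}),
        frozenset({"bread"}),
        frozenset({"milk", "eggs"}),
        frozenset({"bread", "milk", "eggs"}),
    ]
    print(support({"bread"}, baskets))
    print(confidence({"bread"}, {"milk"}, baskets))
```

Under this classical view a rare item can never rank as important, which is the limitation that item weighting is meant to address.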

International Conference on Information and Automation | 2007

Use of Hoeffding trees in concept based data stream mining

Stefan Hoeglinger; Russel Pears

Recent research in data mining has focussed on developing new algorithms for mining high-speed data streams. Most real-world data streams have in common that the underlying data generation mechanism changes over time, introducing so-called concept drift into the data. Many current algorithms incorporate a time-based window to be able to cope with drift in order to keep their model up-to-date with the data stream. A major problem with this approach is the potential loss of valuable information as data slides out of the time window. This is particularly a concern in those environments where patterns recur. In this paper, we present a concept-based window approach, which is integrated with a high-speed decision tree learner. Our approach uses the content of the data stream itself in order to decide which information is to be erased. Several methodologies, all based around minimising the overall information loss when pruning the decision tree, are discussed.
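As general background to the Hoeffding trees named in the title, such learners typically decide when enough examples have been seen by using the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below computes that bound only; it is not the concept-based window method of the paper, and the function and parameter names are assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean observed
    over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    # The bound tightens as more examples arrive.
    for n in (100, 1_000, 10_000):
        print(n, hoeffding_bound(value_range=1.0, delta=0.05, n=n))
```

In a typical Hoeffding tree, a node is split once the gap between the two best split candidates exceeds this epsilon, so the decision needs only summary statistics rather than stored examples.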

Knowledge Discovery and Data Mining | 2011

Multiple time-series prediction through multiple time-series relationships profiling and clustered recurring trends

Harya Widiputra; Russel Pears; Nikola Kasabov

Time-series prediction has been very well researched by both the Statistical and Data Mining communities. However, the multiple time-series problem of predicting the simultaneous movement of a collection of time-sensitive variables that are related to each other has received much less attention. Strong relationships between variables suggest that predictions of the trajectories of the variables involved can be improved by incorporating the nature and strength of these relationships into a prediction model. The key challenge is to capture the dynamics of the relationships to reflect changes that take place continuously over time. In this research we propose a novel algorithm for extracting profiles of relationships through an evolving clustering method. We use a form of non-parametric regression analysis to generate predictions based on the profiles extracted and historical information from the past. Experimental results on real-world climatic data reveal that the proposed algorithm outperforms well-established methods of time-series prediction.

Expert Systems With Applications | 2013

Discovering diverse association rules from multidimensional schema

Muhammad Usman; Russel Pears; Alvis Cheuk M. Fong

The integration of data mining techniques with data warehousing is gaining popularity because the two disciplines complement each other in extracting knowledge from large datasets. However, the majority of approaches focus on applying data mining as a front-end technology to mine data warehouses. Surprisingly, little progress has been made in incorporating mining techniques in the design of data warehouses. While methods such as data clustering applied to multidimensional data have been shown to enhance the knowledge discovery process, a number of fundamental issues remain unresolved with respect to the design of multidimensional schema. These relate to automated support for the selection of informative dimension and fact variables in high-dimensional and data-intensive environments, an activity which may challenge the capabilities of human designers on account of the sheer scale of data volume and variables involved. In this research, we propose a methodology that selects a subset of informative dimension and fact variables from an initial set of candidates. Our experiments, conducted on three real-world datasets taken from the UCI machine learning repository, show that the knowledge discovered from the schema that we generated was more diverse and informative than that obtained by the standard approach of mining the original data without our multidimensional structure imposed on it.

Information & Software Technology | 2014

Data stream mining for predicting software build outcomes using source code metrics

Jacqui Finlay; Russel Pears; Andy M. Connor

Context: Software development projects involve the use of a wide range of tools to produce a software artifact. Software repositories such as source control systems have become a focus for emergent research because they are a source of rich information regarding software development projects. The mining of such repositories is becoming increasingly common with a view to gaining a deeper understanding of the development process. Objective: This paper explores the concept of representing a software development project as a process that results in the creation of a data stream. It also describes the extraction of metrics from the Jazz repository and the application of data stream mining techniques to identify useful metrics for predicting build success or failure. Method: This research is a systematic study that applies the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift, using the Massive Online Analysis (MOA) tool. Results: The results indicate that only a relatively small number of the available measures considered have any significance for predicting the outcome of a build over time. These significant measures are identified and the implications of the results are discussed, particularly the relative difficulty of predicting failed builds. The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches. Conclusion: Overall prediction accuracies of 75% have been achieved through the use of the Hoeffding Tree classification method. Despite this high overall accuracy, there is greater difficulty in predicting failure than success. The emergence of a stable classification tree is limited by the lack of data, but overall the approach shows promise in terms of informing software development activities in order to minimize the chance of failure.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

One Pass Concept Change Detection for Data Streams

Sripirakas Sakthithasan; Russel Pears; Yun Sing Koh

In this research we present a novel approach to the concept change detection problem. Change detection is a fundamental issue in data stream mining, as the models generated need to be updated when significant changes in the underlying data distribution occur. A number of change detection approaches have been proposed, but they all suffer from limitations such as high computational complexity, poor sensitivity to gradual change, or the opposite problem of a high false positive rate. Our approach, termed OnePassSampler, has low computational complexity as it avoids multiple scans of its memory buffer by processing data sequentially. Extensive experimentation on a wide variety of datasets reveals that OnePassSampler has a smaller false detection rate and smaller computational overheads while maintaining a true detection rate competitive with ADWIN2.

Knowledge Discovery and Data Mining | 2009

CBDT: A Concept Based Approach to Data Stream Mining

Stefan Hoeglinger; Russel Pears; Yun Sing Koh

Data stream mining presents unique challenges compared to traditional mining on a random sample drawn from a stationary statistical distribution. Data from real-world data streams are subject to concept drift due to changes that take place continuously in the underlying data generation mechanism. Concept drift complicates the process of mining data, as models that are learnt need to be updated continuously to reflect recent changes in the data while retaining relevant information that has been learnt from the past. In this paper, we describe a Concept Based Decision Tree (CBDT) learner and compare it with the CVFDT algorithm, which uses a sliding time window. Our experimental results show that CBDT outperforms CVFDT in terms of both classification accuracy and memory consumption.

International Conference on Data Mining | 2014

Detecting Volatility Shift in Data Streams

David Tse Jung Huang; Yun Sing Koh; Gillian Dobbie; Russel Pears

Current drift detection techniques detect a change in distribution within a stream. However, there are no current techniques that analyze the change in the rate of these detected changes. We coin the term stream volatility to describe the rate of changes in a stream. A stream has high volatility if changes are detected frequently and low volatility if changes are detected infrequently. We are particularly interested in a volatility shift, which is a change in the rate of change (e.g. from high volatility to low volatility). We introduce and define the concept of stream volatility, and propose a novel technique to detect volatility on data streams in the presence of concept drifts. In the experiments we show our algorithm to be both fast and efficient. We also propose a new algorithm for drift detection called SEED that is faster and more memory efficient than the existing state-of-the-art drift detection approach. A faster drift detection algorithm has a flow-on benefit to the subsequent volatility detection stage because both algorithms run concurrently on the data stream.
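To make the notion of a volatility shift concrete, the sketch below treats volatility as the spacing between detected drift points and flags a shift when the recent average spacing differs from the preceding one by a chosen factor. This is a simplified illustration under assumed names and thresholds (`window`, `ratio`); it is not the volatility detector or the SEED algorithm proposed in the paper.

```python
from statistics import mean
from typing import List, Sequence


def drift_intervals(drift_points: Sequence[int]) -> List[int]:
    """Gaps between consecutive drift detections (positions in the stream)."""
    return [b - a for a, b in zip(drift_points, drift_points[1:])]


def volatility_shift(
    drift_points: Sequence[int], window: int = 3, ratio: float = 2.0
) -> bool:
    """Return True if the mean gap over the last `window` intervals differs from
    the mean gap over the `window` intervals before them by at least `ratio`.

    Assumes drift_points is strictly increasing. Hypothetical heuristic for
    illustration only.
    """
    intervals = drift_intervals(drift_points)
    if len(intervals) < 2 * window:
        return False  # not enough detections to compare two windows
    older = mean(intervals[-2 * window:-window])
    recent = mean(intervals[-window:])
    low, high = sorted((older, recent))
    if low <= 0:
        return False  # guard against malformed (non-increasing) input
    return high / low >= ratio


if __name__ == "__main__":
    # Drifts every 100 instances, then every 1000: high to low volatility.
    print(volatility_shift([100, 200, 300, 400, 1400, 2400, 3400]))
```

Because the check consumes only the positions reported by a drift detector, it could run alongside any detector that emits change points.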
