Gillian Dobbie | Researchain

Publication

Featured research published by Gillian Dobbie.

Swarm and evolutionary computation | 2014

Research on particle swarm optimization based clustering: A systematic review of literature and techniques

Shafiq Alam; Gillian Dobbie; Yun Sing Koh; Patricia Riddle; Saeed Ur Rehman

Optimization based pattern discovery has emerged as an important field in knowledge discovery and data mining (KDD), and has been used to enhance the efficiency and accuracy of clustering, classification, association rules and outlier detection. Cluster analysis, which identifies groups of similar data items in large datasets, is one of its recent beneficiaries. The increasing complexity and large amounts of data in the datasets have seen data clustering emerge as a popular focus for the application of optimization based techniques. Different optimization techniques have been applied to investigate the optimal solution for clustering problems. Swarm intelligence (SI) is one such optimization technique whose algorithms have successfully been demonstrated as solutions for different data clustering domains. In this paper we investigate the growth of literature in SI and its algorithms, particularly Particle Swarm Optimization (PSO). This paper makes two major contributions. Firstly, it provides a thorough literature overview focusing on some of the most cited techniques that have been used for PSO-based data clustering. Secondly, we analyze the reported results and highlight the performance of different techniques against contemporary clustering techniques. We also provide a brief overview of our PSO-based hierarchical clustering approach (HPSO-clustering) and compare the results with traditional hierarchical agglomerative clustering (HAC), K-means, and PSO clustering.

conference on information and knowledge management | 2001

XOO7: applying OO7 benchmark to XML query processing tool

Ying Guang Li; Stéphane Bressan; Gillian Dobbie; Zoé Lacroix; Mong Li Lee; Ullas Nambiar; Bimlesh Wadhwa

If XML is to play the critical role of the lingua franca for Internet data interchange that many predict, it is necessary to start designing and adopting benchmarks allowing the comparative performance analysis of the tools being developed and proposed. The effectiveness of existing XML query languages has been studied by many, with a focus on the comparison of linguistic features, implicitly reflecting the fact that most XML tools exist only on paper. In this paper, with a focus on efficiency and concreteness, we propose a pragmatic first step toward the systematic benchmarking of XML query processing platforms with an initial focus on the data (versus document) point of view. We propose XOO7, an XML version of the OO7 benchmark. We discuss the applicability of XOO7, its strengths, limitations and the extensions we are considering. We illustrate its use by presenting and discussing the performance comparison against XOO7 of three different query processing platforms for XML.

web information and data management | 2003

Extracting association rules from XML documents using XQuery

Jacky Wan; Gillian Dobbie

Data mining is generally considered the extraction and analysis of information from databases. With the rapid growth of XML data available online, mining XML data from the web is becoming important. In support of this trend, several encouraging attempts at developing methods for mining XML data have been proposed. However, efficiency and simplicity are still a barrier for further development. In this paper, we show that any XML document can be mined for association rules using only the query language XQuery without any pre-processing or post-processing.
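To make the notion of an association rule concrete, here is a minimal Python sketch of support and confidence for single-item rules over a small set of transactions. It is not the paper's XQuery formulation; the function name `association_rules`, the thresholds, and the sample transactions are illustrative assumptions.

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for single-item rules.

    Illustrative helper, not taken from the paper.
    """
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in sets if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical sample data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
```

Here `butter -> bread` has support 0.5 and confidence 1.0, because every transaction containing butter also contains bread.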

ieee swarm intelligence symposium | 2008

An Evolutionary Particle Swarm Optimization algorithm for data clustering

Shafiq Alam; Gillian Dobbie; Patricia Riddle

Clustering is an important data mining task and has been explored extensively by a number of researchers for different application areas such as finding similarities in images, text data and bio-informatics data. Various optimization techniques have been proposed to improve the performance of clustering algorithms. In this paper we propose a novel clustering algorithm, the evolutionary particle swarm optimization (EPSO)-clustering algorithm, which is based on PSO. The proposed algorithm is based on the evolution of swarm generations: the particles are initially uniformly distributed in the input data space and, after a specified number of iterations, a new generation of the swarm evolves. The swarm tries to dynamically adjust itself after each generation to optimal positions. The paper describes the new algorithm and its initial implementation, and presents tests performed on real clustering benchmark data. The proposed method is compared with k-means clustering, a benchmark clustering technique, and with a simple particle swarm clustering algorithm. The results show that the algorithm is efficient and produces compact clusters.
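For orientation, the following is a minimal sketch of plain global-best PSO clustering, the kind of simple particle swarm baseline the abstract compares against. It does not implement the generational EPSO scheme. The encoding (one particle holds k centroids), the fitness function, and the parameter values `w`, `c1`, `c2` are common textbook choices assumed here, not details from the paper.

```python
import math
import random

def quantization_error(centroids, points):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)

def pso_cluster(points, k, n_particles=10, iterations=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Plain gbest PSO; all parameter defaults are assumptions for illustration."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle encodes k candidate centroids, initialised from random data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    velocity = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in swarm]
    pbest_fit = [quantization_error(p, points) for p in swarm]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    for _ in range(iterations):
        for i, particle in enumerate(swarm):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    velocity[i][j][d] = (w * velocity[i][j][d]
                                         + c1 * r1 * (pbest[i][j][d] - particle[j][d])
                                         + c2 * r2 * (gbest[j][d] - particle[j][d]))
                    particle[j][d] += velocity[i][j][d]
            fit = quantization_error(particle, points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in particle], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in particle], fit
    return gbest, gbest_fit

# Hypothetical two-blob data set.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, error = pso_cluster(points, k=2)
```

The fixed seed makes the run repeatable, and the returned error never exceeds the best error in the initial swarm because the global best is replaced only when a particle improves on it.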

data warehousing and olap | 2010

R-MESHJOIN for near-real-time data warehousing

M. Asif Naeem; Gillian Dobbie; Gerald Weber; Shafiq Alam

To fulfill the increasing demand of business for the latest information, current data integration approaches are moving towards real-time updates. One important element in real-time data integration is the join of a continuous incoming data stream with a disk-based relation. In this paper we investigate a stream-based join algorithm, called mesh join (MESHJOIN), and propose an improved version called reduced MESHJOIN (R-MESHJOIN). Both algorithms tune the memory, allocating parts of the memory to key components. In MESHJOIN there is a dependency between the size of partitions in an internal queue for the stream data and the number of iterations required to bring the disk-based relation into memory. This dependency hampers the optimal distribution of memory among the join components. In particular, the size of the disk-buffer varies with the size of the disk-based relation, which is unnecessary. The R-MESHJOIN algorithm, on the other hand, removes this dependency. This enables an optimal distribution of available memory among the join components. In R-MESHJOIN a change in the size of the disk-based relation does not affect the size of the disk-buffer. An experimental study is conducted in order to validate the arguments.

Information Sciences | 2013

Weighted association rule mining via a graph based connectivity model

Russel Pears; Yun Sing Koh; Gillian Dobbie; Wai K. Yeap

Association rule mining is an important data mining task that discovers relationships among items in a transaction database. Classical association rule mining approaches make the implicit assumption that an item's importance is determined by its support. In contrast, Weighted Association Rule Mining (WARM) attempts to provide a notion of importance, or weight, to individual items that is not based solely on item support. Previous approaches to Weighted Association Rule Mining assign item weights in a subjective manner, based on a user's specialized knowledge of the underlying domain that is involved. Such approaches are infeasible when millions of items are present in a dataset, or when domain knowledge is unavailable. Furthermore, even when such domain information is available, a weight assignment based on subjective information constrains the knowledge discovered to fit with the weights assigned, thus inhibiting the discovery of new trends in the data. In this research we automate the process of weight assignment by formulating a linear model that captures relationships between items. This approach extends prior research based on the Valency model. We extend the Valency model by expanding the field of interaction beyond immediate neighborhoods and show that this leads to significant improvements in performance on a number of different metrics that we use.

web intelligence | 2010

Particle Swarm Optimization Based Hierarchical Agglomerative Clustering

Shafiq Alam; Gillian Dobbie; Patricia Riddle; M. Asif Naeem

Clustering, an important data mining task that groups data on the basis of similarities among the data, can be divided into two broad categories: partitional clustering and hierarchical clustering. We combine these two methods and propose a novel clustering algorithm called Hierarchical Particle Swarm Optimization (HPSO) data clustering. The proposed algorithm exploits the swarm intelligence of cooperating agents in a decentralized environment. The experimental results were compared with benchmark clustering techniques, which include K-means, PSO clustering, Hierarchical Agglomerative clustering (HAC) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results are evidence of the effectiveness of Swarm based clustering and the capability to perform clustering in a hierarchical agglomerative manner.

data warehousing and knowledge discovery | 2011

RP-Tree: rare pattern tree mining

Sidney Tsang; Yun Sing Koh; Gillian Dobbie

Most association rule mining techniques concentrate on finding frequent rules. However, rare association rules are in some cases more interesting than frequent association rules since rare rules represent unexpected or unknown associations. All current algorithms for rare association rule mining use an Apriori level-wise approach which has computationally expensive candidate generation and pruning steps. We propose RP-Tree, a method for mining a subset of rare association rules using a tree structure, and an information gain component that helps to identify the more interesting association rules. Empirical evaluation using a range of real world datasets shows that RP-Tree itemset and rule generation is more time efficient than modified versions of FP-Growth and ARIMA, and discovers 92-100% of all the interesting rare association rules.

international acm sigir conference on research and development in information retrieval | 2014

Detection of abnormal profiles on group attacks in recommender systems

Wei Zhou; Yun Sing Koh; Junhao Wen; Shafiq Alam; Gillian Dobbie

Recommender systems using Collaborative Filtering techniques are capable of making personalized predictions. However, these systems are highly vulnerable to profile injection attacks. Group attacks are attacks that target a group of items instead of one, and there are common attributes among these items. Such profiles will have a good probability of being similar to a large number of user profiles, making them hard to detect. We propose a novel technique for identifying group attack profiles which uses an improved metric based on Degree of Similarity with Top Neighbors (DegSim) and Rating Deviation from Mean Agreement (RDMA). We also extend our work with a detailed analysis of target item rating patterns. Experiments show that the combined methods can improve detection rates in user-based recommender systems.
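As background for the RDMA metric named above, here is a minimal sketch of the commonly cited baseline definition: for each user, the absolute deviation of each rating from the item's mean, divided by the number of ratings that item has received, averaged over the items the user rated. The paper's improved metric and DegSim are not reproduced here; the function name `rdma`, the input layout, and the sample ratings are assumptions for illustration.

```python
from collections import defaultdict

def rdma(ratings):
    """ratings: dict mapping user -> {item: rating}. Returns user -> RDMA score.

    Baseline RDMA only; the input layout is an assumption for this sketch.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for profile in ratings.values():
        for item, r in profile.items():
            totals[item] += r
            counts[item] += 1
    item_mean = {i: totals[i] / counts[i] for i in totals}

    scores = {}
    for user, profile in ratings.items():
        if not profile:
            scores[user] = 0.0  # no ratings, so no deviation to measure
            continue
        # Deviation from each item's mean, down-weighted for frequently rated items.
        deviation = sum(abs(r - item_mean[i]) / counts[i] for i, r in profile.items())
        scores[user] = deviation / len(profile)
    return scores

# Hypothetical ratings: "attacker" pushes item "c" and rates the popular item "a" low.
ratings = {
    "u1": {"a": 4},
    "u2": {"a": 4},
    "u3": {"a": 4, "b": 2},
    "attacker": {"a": 1, "c": 5},
}
scores = rdma(ratings)
```

In this toy data the attacker profile receives the highest score, since its rating of item `a` sits furthest from the consensus.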

british national conference on databases | 2011

X-HYBRIDJOIN for near-real-time data warehousing

Muhammad Asif Naeem; Gillian Dobbie; Gerald Weber

In order to make timely and effective decisions, businesses need the latest information from data warehouse repositories. To keep these repositories up-to-date with respect to end user updates, near-real-time data integration is required. An important phase in near-real-time data integration is data transformation, where the stream of updates is joined with disk-based master data. The stream-based algorithm Mesh Join (MESHJOIN) has been proposed to amortize disk access over a fast stream. MESHJOIN makes no assumptions about the data distribution. In real world applications, however, skewed distributions can be found, e.g., certain products are sold more frequently than the remainder of the products. The question arises: how much does MESHJOIN lose in terms of performance by not adapting to data skew? In this paper we perform a rigorous experimental study analyzing the possible performance improvements while considering typical data distributions. For this purpose we design an algorithm, Extended Hybrid Join (X-HYBRIDJOIN), that is complementary to MESHJOIN in that it can adapt to data skew and stores parts of the master data in memory permanently, reducing the disk access overhead significantly. We compare the performance of X-HYBRIDJOIN against the performance of MESHJOIN. We take several precautions to make sure the comparison is adequate and focuses on the utilization of data skew. The experiments show that considering data skew offers substantial room for performance gains that cannot be used by non-adaptive approaches such as MESHJOIN.
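The intuition that keeping frequently used master rows in memory helps under skew can be shown with a toy model. The sketch below is not X-HYBRIDJOIN or MESHJOIN: it uses per-tuple lookups rather than partitioned scans of the master data, and the function `join_with_hot_cache`, the dictionary standing in for disk, and the sample keys are all hypothetical.

```python
def join_with_hot_cache(stream_keys, master, hot_keys):
    """Join stream keys against `master`, counting lookups that miss the cache.

    Toy model only: `master` is a dict standing in for disk-based master data.
    """
    cache = {k: master[k] for k in hot_keys}  # pinned in memory for the whole run
    disk_reads = 0
    output = []
    for key in stream_keys:
        if key in cache:
            row = cache[key]
        else:
            disk_reads += 1                   # stands in for one disk access
            row = master[key]
        output.append((key, row))
    return output, disk_reads

# Hypothetical skewed stream: product "p1" sells far more often than the rest.
master = {"p1": "Widget", "p2": "Gadget", "p3": "Gizmo"}
stream = ["p1", "p1", "p2", "p1", "p3", "p1"]

_, reads_without_cache = join_with_hot_cache(stream, master, hot_keys=[])
joined, reads_with_cache = join_with_hot_cache(stream, master, hot_keys=["p1"])
```

With the single hot key pinned, simulated disk reads drop from 6 to 2 while the join output is unchanged.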
