Howard J. Hamilton
University of Regina
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Howard J. Hamilton.
ACM Computing Surveys | 2006
Liqiang Geng; Howard J. Hamilton
Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
data and knowledge engineering | 2006
Hong Yao; Howard J. Hamilton
The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent item-set only reflects the statistical correlation between items, and it does not reflect the semantic significance of the items. In this paper, we propose a utility based itemset mining approach to overcome this limitation. The proposed approach permits users to quantify their preferences concerning the usefulness of itemsets using utility values. The usefulness of an itemset is characterized as a utility constraint. That is, an itemset is interesting to the user only if it satisfies a given utility constraint. We show that the pruning strategies used in previous itemset mining approaches cannot be applied to utility constraints. In response, we identify several mathematical properties of utility constraints. Then, two novel pruning strategies are designed. Two algorithms for utility based itemset mining are developed by incorporating these pruning strategies. The algorithms are evaluated by applying them to synthetic and real world databases. Experimental results show that the proposed algorithms are effective on the databases tested.
pacific asia conference on knowledge discovery and data mining | 2001
Robert J. Hilderman; Howard J. Hamilton
When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tend to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
knowledge discovery and data mining | 2003
Xin Wang; Howard J. Hamilton
In this paper, we propose a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters of varying densities, clusters which depend on nonspatial attributes, and approximate clusters in very large databases. DBRS achieves these results by repeatedly picking an unclassified point at random and examining its neighborhood. A theoretical comparison of DBRS and DBSCAN, a well-known density-based algorithm, is also given in the paper.
Data Mining and Knowledge Discovery | 2003
Brock Barber; Howard J. Hamilton
Itemset share has been proposed as an additional measure of the importance of itemsets in association rule mining (Carter et al., 1997). We compare the share and support measures to illustrate that the share measure can provide useful information about numerical values that are typically associated with transaction items, which the support measure cannot. We define the problem of finding share frequent itemsets, and show that share frequency does not have the property of downward closure when it is defined in terms of the itemset as a whole. We present algorithms that do not rely on the property of downward closure, and thus are able to find share frequent itemsets that have infrequent subsets. The algorithms use heuristic methods to generate candidate itemsets. They supplement the information contained in the set of frequent itemsets from a previous pass, with other information that is available at no additional processing cost. They count only those generated itemsets that are predicted to be frequent. The algorithms are applied to a large commercial database and their effectiveness is examined using principles of classifier evaluation from machine learning.
IEEE Transactions on Knowledge and Data Engineering | 1998
Colin L. Carter; Howard J. Hamilton
We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute Oriented Generalization, a well known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an online algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space that is generally reasonable, although under certain circumstances, may grow large. FIGR is incremental, allowing changes to the database to be reflected in the generalization results without rereading input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p the number of tuples in the generalized relation. Both require O(n) space that, for large input, causes memory problems. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.
Data Mining and Knowledge Discovery | 2008
Hong Yao; Howard J. Hamilton
In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments. These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
european conference on principles of data mining and knowledge discovery | 2004
Xin Wang; Camilo Rostoker; Howard J. Hamilton
In this paper, we propose a new spatial clustering method, called DBRS+, which aims to cluster spatial data in the presence of both obstacles and facilitators. It can handle datasets with intersected obstacles and facilitators. Without preprocessing, DBRS+ processes constraints during clustering. It can find clusters with arbitrary shapes and varying densities. DBRS+ has been empirically evaluated using synthetic and real data sets and its performance has been compared to DBRS, AUTOCLUST+, and DBCLuC*.
european conference on principles of data mining and knowledge discovery | 1997
Colin L. Carter; Howard J. Hamilton; Nick Cercone
We introduce the measures share, coincidence and dominance as alternatives to the standard itemset methodology measure of support. An itemset is a group of items bought together in a transaction. The support of an itemset is the ratio of transactions containing the itemset to the total number of transactions. The share of an itemset is the ratio of the count of items purchased together to the total count of items in all transactions. The coincidence of an itemset is the ratio of the count of items in that itemset to the total of those same items in the database. The dominance of an item in an itemset specifies the extent to which that item dominates the total of all items in the itemset. Share based measures have the advantage over support of reflecting accurately how many units are being moved by a business. The share measure can be extended to quantify the financial impact of an itemset on the business.
Quality Measures in Data Mining | 2007
Liqiang Geng; Howard J. Hamilton
Interestingness measures play an important role in data mining regardless of the kind of patterns being mined. Good measures should select and rank patterns according to their potential interest to the user. Good measures should also reduce the time and space cost of the mining process. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, and reviews the analysis methods and selection principles for appropriate measures for applications.