Robert J. Hilderman | Researchain

Publication

Featured researches published by Robert J. Hilderman.

pacific-asia conference on knowledge discovery and data mining | 2001

Evaluation of Interestingness Measures for Ranking Discovered Knowledge

Robert J. Hilderman; Howard J. Hamilton

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tends to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.

european conference on principles of data mining and knowledge discovery | 1999

Heuristic Measures of Interestingness

Robert J. Hilderman; Howard J. Hamilton

The tuples in a generalized relation (i.e., a summary generated from a database) are unique, and therefore, can be considered to be a population with a structure that can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary to assign a single real-valued index that represents its interestingness relative to other summaries generated from the same database. The heuristics are based upon well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences. Their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We demonstrate that for sample data sets, the order in which some of the measures rank summaries is highly correlated.

european conference on principles of data mining and knowledge discovery | 2000

Applying Objective Interestingness Measures in Data Mining Systems

Robert J. Hilderman; Howard J. Hamilton

One of the most important steps in any knowledge discovery task is the interpretation and evaluation of discovered patterns. To address this problem, various techniques, such as the chi-square test for independence, have been suggested to reduce the number of patterns presented to the user and to focus attention on those that are truly statistically significant. However, when mining a large database, the number of patterns discovered can remain large even after adjusting significance thresholds to eliminate spurious patterns. What is needed, then, is an effective measure to further assist in the interpretation and evaluation step that ranks the interestingness of the remaining patterns prior to presenting them to the user. In this paper, we describe a two-step process for ranking the interestingness of discovered patterns that utilizes the chi-square test for independence in the first step and objective measures of interestingness in the second step. We show how this two-step process can be applied to ranking characterized/generalized association rules and data cubes.
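
The two-step process described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern records, the `interest` field, and the function names are hypothetical, each pattern is assumed to reduce to a 2x2 contingency table, and the threshold is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def rank_patterns(patterns, score):
    """Step 1: keep patterns that pass the independence test.
    Step 2: rank the survivors by an objective interestingness score."""
    significant = [
        p for p in patterns
        if chi_square_2x2(*p["table"]) > CHI2_CRITICAL_DF1_ALPHA05
    ]
    return sorted(significant, key=score, reverse=True)

# Hypothetical patterns; "interest" stands in for any objective measure.
patterns = [
    {"name": "A", "table": (30, 10, 10, 30), "interest": 0.4},
    {"name": "B", "table": (20, 20, 20, 20), "interest": 0.9},
    {"name": "C", "table": (35, 5, 15, 25), "interest": 0.7},
]
ranked = rank_patterns(patterns, score=lambda p: p["interest"])
print([p["name"] for p in ranked])
```

Pattern B carries the highest hypothetical score but is removed in the first step because its table shows no dependence, which is the point of filtering before ranking.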

knowledge discovery and data mining | 1998

Mining Market Basket Data Using Share Measures and Characterized Itemsets

Robert J. Hilderman; Colin L. Carter; Howard J. Hamilton; Nick Cercone

We propose the share-confidence framework for knowledge discovery from databases which addresses the problem of mining itemsets from market basket data. Our goal is two-fold: (1) to present new itemset measures which are practical and useful alternatives to the commonly used support measure; (2) to not only discover the buying patterns of customers, but also to discover customer profiles by partitioning customers into distinct classes. We present a new algorithm for classifying itemsets based upon characteristic attributes extracted from census or lifestyle data. Our algorithm combines the Apriori algorithm for discovering association rules between items in large databases, and the AOG algorithm for attribute-oriented generalization in large databases. We suggest how characterized itemsets can be generalized according to concept hierarchies associated with the characteristic attributes. Finally, we present experimental results that demonstrate the utility of the share-confidence framework.
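
To show why a quantity-based measure can differ from support, the sketch below contrasts the two on a toy basket dataset. It assumes a simplified reading of share (the quantity of an itemset's items in the transactions containing it, divided by the total quantity sold); the exact definitions in the paper may differ, and the data and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

def share(transactions, itemset):
    """Simplified share: quantity of the itemset's items, counted only in
    transactions containing the whole itemset, over the total quantity."""
    total = sum(q for t in transactions for q in t.values())
    local = sum(
        t[item]
        for t in transactions
        if itemset.issubset(t)
        for item in itemset
    )
    return local / total

# Each transaction maps an item to the quantity purchased (toy data).
transactions = [
    {"bread": 1, "milk": 2},
    {"bread": 1},
    {"milk": 6, "eggs": 1},
]

for items in ({"bread"}, {"milk"}):
    print(sorted(items), support(transactions, items), share(transactions, items))
```

Here bread and milk have identical support, yet milk accounts for a much larger fraction of the quantity sold, which a support-only analysis would not reveal.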

european conference on principles of data mining and knowledge discovery | 1997

Parallel Knowledge Discovery Using Domain Generalization Graphs

Robert J. Hilderman; Howard J. Hamilton; Robert J. Kowalchuk; Nick Cercone

Multi-Attribute Generalization is an algorithm for attribute-oriented induction in relational databases using domain generalization graphs. Each node in a domain generalization graph represents a different way of summarizing the domain values associated with an attribute. When generalizing a set of attributes, we show how a serial implementation of the algorithm generates all possible combinations of nodes from the domain generalization graphs associated with the attributes, resulting in the presentation of all possible generalized relations for the set. We then show how the inherent parallelism in domain generalization graphs is exploited by a parallel implementation of the algorithm. Significant speedups were obtained using our approach when large discovery tasks were partitioned across multiple processors. The results of our work enable a database analyst to quickly and efficiently analyze the contents of a relational database from many different perspectives.
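
The enumeration of node combinations mentioned in this abstract can be illustrated with a short sketch. It makes simplifying assumptions: each domain generalization graph is reduced to a flat list of its nodes (edges are ignored), and the attribute names and node labels are hypothetical.

```python
from itertools import product

# Hypothetical graphs, flattened to their node lists for illustration.
dggs = {
    "city": ["city", "province", "ANY"],
    "date": ["day", "month", "year", "ANY"],
}

def node_combinations(graphs):
    """Yield one dict per combination of nodes, one node per attribute."""
    names = list(graphs)
    for combo in product(*(graphs[name] for name in names)):
        yield dict(zip(names, combo))

combinations = list(node_combinations(dggs))
print(len(combinations))   # 3 nodes x 4 nodes
print(combinations[0])
```

Each combination identifies one generalized relation and can be computed independently of the others, which is what makes it plausible to partition the work across processors.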

conference on tools with artificial intelligence | 2000

Principles for mining summaries using objective measures of interestingness

Robert J. Hilderman; Howard J. Hamilton

An important problem in the area of data mining is the development of effective measures of interestingness for ranking discovered knowledge. The authors propose five principles that any measure must satisfy to be considered useful for ranking the interestingness of summaries generated from databases. We investigate the problem within the context of summarizing a single dataset which can be generalized in many different ways and to many levels of granularity. We perform a comparative sensitivity analysis of fifteen well-known diversity measures to identify those which satisfy the proposed principles. The fifteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. Their use as objective measures of interestingness for ranking summaries generated from databases is novel. The objective of this work is to gain some insight into the behaviour that can be expected from each of the diversity measures in practice, and to begin to develop a theory of interestingness against which the utility of new measures can be assessed.

pacific-asia conference on knowledge discovery and data mining | 1999

Heuristics for Ranking the Interestingness of Discovered Knowledge

Robert J. Hilderman; Howard J. Hamilton

We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution. The four interestingness measures presented here are based upon common measures of diversity of a population: variance, the Simpson index, and the Shannon index. Using each of the proposed measures, we assign a single real value to a summary that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated.
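
As a rough illustration of the diversity measures named in this abstract, the sketch below computes a variance-based index, the Simpson index, and the Shannon index from the tuple counts of a summary. The formulations are common textbook forms and may differ from those in the paper; the function names and sample counts are assumptions.

```python
import math

def proportions(counts):
    """Convert tuple counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def variance_index(counts):
    """Sample variance of the proportions around the uniform value 1/m."""
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0
    mean = 1.0 / m
    return sum((pi - mean) ** 2 for pi in p) / (m - 1)

def simpson_index(counts):
    """Sum of squared proportions; larger values mean more concentration."""
    return sum(pi ** 2 for pi in proportions(counts))

def shannon_index(counts):
    """Entropy in bits; larger values mean a more even distribution."""
    return -sum(pi * math.log2(pi) for pi in proportions(counts) if pi > 0)

# Two hypothetical summaries, each given as the counts of its tuples.
summaries = {"uniform": [25, 25, 25, 25], "skewed": [97, 1, 1, 1]}
for name, counts in summaries.items():
    print(name, variance_index(counts), simpson_index(counts), shannon_index(counts))
```

Each function reduces a summary to a single real value, so summaries can be ordered by sorting on whichever index is chosen.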

international conference on tools with artificial intelligence | 2007

Exploratory Quantitative Contrast Set Mining: A Discretization Approach

Mondelle Simeon; Robert J. Hilderman

In the process of training support vector machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes were employed into this field. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. In this model, it selects B as working set without reselection. Some properties are given by simple proof, and experiments demonstrate that the proposed method is in general faster than existing methods.

Quality Measures in Data Mining | 2007

Statistical Methodologies for Mining Potentially Interesting Contrast Sets

Robert J. Hilderman; Terry Peckham

One of the fundamental tasks of data analysis in many disciplines is to identify the significant differences between classes or groups. Contrast sets have previously been proposed as a useful tool for describing these differences. A contrast set is a set of association rules for which the antecedents describe distinct groups, a common consequent is shared by all the rules, and support for the rules is significantly different between groups. The intuition is that comparing the support between groups may provide some insight into the fundamental differences between the groups. In this chapter, we compare two contrast set mining methodologies that rely on different statistical philosophies: the well-known STUCCO approach and CIGAR, our proposed alternative approach. Following a brief introduction to general issues and problems related to statistical hypothesis testing in data mining, we survey and discuss the statistical measures underlying the two methods using an informal tutorial approach. Experimental results show that both methodologies are statistically sound, representing valid alternative solutions to the problem of identifying potentially interesting contrast sets.

international symposium on temporal representation and reasoning | 1998

Generalization for calendar attributes using domain generalization graphs

Dee Jay Randall; Howard J. Hamilton; Robert J. Hilderman

The paper addresses the problem of generalizing temporal data based on calendar (date and time) attributes. The proposed method is based on a domain generalization graph, i.e., a lattice defining a partial order that represents a set of generalization relations for the attribute. The authors specify the components of a domain generalization graph suited to calendar attributes. They define granularity, subset, lookup, and algorithmic methods for specifying generalizations between calendar domains. To reduce the size of the domain generalization graph used in generalization and the number of results shown to the user, they use six types of pruning: reachability pruning, preliminary manual pruning, data range pruning, previous discard pruning, pregeneralization manual pruning, and post generalization pruning.
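
A granularity-style generalization of a calendar attribute might look like the sketch below. It covers only coarsening a timestamp to a day, month, year, or the single value ANY; the level names, the `GENERALIZERS` table, and the sample data are hypothetical and do not reproduce the paper's graph or its pruning steps.

```python
from collections import Counter
from datetime import datetime

# Hypothetical generalization levels, ordered from fine to coarse.
GENERALIZERS = {
    "day": lambda ts: ts.strftime("%Y-%m-%d"),
    "month": lambda ts: ts.strftime("%Y-%m"),
    "year": lambda ts: ts.strftime("%Y"),
    "ANY": lambda ts: "ANY",
}

def generalize(timestamps, level):
    """Map each timestamp to its value at the given level and count tuples."""
    return Counter(GENERALIZERS[level](ts) for ts in timestamps)

stamps = [
    datetime(1998, 1, 5, 9, 30),
    datetime(1998, 1, 20, 14, 0),
    datetime(1998, 3, 2, 8, 15),
]

for level in GENERALIZERS:
    print(level, dict(generalize(stamps, level)))
```

Coarser levels merge more timestamps into the same value, so the resulting summary has fewer tuples with larger counts.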
