John G. Hughes
Ulster University
Publications
Featured research published by John G. Hughes.
Data and Knowledge Engineering | 1996
Sarabjot Singh Anand; David A. Bell; John G. Hughes
Data Mining or Knowledge Discovery in Databases is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets, a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it ‘general’ in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
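The abstract does not give the extended mass-function definition or the EDM operators, so the following is only a minimal sketch of the standard Evidence Theory machinery it builds on: Dempster's rule of combination, with ignorance represented as mass assigned to the whole frame of discernment. The frame `FRAME`, the example masses and the function name `combine` are illustrative assumptions, not part of EDM.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset (a subset of the
    frame of discernment) to its mass; masses are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures disagreement.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame and evidence, chosen only for illustration.
FRAME = frozenset({"low", "medium", "high"})
m_a = {frozenset({"high"}): 0.6, FRAME: 0.4}
m_b = {frozenset({"medium", "high"}): 0.7, FRAME: 0.3}

# A record with a missing value contributes pure ignorance:
# all of its mass sits on the whole frame.
m_missing = {FRAME: 1.0}

result = combine(m_a, m_b)
unchanged = combine(m_a, m_missing)
```

Combining with `m_missing` leaves `m_a` unchanged, which is one way to read the claim that ignorance gives an intuitive treatment of missing data: a missing value neither supports nor contradicts any hypothesis.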
Conference on Information and Knowledge Management | 1995
Sarabjot Singh Anand; David A. Bell; John G. Hughes
The ideal situation for a Data Mining or Knowledge Discovery system would be for the user to be able to pose a query of the form “Give me something interesting that could be useful” and for the system to discover some useful knowledge for the user. But such a system would be unrealistic as databases in the real world are very large and so it would be too inefficient to be workable. So the role of the human within the discovery process is essential. Moreover, the measure of what is meant by “interesting to the user” is dependent on the user as well as the domain within which the Data Mining system is being used. In this paper we discuss the use of domain knowledge within Data Mining. We define three classes of domain knowledge: Hierarchical Generalization Trees (HG-Trees), Attribute Relationship Rules (AR-rules) and Environment-Based Constraints (EBC). We discuss how each one of these types of domain knowledge is incorporated into the discovery process within the EDM (Evidential Data Mining) framework for Data Mining proposed earlier by the authors [ANAN94], and in particular within the STRIP (Strong Rule Induction in Parallel) algorithm [ANAN95] implemented within the EDM framework. We highlight the advantages of using domain knowledge within the discovery process by providing results from the application of the STRIP algorithm in the actuarial domain.
Knowledge Based Systems | 1998
Sarabjot Singh Anand; A. R. Patrick; John G. Hughes; David A. Bell
In this paper we discuss the use of Data Mining to provide a solution to the problem of cross-sales. We define and analyse the cross-sales problem and develop a hybrid methodology to solve it, using characteristic rule discovery and deviation detection. Deviation detection is used as a measure of interest to filter out the less interesting characteristic rules and only retain the best characteristic rules discovered. The effect of domain knowledge on the interestingness value of the discovered rules is discussed and techniques for refining the knowledge to increase this interestingness measure are studied. We also investigate the use of externally procured lifestyle and other survey data for data enrichment and discuss its use as additional domain knowledge. The developed methodology has been applied to a real world cross-sales problem within the financial sector, and the results are also presented in this paper. Although the application described is in the financial sector, the methodology is generic in nature and can be applied to other sectors.
IEEE Intelligent Systems | 1997
Sarabjot Singh Anand; Bryan W. Scotney; Mee G. Tan; Sally I. McClean; David A. Bell; John G. Hughes; Ian C. Magill
The Mining Kernel System provides a foundation for building data-mining tools that are capable of tackling complex knowledge discovery problems. Examples from applications involving intelligent computerized support for a urology clinic and improved customer database utilization in financial settings illustrate its effectiveness.
ACM Transactions on Internet Technology | 2004
Jianhan Zhu; Jun Hong; John G. Hughes
User traversals on hyperlinks between Web pages can reveal semantic relationships between these pages. We use user traversals on hyperlinks as weights to measure semantic relationships between Web pages. On the basis of these weights, we propose a novel method to put Web pages on a Web site onto different conceptual levels in a link hierarchy. We develop a clustering algorithm called PageCluster, which clusters conceptually-related pages on each conceptual level of the link hierarchy based on their in-link and out-link similarities. Clusters are then used to construct a conceptual link hierarchy, which is visualized in a prototype called Online Navigation Explorer (ONE) for adaptive Web site navigation. Our experiments show that our method can put Web pages onto conceptual levels of a link hierarchy more accurately than both the breadth-first search method and the shortest-weighted-path method, and PageCluster can cluster conceptually-related pages more accurately than the bibliographic analysis method. Our user study also shows that the conceptual link hierarchy visualized in ONE can help users find information more effectively and efficiently as the task of finding information becomes less specific and involves more Web pages on multiple conceptual levels.
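The abstract does not state how PageCluster computes in-link and out-link similarity, so the sketch below swaps in a plain Jaccard overlap of link sets as a stand-in, and it ignores the traversal weights the paper uses. The site graph `links` and the helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def in_links(links):
    """Invert an out-link map: page -> set of pages that link to it."""
    incoming = {}
    for src, targets in links.items():
        for dst in targets:
            incoming.setdefault(dst, set()).add(src)
    return incoming


def link_similarity(links, p, q):
    """Average of in-link and out-link Jaccard similarity (an assumption)."""
    incoming = in_links(links)
    out_sim = jaccard(links.get(p, set()), links.get(q, set()))
    in_sim = jaccard(incoming.get(p, set()), incoming.get(q, set()))
    return (in_sim + out_sim) / 2


# Hypothetical site graph: page -> pages it links to.
links = {
    "home": {"courses", "research"},
    "courses": {"ai", "db"},
    "research": {"ai", "db", "staff"},
}

sim_siblings = link_similarity(links, "courses", "research")
sim_unrelated = link_similarity(links, "home", "ai")
```

Here `courses` and `research` share their only in-link and most of their out-links, so they score high and would be candidates for the same cluster on one conceptual level, while `home` and `ai` share nothing and score zero.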
Artificial Intelligence in Medicine | 1999
Sarabjot Singh Anand; Ann E. Smith; Peter Hamilton; J. S. Anand; John G. Hughes; Bartels Ph
In this paper we describe attempts at building a robust model for predicting the length of survival of patients with colorectal cancer. The aim of the research reported in this paper is to study the effective utilisation of artificial intelligence techniques in the medical domain. We suggest that an important research objective of proponents of intelligent prognostic systems must be to evaluate the additionality that AI techniques can bring to an already well-established field of medical prognosis. Towards this end, we compare a number of different AI techniques that lend themselves to the task of predicting survival in colorectal cancer patients. We describe the pros and cons of each of these methods using the usual metrics of accuracy and perspicuity. We then present the notion of intelligent hybrid systems and evaluate the role that they may potentially play in developing robust prognostic models. In particular we evaluate a hybrid system that utilises the k Nearest Neighbour technique in conjunction with Genetic Algorithms. We describe a number of innovations used within this hybrid paradigm to build the prognostic model. We discuss the issue of censored patients and how this issue can be tackled within the various models used. In keeping with our objective of studying the additionality that AI techniques bring to building prognostic models, we use Cox's regression as a standard and compare each AI technique with it, attempting to discover their capabilities in enhancing prognostic methods in medicine. In doing so we address two main questions: which model fits the data best, and are the results obtained by the various AI techniques significantly different from those of Cox's regression? We conclude this paper by discussing future enhancements to the work presented and lessons learned from the study to date.
Knowledge Discovery and Data Mining | 1998
Sarabjot Singh Anand; W. R. David Patterson; John G. Hughes; David A. Bell
The use of Data Mining in removing current bottlenecks within Case-based Reasoning (CBR) systems is investigated along with the possible role of CBR in providing a knowledge management back-end to current Data Mining systems. In particular, this paper discusses the use of Data Mining in two aspects of the M2 system [ANAN97a], namely, the acquisition of cases and discovery of adaptation knowledge. We discuss, in detail, the approach taken to discover cases and outline the methodology to discover adaptation knowledge. For case discovery, a Kohonen network is used to identify initial clusters within the database. These clusters are then analysed using C4.5 and non-unique clusters are grouped to form concepts. A regression tree induction algorithm is then used to ensure that the concepts are rich in information required to predict the dependent variable in the data set. Cases are then chosen from each of the identified concepts as well as outliers in the database. Initial results obtained in the acquisition of cases are presented and analysed. They indicate that the proposed approach achieves a high reduction in the size of the case base.
Data and Knowledge Engineering | 1999
Werner Dubitzky; Alex G. Büchner; John G. Hughes; David A. Bell
Case-based reasoning (CBR) systems define knowledge in terms of a memory or library of past cases and a retrieval mechanism that revolves around retrieving data relevant to a goal query. Additionally, such systems employ an adaptation component that transforms the retrieved data into a solution to the problem expressed by the original query. The combination of goal query and the subsequent solution transformation is referred to as CBR goal query. Goal queries are concerned with data that is close to the request expressed in the query. Conventional relational and object-oriented databases are usually concerned with specific queries. Extending conventional object-oriented data models, this paper proposes a concept-oriented data model that provides a variety of mechanisms to support conventional goal and CBR goal queries. It is shown that such a concept-oriented data model could be used as the core for a more general knowledge base management system.
Knowledge and Information Systems | 1999
David W. Patterson; Sarabjot Singh Anand; Werner Dubitzky; John G. Hughes
In this paper we present the M2 Case-Based Reasoning (CBR) system. The M2 system addresses a number of issues that present methodologies for CBR systems have shied away from. We discuss techniques for removing the knowledge acquisition bottleneck when acquiring case knowledge. Here, case knowledge refers to the complementary knowledge structures, cases (more specific in nature) and adaptation rules (more general). We address the use of negative cases for updating the case knowledge as well as for refining the similarity measures. In particular we discuss in detail, showing experimental results, the use of Data Mining within the M2 system to build the case base from a database containing operational data, and discover adaptation rules. A methodology to monitor the competence of the CBR system and to utilize negative cases for updating the CBR system to enhance its competence is also discussed. The M2 CBR system also employs Rough Set and Fuzzy Set theories to further enhance its capabilities within real-world applications as well as providing a richer and truer model of human reasoning.
Uncertainty in Artificial Intelligence | 1992
Weiru Liu; John G. Hughes; Michael F. McTear
The Dempster-Shafer theory of evidence has been used intensively to deal with uncertainty in knowledge-based systems. However, the representation of uncertain relationships between evidence and hypothesis groups (heuristic knowledge) is still a major research problem. This paper presents an approach to representing such heuristic knowledge by evidential mappings which are defined on the basis of mass functions. The relationships between evidential mappings and multivalued mappings, as well as between evidential mappings and Bayesian multivalued causal link models in Bayesian theory, are discussed. Following this, the detailed procedures for constructing evidential mappings for any set of heuristic rules are introduced. Several situations of belief propagation are discussed.
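The abstract does not spell out the construction, so what follows is a simplified reading rather than the paper's procedure: each evidence element is mapped to a mass function over subsets of the hypothesis space, and belief is propagated by weighting each of those mass functions by the mass on its evidence element. Evidence mass is assumed to sit on single elements only, and the rule, the frame and all names are hypothetical.

```python
def propagate(evidence_mass, mapping):
    """Propagate a mass function on evidence to the hypothesis space.

    evidence_mass: dict mapping an evidence element to its mass.
    mapping: dict mapping each evidence element to a dict
             {frozenset of hypotheses: strength}, strengths summing to 1.
    """
    hypothesis_mass = {}
    for element, m_e in evidence_mass.items():
        for subset, strength in mapping[element].items():
            hypothesis_mass[subset] = (
                hypothesis_mass.get(subset, 0.0) + m_e * strength
            )
    return hypothesis_mass


# Hypothetical heuristic rule: "fever suggests flu with strength 0.8".
# The remaining 0.2 stays uncommitted on the whole hypothesis frame.
HYPOTHESES = frozenset({"flu", "cold", "healthy"})
mapping = {
    "fever": {frozenset({"flu"}): 0.8, HYPOTHESES: 0.2},
    "no_fever": {HYPOTHESES: 1.0},  # the rule says nothing in this case
}

# Uncertain observation: fever is believed present with mass 0.7.
evidence_mass = {"fever": 0.7, "no_fever": 0.3}

hypothesis_mass = propagate(evidence_mass, mapping)
```

In this example the mass committed to `flu` is the rule strength discounted by the uncertainty of the observation, and everything else remains on the whole frame as ignorance.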