Sarabjot Singh Anand | Researchain

Publication

Featured research published by Sarabjot Singh Anand.

Data and Knowledge Engineering | 1996

EDM: A general framework for data mining based on evidence theory

Sarabjot Singh Anand; David A. Bell; John G. Hughes

Data Mining or Knowledge Discovery in Databases is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets, a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it ‘general’ in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
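As a rough illustration of the Evidence Theory machinery the abstract refers to, the following sketch combines two conventional mass functions with Dempster's rule and shows how ignorance can stand in for a missing value. It uses the standard textbook definitions only; the extended mass functions and the EDM operators proposed in the paper are not specified in the abstract, so the function `combine`, the frame `FRAME` and the example masses are all hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y  # product mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise so that the combined masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical frame of discernment for a toy rule-discovery task.
FRAME = frozenset({"low_risk", "high_risk"})

# A missing value carries no evidence: all mass goes to the whole frame.
missing = {FRAME: 1.0}
evidence_a = {frozenset({"high_risk"}): 0.6, FRAME: 0.4}
evidence_b = {
    frozenset({"high_risk"}): 0.5,
    frozenset({"low_risk"}): 0.3,
    FRAME: 0.2,
}

ab = combine(evidence_a, evidence_b)
unchanged = combine(evidence_a, missing)
```

Combining with `missing` leaves `evidence_a` unchanged, which is one way to read the abstract's remark that ignorance offers an intuitive treatment of missing data. Because the rule is commutative and associative, partial results can be combined in any order, which fits the abstract's emphasis on parallelism.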

From Web to Social Web: Discovering and Deploying User and Content Profiles | 2007

Contextual Recommendation

Sarabjot Singh Anand; Bamshad Mobasher

The role of context in our daily interaction with our environment has been studied in psychology, linguistics, artificial intelligence, information retrieval, and more recently, in pervasive/ubiquitous computing. However, context has been largely ignored in research into recommender systems specifically and personalization in general. In this paper we describe how context can be brought to bear on recommender systems. As a means for achieving this, we propose a fundamental shift in terms of how we model a user within a recommendation system: inspired by models of human memory developed in psychology, we distinguish between a user's short-term and long-term memories and define a recommendation process that uses both. Context-based retrieval cues are used to retrieve relevant preference information from long-term memory, which is then used in conjunction with the information stored in short-term memory to generate recommendations. We also describe implementations of recommender systems and personalization solutions based on this framework and show how this results in an increase in recommendation quality.

ACM Transactions on Internet Technology | 2007

Generating semantically enriched user profiles for Web personalization

Sarabjot Singh Anand; Patricia Kearney; Mary Shapcott

Traditional collaborative filtering generates recommendations for the active user based solely on ratings of items by other users. However, most businesses today have item ontologies that provide a useful source of content descriptors that can be used to enhance the quality of recommendations generated. In this article, we present a novel approach to integrating user rating vectors with an item ontology to generate recommendations. The approach is novel in measuring similarity between users in that it first derives factors, referred to as impacts, driving the observed user behavior and then uses these factors within the similarity computation. In doing so, a more comprehensive user model is learned that is sensitive to the context of the user visit. An evaluation of our recommendation algorithm was carried out using data from an online retailer of movies with over 94,000 movies, 44,000 actors, and 10,000 directors within the item knowledge base. The evaluation showed a statistically significant improvement in the prediction accuracy over traditional collaborative filtering. Additionally, the algorithm was shown to generate recommendations for visitors that belong to sparse sections of the user space, areas where traditional collaborative filtering would generally fail to generate accurate recommendations.
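For readers unfamiliar with the baseline this abstract compares against, here is a minimal sketch of traditional user-based collaborative filtering. It is not the article's impact-based method, whose details the abstract does not give; the functions `cosine` and `predict`, the choice of cosine similarity and the toy `ratings` are assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0  # no co-rated items, so no evidence of similarity
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, active, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for user, their in ratings.items():
        if user == active or item not in their:
            continue
        sim = cosine(ratings[active], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbour


# Hypothetical toy data: user -> {movie id: rating}.
ratings = {
    "ann": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 2},
    "cat": {"m4": 3},
    "dan": {"m4": 5, "m3": 4},
}

dense_case = predict(ratings, "ann", "m3")
sparse_case = predict(ratings, "cat", "m1")
```

In this toy data `cat` shares no rated item with anyone who rated `m1`, so the baseline returns no prediction. That is the kind of sparse region of the user space where, according to the abstract, ratings-only methods struggle and ontology-derived information can help.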

Conference on Information and Knowledge Management | 1995

The role of domain knowledge in data mining

Sarabjot Singh Anand; David A. Bell; John G. Hughes

The ideal situation for a Data Mining or Knowledge Discovery system would be for the user to be able to pose a query of the form “Give me something interesting that could be useful” and for the system to discover some useful knowledge for the user. But such a system would be unrealistic as databases in the real world are very large and so it would be too inefficient to be workable. So the role of the human within the discovery process is essential. Moreover, the measure of what is meant by “interesting to the user” is dependent on the user as well as the domain within which the Data Mining system is being used. In this paper we discuss the use of domain knowledge within Data Mining. We define three classes of domain knowledge: Hierarchical Generalization Trees (HG-Trees), Attribute Relationship Rules (AR-rules) and Environment-Based Constraints (EBC). We discuss how each one of these types of domain knowledge is incorporated into the discovery process within the EDM (Evidential Data Mining) framework for Data Mining proposed earlier by the authors [ANAN94], and in particular within the STRIP (Strong Rule Induction in Parallel) algorithm [ANAN95] implemented within the EDM framework. We highlight the advantages of using domain knowledge within the discovery process by providing results from the application of the STRIP algorithm in the actuarial domain.

Knowledge-Based Systems | 1998

A data mining methodology for cross-sales

Sarabjot Singh Anand; A. R. Patrick; John G. Hughes; David A. Bell

In this paper we discuss the use of Data Mining to provide a solution to the problem of cross-sales. We define and analyse the cross-sales problem and develop a hybrid methodology to solve it, using characteristic rule discovery and deviation detection. Deviation detection is used as a measure of interest to filter out the less interesting characteristic rules and only retain the best characteristic rules discovered. The effect of domain knowledge on the interestingness value of the discovered rules is discussed and techniques for refining the knowledge to increase this interestingness measure are studied. We also investigate the use of externally procured lifestyle and other survey data for data enrichment and discuss its use as additional domain knowledge. The developed methodology has been applied to a real world cross-sales problem within the financial sector, and the results are also presented in this paper. Although the application described is in the financial sector, the methodology is generic in nature and can be applied to other sectors.

Journal of Property Investment & Finance | 1999

The Application of Intelligent Hybrid Techniques for the Mass Appraisal of Residential Properties

William McCluskey; Sarabjot Singh Anand

Hybrid systems are investigated as the next generation of intelligent applications within the field of mass appraisal and valuation, motivated by the obvious limitations of paradigms that are used in isolation or as stand‐alone techniques, such as multiple regression analysis, artificial neural networks and expert systems. Clearly, there are distinct advantages in integrating two or more information processing systems that would address some of the discrete problems of individual techniques. Examines first, the strategic development of mass appraisal approaches which have traditionally been based on “stand‐alone” techniques; second, the potential application of an intelligent hybrid system. Highlights possible solutions by investigating various hybrid systems that may be developed incorporating a nearest neighbour algorithm (k‐NN). The enhancements are aimed at two major deficiencies in traditional distance metrics: user dependence for attribute weights, and biases in the distance metric towards matching categorical variables in the retrieval of neighbours. Solutions include statistical techniques: mean, coefficient of variation and significant mean. Data mining paradigms based on a loosely coupled neural network or alternatively a tight coupling with genetic algorithms are used to discover attribute weights. The hybrid architectures developed are applied to a property data set and their performance measured based on their predictive value as well as perspicuity. Concludes by considering the application and the relevance of these techniques within the field of computer assisted mass appraisal.
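To make the role of attribute weights in nearest-neighbour retrieval concrete, the sketch below appraises a property as the mean price of its k nearest sold neighbours under a weighted distance. It is an illustration only: the weights are set by hand rather than discovered by a neural network or genetic algorithm as in the paper, and the names `distance` and `appraise`, the attributes and all figures are hypothetical.

```python
def distance(a, b, weights, ranges):
    """Weighted distance over mixed numeric and categorical attributes."""
    total = 0.0
    for attr, w in weights.items():
        if attr in ranges:
            # Numeric attribute: absolute difference scaled by its range.
            d = abs(a[attr] - b[attr]) / ranges[attr]
        else:
            # Categorical attribute: 0 on an exact match, 1 otherwise.
            d = 0.0 if a[attr] == b[attr] else 1.0
        total += w * d
    return total


def appraise(target, sold, weights, ranges, k=2):
    """Estimate a price as the mean price of the k nearest sold properties."""
    ranked = sorted(sold, key=lambda p: distance(target, p, weights, ranges))
    nearest = ranked[:k]
    return sum(p["price"] for p in nearest) / len(nearest)


# Hypothetical sales data; "area" is numeric, "type" is categorical.
sold = [
    {"area": 100, "type": "terrace", "price": 100000},
    {"area": 110, "type": "terrace", "price": 110000},
    {"area": 105, "type": "detached", "price": 180000},
    {"area": 200, "type": "detached", "price": 300000},
]
ranges = {"area": 100}
target = {"area": 104, "type": "terrace"}

with_type = appraise(target, sold, {"area": 1.0, "type": 1.0}, ranges)
area_only = appraise(target, sold, {"area": 1.0, "type": 0.0}, ranges)
```

With equal weights the categorical mismatch dominates and only terraces are retrieved; with the property type weighted at zero a detached house enters the neighbourhood and the estimate rises. This sensitivity to user-chosen weights and to categorical matching is what the abstract identifies as the motivation for learning the weights from data.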

IEEE Intelligent Systems | 1997

Designing a kernel for data mining

Sarabjot Singh Anand; Bryan W. Scotney; Mee G. Tan; Sally I. McClean; David A. Bell; John G. Hughes; Ian C. Magill

The Mining Kernel System provides a foundation for building data-mining tools that are capable of tackling complex knowledge discovery problems. Examples from applications involving intelligent computerized support for a urology clinic and improved customer database utilization in financial settings illustrate its effectiveness.

ACM Transactions on Internet Technology | 2007

Introduction to intelligent techniques for Web personalization

Sarabjot Singh Anand; Bamshad Mobasher

Web personalization [Anand and Mobasher 2005] can be defined as any set of actions that can tailor the Web experience to a particular user or set of users. The experience can be something as casual as browsing a Web site or as (economically) significant as trading stocks or purchasing a car. The actions can range from simply making the presentation more pleasing to anticipating the needs of a user and providing customized and relevant information. To achieve effective personalization, organizations must rely on all available data, including the usage and click-stream data (reflecting user behavior), site content, site structure, and domain knowledge, as well as user demographics and profiles. Efficient and intelligent techniques are needed to mine this data for actionable knowledge, and to effectively use the discovered knowledge to enhance the users’ Web experience. Being data driven, Web personalization is also achieved through the implementation of all the phases of a typical data mining cycle [Mobasher 2007] including data collection, preprocessing, pattern discovery and evaluation, in an off-line mode, and finally the deployment of the knowledge in real time to mediate between the user and the Web. The data collection and preprocessing phases involve intelligently extracting and integrating useful data from multiple sources and extracting user preferences implicit within the data. The pattern discovery phase usually involves the adaptation and integration of techniques from machine learning, information retrieval and filtering, databases, agent architectures, knowledge representation, data mining, text mining, statistics, information security and privacy, and context modeling with the goal of building single or group user models. These techniques must address

Artificial Intelligence in Medicine | 1999

An evaluation of intelligent prognostic systems for colorectal cancer

Sarabjot Singh Anand; Ann E. Smith; Peter Hamilton; J. S. Anand; John G. Hughes; P. H. Bartels

In this paper we describe attempts at building a robust model for predicting the length of survival of patients with colorectal cancer. The aim of the research reported in this paper is to study the effective utilisation of artificial intelligence techniques in the medical domain. We suggest that an important research objective of proponents of intelligent prognostic systems must be to evaluate the additionality that AI techniques can bring to an already well-established field of medical prognosis. Towards this end, we compare a number of different AI techniques that lend themselves to the task of predicting survival in colorectal cancer patients. We describe the pros and cons of each of these methods using the usual metrics of accuracy and perspicuity. We then present the notion of intelligent hybrid systems and evaluate the role that they may potentially play in developing robust prognostic models. In particular we evaluate a hybrid system that utilises the k Nearest Neighbour technique in conjunction with Genetic Algorithms. We describe a number of innovations within this hybrid paradigm that were used to build the prognostic model. We discuss the issue of censored patients and how this issue can be tackled within the various models used. In keeping with our objective of studying the additionality that AI techniques bring to building prognostic models, we use Cox's regression as a standard and compare each AI technique with it, attempting to discover their capabilities in enhancing prognostic methods in medicine. In doing so we address two main questions: which model fits the data best, and are the results obtained by the various AI techniques significantly different from those of Cox's regression? We conclude this paper by discussing future enhancements to the work presented and lessons learned from the study to date.

Web Information Systems Engineering | 2000

Data mining and XML: current and future issues

Alex G. Büchner; Matthias Baumgarten; Maurice Mulvenna; R. Böhm; Sarabjot Singh Anand

This paper describes potential synergies between data mining and XML, which include the representation of discovered data mining knowledge, knowledge discovery from XML documents, XML-based data preparation and XML-based domain knowledge. Each category is viewed from a theoretical as well as a practical point of view.
