Christie I. Ezeife
University of Windsor
Publication
Featured research published by Christie I. Ezeife.
ACM Computing Surveys | 2010
Nizar R. Mabroukeh; Christie I. Ezeife
Owing to important applications such as mining web page traversal sequences, many algorithms have been introduced in the area of sequential pattern mining over the last decade, most of which have also been modified to support concise representations like closed, maximal, incremental or hierarchical sequences. This article presents a taxonomy of sequential pattern-mining techniques in the literature with web usage mining as an application. This article investigates these algorithms by introducing a taxonomy for classifying sequential pattern-mining algorithms based on important key features supported by the techniques. This classification aims at enhancing understanding of sequential pattern-mining problems, current status of provided solutions, and direction of research in this area. This article also attempts to provide a comparative performance analysis of many of the key techniques and discusses theoretical aspects of the categories in the taxonomy.
Data Mining and Knowledge Discovery | 2005
Christie I. Ezeife; Yi Lu
Sequential mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of events. An important application of sequential mining techniques is web usage mining, for mining web log accesses, where the sequences of web page accesses made by different web users over a period of time, through a server, are recorded. Web access pattern tree (WAP-tree) mining is a sequential pattern mining technique for web log access sequences, which first stores the original web access sequence database on a prefix tree, similar to the frequent pattern tree (FP-tree) for storing non-sequential data. The WAP-tree algorithm then mines the frequent sequences from the WAP-tree by recursively re-constructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes a more efficient approach for using the WAP-tree to mine frequent sequences, which totally eliminates the need to engage in numerous re-constructions of intermediate WAP-trees during mining. The proposed algorithm builds the frequent header node links of the original WAP-tree in a pre-order fashion and uses the position code of each node to identify the ancestor/descendant relationships between nodes of the tree. It then finds each frequent sequential pattern, through progressive prefix sequence search, starting with its first prefix subsequence event. Experiments show a huge performance gain over the WAP-tree technique.
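The position-code idea in the abstract above can be illustrated with a small sketch. The abstract does not spell out the coding rule, so the rule used here is an assumption made for illustration: the leftmost child of a node gets the parent's code with "1" appended, and every other child gets its closest left sibling's code with "0" appended. Under that assumption, a node is an ancestor of another exactly when its code followed by "1" is a prefix of the other node's code. The names `Node`, `insert`, and `is_ancestor` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    event: str
    count: int = 0
    code: str = ""  # binary position code (assumed scheme, see lead-in)
    children: list = field(default_factory=list)


def insert(root, sequence):
    """Insert one access sequence into the prefix tree, assigning position codes."""
    node = root
    for event in sequence:
        child = next((c for c in node.children if c.event == event), None)
        if child is None:
            child = Node(event)
            if node.children:
                # Not the leftmost child: closest left sibling's code + "0".
                child.code = node.children[-1].code + "0"
            else:
                # Leftmost child: parent's code + "1".
                child.code = node.code + "1"
            node.children.append(child)
        child.count += 1
        node = child


def is_ancestor(a_code, b_code):
    """True if the node coded a_code is a proper ancestor of the node coded b_code."""
    return b_code.startswith(a_code + "1")


# Build a tiny tree from three hypothetical access sequences.
root = Node(event="root")
for seq in (["a", "b", "a", "c"], ["a", "b", "c"], ["b", "a"]):
    insert(root, seq)
```

With these codes, ancestor checks need only string comparison on the stored tree, which is the property that lets mining proceed without rebuilding intermediate trees.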
Distributed and Parallel Databases | 1995
Christie I. Ezeife; Ken Barker
Optimal application performance on a Distributed Object Based System (DOBS) requires class fragmentation and the development of allocation schemes to place fragments at distributed sites so data transfer is minimized. Fragmentation enhances application performance by reducing the amount of irrelevant data accessed and the amount of data transferred unnecessarily between distributed sites. Algorithms for effecting horizontal and vertical fragmentation of relations exist, but fragmentation techniques for class objects in a distributed object based system are yet to appear in the literature. This paper first reviews a taxonomy of the fragmentation problem in a distributed object base. The paper then contributes by presenting a comprehensive set of algorithms for horizontally fragmenting the four realizable class models on the taxonomy. The fundamental approach is top-down, where the entity of fragmentation is the class object. Our approach consists of first generating primary horizontal fragments of a class based on only applications accessing this class, and secondly generating derived horizontal fragments of the class arising from primary fragments of its subclasses, its complex attributes (contained classes), and/or its complex methods classes. Finally, we combine the sets of primary and derived fragments of each class to produce the best possible fragments. Thus, these algorithms account for inheritance and class composition hierarchies as well as method nesting among objects, and are shown to be polynomial time.
Canadian Conference on Artificial Intelligence | 2002
Christie I. Ezeife; Yue Su
New transaction insertions and old transaction deletions may lead to previously generated association rules no longer being interesting, and new interesting association rules may also appear. Existing association rule maintenance algorithms are Apriori-like and mostly need to scan the entire database several times in order to update the previously computed frequent or large itemsets, in particular when some previously small itemsets become large in the updated database. This paper presents two new algorithms that use the frequent patterns tree (FP-tree) structure to reduce the required number of database scans. One proposed algorithm is the DB-tree algorithm, which stores all the database information in an FP-tree structure and requires no re-scan of the original database for all update cases. The second algorithm is the PotFp-tree (Potential frequent pattern) algorithm, which uses a prediction of future possible frequent itemsets to reduce the number of times the original database needs to be scanned when previously small itemsets become large after a database update.
International Conference on Data Mining | 2009
Nizar R. Mabroukeh; Christie I. Ezeife
Domain knowledge for web applications is currently being made available as domain ontology with the advent of the semantic web, in which semantics govern relationships among objects of interest (e.g., commercial items to be purchased in an e-Commerce web site). Our earlier work proposed to integrate semantic information into all phases of the web usage mining process, for an intelligent semantics-aware web usage mining framework. There are ways to integrate semantic information into Markov models used in the third phase for next page request prediction. Semantic information is combined with the transition probability matrix of a Markov model. This way, it provides a low order Markov model with intelligent, accurate predictions and less complexity than higher order models, also solving the problem of contradicting prediction. This paper proposes using semantic information to prune states in Selective Markov Models (SMM); semantic information can lead to context-aware higher order Markov models with about 16% less space complexity.
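As a rough illustration of next page prediction with a low order Markov model, the sketch below estimates first-order transition probabilities from sessions and uses a semantic score only to break ties between equally likely next pages. The abstract does not say how semantic information is combined with the transition matrix, so the tie-breaking rule, the `similarity` table, and the page names are all assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from page-access sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: {n: c / sum(nxts.values()) for n, c in nxts.items()}
        for page, nxts in counts.items()
    }


def predict_next(probs, page, similarity):
    """Pick the most probable next page; break ties by semantic similarity.

    `similarity` maps (page, candidate) pairs to a score. It stands in for a
    domain-ontology measure and is hypothetical.
    """
    candidates = probs.get(page, {})
    if not candidates:
        return None
    best = max(candidates.values())
    tied = [n for n, p in candidates.items() if p == best]
    return max(tied, key=lambda n: similarity.get((page, n), 0.0))


# Hypothetical sessions and semantic scores.
sessions = [
    ["home", "cameras", "lenses"],
    ["home", "cameras", "tripods"],
    ["home", "laptops"],
]
similarity = {("cameras", "lenses"): 0.9, ("cameras", "tripods"): 0.4}
probs = transition_probabilities(sessions)
```

Here the model alone cannot choose between "lenses" and "tripods" after "cameras", since both have probability 0.5; the semantic score resolves the contradiction without moving to a higher order model.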
Conference on Information and Knowledge Management | 2009
Nizar R. Mabroukeh; Christie I. Ezeife
This paper proposes the integration of semantic information drawn from a web application's domain knowledge into all phases of the web usage mining process (preprocessing, pattern discovery, and recommendation/prediction). The goal is to have an intelligent semantics-aware web usage mining framework. This is accomplished by using semantic information in the sequential pattern mining algorithm to prune the search space and partially relieve the algorithm from support counting. In addition, semantic information is used in the prediction phase with low order Markov models, for less space complexity and accurate prediction, which helps with the ambiguous predictions problem. Experimental results show that semantics-aware sequential pattern mining algorithms can perform 4 times faster than regular non-semantics-aware algorithms with only 26% of the memory requirement.
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2003
Yi Lu; Christie I. Ezeife
The web access pattern tree algorithm mines web log access sequences by first storing the original web access sequence database on a prefix tree (WAP-tree). The WAP-tree algorithm then mines frequent sequences from the WAP-tree by recursively re-constructing intermediate WAP-trees, starting with their suffix subsequences. This paper proposes an efficient approach for using the preorder linked WAP-trees with binary position codes assigned to each node, to mine frequent sequences, which eliminates the need to engage in numerous re-constructions of intermediate WAP-trees during mining. Experiments show huge performance advantages for sequential mining using the prefix linked WAP-tree technique.
International Journal of Data Warehousing and Mining | 2005
Christie I. Ezeife; Timothy E. Ohanekwu
Identifying integrated records that represent the same real-world object in numerous ways is just one form of data disparity (dirt) to be resolved in a data warehouse. Data cleaning is a complex process, which uses multidisciplinary techniques to resolve conflicts in data drawn from different data sources. There is a need for initial cleaning at the time a data warehouse is built, and incremental cleaning whenever new records are brought into the data warehouse during refreshing. Existing work on data cleaning has used pre-specified record match thresholds and multiple scanning of records to determine matching records in integrated data. Little attention has been paid to incremental matching of records. Determining an optimal record match score threshold in a domain is hard. Also, direct long record string comparison is highly inefficient and intolerant of typing errors. Thus, this article proposes two algorithms, the first of which uses smart tokens defined from integrated records to match and identify duplicate records during initial warehouse cleaning. The second algorithm uses these tokens for fast, incremental cleaning during warehouse refreshing. Every attribute value forms either a special token like birth date or an ordinary token, which can be alphabetic, numeric, or alphanumeric. Rules are applied for forming tokens belonging to each of these four classes. These tokens are sorted and used for record match. The tokens also form very good warehouse identifiers for future faster incremental warehouse cleaning. This approach eliminates the need for a match threshold and multiple passes over the data. Experiments show that using tokens for record comparison produces a far better result than using the entire or greater part of a record.
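The token idea can be sketched as follows. The abstract does not give the actual token-formation rules, so the rules below are hypothetical stand-ins: an alphabetic token is the sorted, uppercased first letters of the words in a value, and numeric and birth date tokens keep only the digits (dates are assumed to share one year-month-day field order). Records are sorted by their token key, and records with identical keys are reported as duplicates, with no match threshold involved. The field names and sample records are invented for the example.

```python
import re
from itertools import groupby


def alphabetic_token(value):
    """Hypothetical rule: sorted, uppercased first letters of each word."""
    words = re.findall(r"[A-Za-z]+", value)
    return "".join(sorted(w[0].upper() for w in words))


def digit_token(value):
    """Hypothetical rule for numeric and date values: keep the digits only."""
    return "".join(re.findall(r"\d", value))


def record_key(record):
    """Token key used both for sorting and for matching records."""
    return (
        alphabetic_token(record["name"]),
        digit_token(record["birth_date"]),
        digit_token(record["phone"]),
    )


def find_duplicates(records):
    """Sort records by token key and group records whose keys are identical."""
    ordered = sorted(records, key=record_key)
    groups = (list(g) for _, g in groupby(ordered, key=record_key))
    return [[r["id"] for r in g] for g in groups if len(g) > 1]


# Invented sample records: 1 and 2 describe the same person.
records = [
    {"id": 1, "name": "John A. Smith", "birth_date": "1975-03-02", "phone": "(519) 555-0100"},
    {"id": 2, "name": "Smith, Jon A", "birth_date": "1975/03/02", "phone": "519-555-0100"},
    {"id": 3, "name": "Mary Jones", "birth_date": "1980-11-30", "phone": "519-555-0199"},
]
```

Because the key is short and order-insensitive for names, the misspelling "Jon" and the reordered name still produce the same tokens, and a single sorted pass is enough to find the match.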
Proceedings of the 1st international workshop on open source data mining | 2005
Christie I. Ezeife; Yi Lu; Yi Liu
The PLWAP algorithm uses a preorder linked, position coded version of the WAP tree and eliminates the need to recursively re-construct intermediate WAP trees during sequential mining as done by the WAP tree technique. PLWAP produces a significant reduction in the response time achieved by the WAP algorithm and provides a position code mechanism for remembering the stored database, thus eliminating the need to re-scan the original database as would be necessary for applications like those incrementally maintaining mined frequent patterns, or performing stream or dynamic mining. This paper presents open source code for both the PLWAP and WAP algorithms, describing our implementations and experimental performance analysis of these two algorithms on synthetic data generated with the IBM Quest data generator. An implementation of the Apriori-like GSP sequential mining algorithm is also discussed and submitted. A web log pre-processor for producing real input to the algorithms is made available too.
Data and Knowledge Engineering | 2001
Christie I. Ezeife
Data warehouse views typically store large aggregate tables based on a subset of dimension attributes of the main data warehouse fact table. Aggregate views can be stored as 2^n subviews of a data cube with n attributes. Methods have been proposed for selecting only some of the data cube views to materialize in order to speed up query response time, accommodate storage space constraints and reduce warehouse maintenance cost. This paper proposes a method for selecting and materializing views, which selects and horizontally fragments a view, and recomputes the size of the stored partitioned view while deciding further views to select. © 2001 Elsevier Science B.V. All rights reserved.
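The 2^n count follows from the fact that each subview groups by one subset of the n dimension attributes. A minimal sketch, using three invented dimension names, enumerates those subsets; it only illustrates the size of the view lattice and is not the selection method the paper proposes.

```python
from itertools import combinations


def cube_views(dimensions):
    """List every group-by subset of the dimensions, largest first."""
    return [
        combo
        for k in range(len(dimensions), -1, -1)
        for combo in combinations(dimensions, k)
    ]


# Hypothetical dimension attributes of a fact table.
views = cube_views(["product", "store", "time"])
```

For three dimensions this yields 8 views, from the full `("product", "store", "time")` aggregate down to the empty grouping `()`, which corresponds to the grand total. The count doubles with every added dimension, which is why only some views are chosen for materialization.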