Fedja Hadzic | Researchain

Publication

Featured research published by Fedja Hadzic.

Knowledge Discovery and Data Mining | 2006

IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding

Henry Tan; Tharam S. Dillon; Fedja Hadzic; Elizabeth Chang; Ling Feng

Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, and Web mining. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, mining such embedding relationships can be very costly. In this paper, we propose an efficient approach to tackle the complexity of mining embedded subtrees by utilizing a novel Embedding List representation, Tree Model Guided enumeration, and the Level of Embedding constraint. Thus, when it is too costly to mine all frequent embedded subtrees, one can gradually decrease the level of embedding constraint down to 1, at which point all the obtained frequent subtrees are induced subtrees. Our experiments with both synthetic and real datasets against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and the efficiency of the technique.
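As a rough illustration of the level of embedding idea, the sketch below lists ancestor-descendant pairs in a small tree together with the length of the path between them. It assumes the level of embedding of a pair is that path length, so a limit of 1 keeps only parent-child (induced) relationships. The `parent` map and the function name `ancestor_pairs` are hypothetical and are not taken from the paper; this is not the IMB3-Miner algorithm itself.

```python
from typing import Dict, List, Optional, Tuple


def ancestor_pairs(
    parent: Dict[str, Optional[str]], max_level: int
) -> List[Tuple[str, str, int]]:
    """Return (ancestor, descendant, level) triples with level <= max_level.

    `parent` maps each node to its parent (None for the root).
    `level` is the number of edges between ancestor and descendant.
    """
    pairs = []
    for node in parent:
        level = 0
        anc = parent[node]
        # Walk up towards the root, stopping at the embedding limit.
        while anc is not None and level < max_level:
            level += 1
            pairs.append((anc, node, level))
            anc = parent[anc]
    return sorted(pairs)


# Tree: a is the root with children b and d; c is a child of b.
tree = {"a": None, "b": "a", "c": "b", "d": "a"}
induced_only = ancestor_pairs(tree, max_level=1)   # parent-child pairs only
with_embedding = ancestor_pairs(tree, max_level=2)  # also admits (a, c)
```

With `max_level=1` the pair (a, c) is excluded; raising the limit to 2 admits it, which mirrors how relaxing the constraint moves from induced towards embedded relationships.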

Knowledge Based Systems | 2011

Interestingness measures for association rules based on statistical validity

Izwan Nizal Mohd Shaharanee; Fedja Hadzic; Tharam S. Dillon

Assessing rules with interestingness measures is the pillar of successful application of association rule discovery. However, the association rules discovered are normally large in number, and some are not interesting or significant for the application at hand. In this paper, we present a systematic approach to assess the discovered rules, and provide a precise statistical approach supporting this framework. The proposed strategy combines data mining and statistical measurement techniques, including redundancy analysis, sampling and multivariate statistical analysis, to discard the non-significant rules. Moreover, we consider real-world datasets which are characterized by uniform and non-uniform data/item distributions with a mixture of measurement levels throughout the data/items. The proposed unified framework is applied to these datasets to demonstrate its effectiveness in discarding many of the redundant or non-significant rules, while still preserving the high accuracy of the rule set as a whole.
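The abstract does not name the specific measures used, so the following is only a minimal sketch of three standard interestingness measures (support, confidence and lift) for a rule of the form antecedent -> consequent. The function names and the toy transactions are illustrative assumptions, and the statistical tests described in the paper are not reproduced here.

```python
from fractions import Fraction
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> Fraction:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(
    transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> Fraction:
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent occurs at least once (otherwise division by zero).
    """
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


def lift(
    transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> Fraction:
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Toy data, purely illustrative.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
rule_lift = lift(transactions, {"a"}, {"b"})
```

A lift below 1 suggests the antecedent makes the consequent less likely than its baseline frequency, which is one simple reason a rule with reasonable confidence might still be judged uninteresting.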

ACM Transactions on Knowledge Discovery From Data | 2008

Tree model guided candidate generation for mining frequent subtrees from XML documents

Henry Tan; Fedja Hadzic; Tharam S. Dillon; Elizabeth Chang; Ling Feng

Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.

Computational Intelligence and Data Mining | 2007

UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation

Fedja Hadzic; Henry Tan; Tharam S. Dillon

Semi-structured data sources are increasingly in use today because of their capability of representing information through more complex structures where semantics and relationships of data objects are more easily expressed. Extraction of frequent sub-structures from such data has found important applications in areas such as Bioinformatics, XML mining, Web mining, and scientific data management. This paper is concerned with the task of mining frequent unordered induced subtrees from a database of rooted ordered labeled subtrees. Our previous work in the area of frequent subtree mining is characterized by the efficient tree model guided (TMG) candidate enumeration, where candidate subtrees conform to the data's underlying tree structure. We apply the same approach to the unordered case, motivated by the fact that in many applications of frequent subtree mining the order among siblings is not considered important. The proposed UNI3 algorithm considers both transaction-based and occurrence-match support. Synthetic and real-world data are used to evaluate the time performance of our approach in comparison to the well-known algorithms developed for the same problem.
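The two support definitions mentioned above can be contrasted with a small sketch. It assumes that transaction-based support counts the trees containing a pattern at least once, while occurrence-match support counts every occurrence across the database. For brevity the pattern is a single parent-child edge and each tree is a list of labeled edges; this simplified representation and the name `supports` are assumptions made for illustration, not the UNI3 data structures.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (parent label, child label)


def supports(database: List[List[Edge]], pattern: Edge) -> Tuple[int, int]:
    """Return (transaction_based, occurrence_match) support of an edge pattern."""
    # Number of times the pattern occurs in each tree (transaction).
    per_tree = [tree.count(pattern) for tree in database]
    transaction_based = sum(1 for n in per_tree if n > 0)
    occurrence_match = sum(per_tree)
    return transaction_based, occurrence_match


# Three toy trees; the first contains the edge (a, b) twice.
database = [
    [("a", "b"), ("a", "b"), ("a", "c")],
    [("a", "c")],
    [("a", "b")],
]
result = supports(database, ("a", "b"))
```

Here the edge (a, b) appears in two of the three trees but three times overall, so the two definitions give different counts for the same pattern.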

Archive | 2011

Mining of Data with Complex Structures

Fedja Hadzic; Henry Tan; Tharam S. Dillon

Mining of Data with Complex Structures:

- Clarifies the type and nature of data with complex structure including sequences, trees and graphs.
- Provides a detailed background of the state-of-the-art of sequence mining, tree mining and graph mining.
- Defines the essential aspects of the tree mining problem: subtree types, support definitions, constraints.
- Outlines the implementation issues one needs to consider when developing tree mining algorithms (enumeration strategies, data structures, etc.).
- Details the Tree Model Guided (TMG) approach for tree mining and provides the mathematical model for the worst case estimate of complexity of mining ordered induced and embedded subtrees.
- Explains the mechanism of the TMG framework for mining ordered/unordered induced/embedded and distance-constrained embedded subtrees.
- Provides a detailed comparison of the different tree mining approaches highlighting the characteristics and benefits of each approach.
- Overviews the implications and potential applications of tree mining in general knowledge management related tasks, and uses Web, health and bioinformatics related applications as case studies.
- Details the extension of the TMG framework for sequence mining.
- Provides an overview of the future research direction with respect to technical extensions and application areas.

The primary audience is 3rd year, 4th year undergraduate students, Masters and PhD students and academics. The book can be used for both teaching and research. The secondary audiences are practitioners in industry, business, commerce, government and consortiums, alliances and partnerships to learn how to introduce and efficiently make use of the techniques for mining of data with complex structures into their applications. The scope of the book is both theoretical and practical and as such it will reach a broad market both within academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for efficient analysis of knowledge stored in various domains.

International Conference on Data Mining | 2006

Razor: mining distance-constrained embedded subtrees

Henry Tan; Tharam S. Dillon; Fedja Hadzic; Elizabeth Chang

Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labeled subtrees. Previously we have developed an efficient algorithm, MB3 (Tan et al., 2005), for mining frequent embedded subtrees from a database of rooted labeled and ordered subtrees. The efficiency comes from the utilization of a novel embedding list representation for tree model guided (TMG) candidate generation. As an extension, the IMB3 (Tan et al., 2006) algorithm introduces the level of embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance-constrained embedded tree mining will have important applications in Web information systems, conceptual model analysis and more sophisticated ontology matching. Domains representing their knowledge in a tree-structured form may require this additional distance information as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. The structure-based approaches for schema matching commonly take the distance among the concept nodes within a sub-structure into account when evaluating the concept similarity across different schemas. We present an encoding strategy to efficiently enumerate candidate subtrees taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2011

A structure preserving flat data format representation for tree-structured data

Fedja Hadzic

Mining of semi-structured data such as XML is a popular research topic due to many useful applications. The initial work focused mainly on values associated with tags, while most of recent developments focus on discovering association rules among tree structured data objects to preserve the structural information. Other data mining techniques have had limited use in tree-structured data analysis as they were mainly designed to process flat data format with no need to capture the structural properties of data objects. This paper proposes a novel structure-preserving way for representing tree-structured document instances as records in a standard flat data structure to enable applicability of a wider range of data analysis techniques. The experiments using synthetic and real world data demonstrate the effectiveness of the proposed approach.

Web Intelligence | 2008

U3 - Mining Unordered Embedded Subtrees Using TMG Candidate Generation

Fedja Hadzic; Henry Tan; Tharam S. Dillon

In this paper we present an algorithm for mining of unordered embedded subtrees. This is an important problem for association rule mining from semi-structured documents, and it has important applications in many biomedical, Web and scientific domains. The proposed U3 algorithm is an extension of our general tree model guided (TMG) candidate generation framework and it considers both transaction based and occurrence match support. Synthetic and real world data sets are used to experimentally demonstrate the efficiency of our approach to the problem, and the flexibility of our general TMG framework.

International Conference on Data Mining | 2006

SEQUEST: Mining frequent subsequences using DMA strips

Henry Tan; Tharam S. Dillon; Fedja Hadzic; Elizabeth Chang

Sequential patterns exist in data such as DNA string databases, occurrences of recurrent illness, etc. In this study, we present an algorithm, SEQUEST, to mine frequent subsequences from sequential patterns. Mining a very large database of sequences is computationally expensive and requires large memory space. SEQUEST uses a Direct Memory Access Strips (DMA-Strips) structure to efficiently generate candidate subsequences. The DMA-Strips structure provides direct access to each item to be manipulated and thus is optimized for speed and space performance. In addition, the proposed technique uses a hybrid principle of frequency counting by the vertical join approach and candidate generation by a structure guided method. The structure guided method is adapted from the TMG approach used for enumerating subtrees in our previous work [8]. Experiments utilizing very large databases of sequences, which compare our technique with the existing technique PLWAP [4], demonstrate the effectiveness of our proposed technique.
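To make the notion of a frequent subsequence concrete, here is a minimal sketch that counts how many sequences in a database contain a candidate as a (not necessarily contiguous) subsequence. It is a naive scan written for illustration only: it does not implement DMA-Strips, the vertical join, or the structure guided candidate generation, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def is_subsequence(candidate: Sequence[str], sequence: Iterable[str]) -> bool:
    """True if the items of `candidate` appear in `sequence` in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in candidate)


def subsequence_support(database: List[Sequence[str]], candidate: Sequence[str]) -> int:
    """Number of sequences in `database` that contain `candidate`."""
    return sum(1 for seq in database if is_subsequence(candidate, seq))


# Toy database of item sequences (each character is one item).
database = ["abcd", "acbd", "bd"]
support_ad = subsequence_support(database, "ad")
support_bd = subsequence_support(database, "bd")
```

A candidate would be reported as frequent when its support meets a user-chosen minimum threshold; the scan above costs one pass over the database per candidate, which is the kind of cost the paper's data structure aims to reduce.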

Hawaii International Conference on System Sciences | 2008

Tree Mining in Mental Health Domain

Maja Hadzic; Fedja Hadzic; Tharam S. Dillon

The number of mentally ill people is increasing globally each year. Despite major medical advances, the identification of genetic and environmental factors responsible for mental illnesses still remains unsolved and is therefore a very active research focus today. Semi-structured data structure is predominantly used to enable the meaningful representations of the available mental health knowledge. Data mining techniques can be used to efficiently analyze these semi-structured mental health data. Tree mining algorithms can efficiently extract frequent substructures from semi-structured knowledge representation such as XML. In this paper we demonstrate effective application of the tree mining algorithms on records of mentally ill patients. The extracted data patterns can provide useful information to help in prevention of mental illness and assist in delivery of effective and efficient mental health services.
