Mohammad El-Hajj | Researchain

Publication

Featured research published by Mohammad El-Hajj.

international conference on data mining | 2001

Fast parallel association rule mining without candidacy generation

Osmar R. Zaïane; Mohammad El-Hajj; Paul Lu

In this paper we introduce a new parallel algorithm MLFPT (multiple local frequent pattern tree) for parallel mining of frequent patterns, based on FP-growth mining, that uses only two full I/O scans of the database, eliminating the need for generating candidate items, and distributing the work fairly among processors. We have devised partitioning strategies at different stages of the mining process to achieve near optimal balancing between processors. We have successfully tested our algorithm on datasets larger than 50 million transactions.

knowledge discovery and data mining | 2003

Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining

Mohammad El-Hajj; Osmar R. Zaïane

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is the high memory dependency: either the gigantic data structure built is assumed to fit in main memory, or the recursive mining process is too voracious in memory resources. Another major impediment is the repetitive and interactive nature of any knowledge discovery process. To tune parameters, many runs of the same algorithms are necessary, leading to the building of these huge data structures time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding frequent patterns could be achieved in less than a full scan with random access. Second, for each frequent item, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements, as minimal candidacy generation and counting are needed. Experimental studies reveal that our Inverted Matrix approach outperforms FP-Tree, especially in mining very large transactional databases with a very large number of unique items. Our random access disk-based approach is particularly advantageous in a repetitive and interactive setting.
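To make the idea of an item-oriented layout concrete, here is a minimal sketch of a generic inverted (item-to-transaction) index with support counting by set intersection. It is an assumption-laden illustration only: it is not the paper's Inverted Matrix layout, it keeps everything in memory rather than on disk, and the names `build_inverted_index` and `support` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Map each item to the set of transaction ids that contain it.

    Generic vertical layout for illustration; NOT the paper's Inverted Matrix.
    """
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def support(index, itemset):
    """Count the transactions containing every item of `itemset`."""
    tid_sets = [index.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    # Only the id lists of the queried items are touched, not the whole database.
    return len(set.intersection(*tid_sets))


# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
index = build_inverted_index(transactions)
print(support(index, ["a", "c"]))
print(support(index, ["a", "b"]))
```

Because the index is built once and then queried, changing the support threshold between runs does not require rebuilding it, which loosely mirrors the interactive setting the abstract describes.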

international conference on parallel and distributed systems | 2006

Parallel leap: large-scale maximal pattern mining in a distributed environment

Mohammad El-Hajj; Osmar R. Zaïane

When computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets due to their sheer size; not only the extent of the existing patterns, but mainly the magnitude of the search space. Many approaches have been suggested, such as sequential mining for maximal patterns or searching for all frequent patterns in parallel. So far, those approaches are still not genuinely effective for mining extremely large datasets. In this work we propose a method that combines both strategies efficiently, i.e. mining in parallel for the set of maximal patterns, which, to the best of our knowledge, has never been proposed efficiently before. Using this approach we could mine significantly large datasets, with sizes never reported in the literature before. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than 2 hours.

data warehousing and knowledge discovery | 2003

Non recursive generation of frequent K-itemsets from frequent pattern tree representations

Mohammad El-Hajj; Osmar R. Zaïane

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is the high memory dependency: gigantic data structures built are assumed to fit in main memory; in addition, the recursive mining process to mine these structures is also too voracious in memory resources. This paper proposes a new association rule-mining algorithm based on the frequent pattern tree data structure. Our algorithm does not use much more memory over and above the memory used by the data structure. For each frequent item, a relatively small independent tree, called a COFI-tree, is built summarizing co-occurrences. Finally, a simple and non-recursive mining process mines the COFI-trees. Experimental studies reveal that our approach is efficient and allows the mining of larger datasets than those limited by FP-Tree.

international conference on management of data | 2004

COFI approach for mining frequent itemsets revisited

Mohammad El-Hajj; Osmar R. Zaïane

The COFI approach for mining frequent itemsets, introduced recently, is an efficient algorithm that was demonstrated to outperform state-of-the-art algorithms on synthetic data. For instance, COFI is not only one order of magnitude faster than the popular FP-Growth while requiring significantly less memory, it is also very effective with extremely large datasets, better than any reported algorithm. However, COFI has a significant drawback when mining dense transactional databases, which is the case with some real datasets. The algorithm performs poorly in these cases because it ends up generating too many local candidates that are doomed to be infrequent. In this paper, we present a new algorithm, COFI*, for mining frequent itemsets. This novel algorithm uses the same data structure, the COFI-tree, as its predecessor, but partitions the patterns in such a way as to avoid the drawbacks of COFI. Moreover, its approach uses a pseudo-Oracle to pinpoint the maximal itemsets, from which all frequent itemsets are derived and counted, avoiding the generation of candidates fated to be infrequent. Our implementation, tested on real and synthetic data, shows that the COFI* algorithm outperforms state-of-the-art algorithms, among them COFI itself.

database and expert systems applications | 2003

Parallel association rule mining with minimum inter-processor communication

Mohammad El-Hajj; Osmar R. Zaïane

Existing parallel association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is that most of the parallel algorithms for a shared-nothing environment are Apriori-based algorithms. Apriori-based algorithms are proven not to be scalable for many reasons, mainly: (1) the repetitive I/O disk scans, and (2) the huge computation and communication involved during the candidacy generation. This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding globally frequent patterns could be achieved in less than a full scan with random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements, as minimal candidacy generation and counting are needed, and no communication between nodes is required to generate all globally frequent patterns.

international conference on data mining | 2005

Bifold constraint-based mining by simultaneous monotone and anti-monotone checking

Mohammad El-Hajj; Osmar R. Zaïane; Paul Nalos

Mining for frequent item sets can generate an overwhelming number of patterns, often exceeding the size of the original transactional database. One way to deal with this issue is to set filters and interestingness measures. Others advocate the use of constraints to apply to the patterns, either on the form of the patterns or on descriptors of the items in the patterns. However, the filtering of patterns based on these constraints is typically done as a post-processing phase. Filtering the patterns post-mining adds a significant overhead, still suffers from the sheer size of the pattern set, and loses the opportunity to exploit those constraints. In this paper we propose an approach that allows the efficient mining of frequent item set patterns, while simultaneously pushing both monotone and anti-monotone constraints during and at different strategic stages of the mining process. Our implementation shows a significant improvement when considering the constraints early and a better performance over Dualminer, which also considers both types of constraints.
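The difference between the two constraint types can be sketched with a small example. An anti-monotone constraint, once violated by an item set, is violated by all of its supersets, so it can prune the search; a monotone constraint, once satisfied, stays satisfied by all supersets. The code below is a naive level-wise enumeration written for illustration, not the paper's algorithm; the `prices` table, the thresholds, and all function names are hypothetical, and support counting is left out.

```python
from itertools import combinations

# Hypothetical item descriptors (non-negative prices).
prices = {"a": 2, "b": 4, "c": 7}


def total(itemset):
    return sum(prices[i] for i in itemset)


def anti_monotone_ok(itemset, max_total=10):
    # total <= max_total: if an item set fails, every superset fails too.
    return total(itemset) <= max_total


def monotone_ok(itemset, min_total=6):
    # total >= min_total: if an item set passes, every superset passes too.
    return total(itemset) >= min_total


def constrained_itemsets(items):
    """Enumerate item sets level by level, pruning on the anti-monotone
    constraint during the search and reporting only monotone-satisfying sets."""
    results = []
    level = [frozenset([i]) for i in sorted(items) if anti_monotone_ok([i])]
    while level:
        results.extend(s for s in level if monotone_ok(s))
        next_level = set()
        for x, y in combinations(level, 2):
            candidate = x | y
            # Keep only one-item extensions that survive the anti-monotone check.
            if len(candidate) == len(x) + 1 and anti_monotone_ok(candidate):
                next_level.add(candidate)
        level = list(next_level)
    return results


found = constrained_itemsets(prices)
print(sorted(sorted(s) for s in found))
```

Here `{b, c}` is pruned as soon as its total exceeds the limit, so `{a, b, c}` is never accepted, whereas a post-processing filter would first have generated both.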

knowledge discovery and data mining | 2005

Pattern lattice traversal by selective jumps

Osmar R. Zaïane; Mohammad El-Hajj

Regardless of the frequent patterns to discover, either the full frequent patterns or the condensed ones, either closed or maximal, the strategy always includes the traversal of the lattice of candidate patterns. We study the existing depth versus breadth traversal approaches for generating candidate patterns and propose in this paper a new traversal approach that jumps in the search space among only promising nodes. Our leaping approach avoids nodes that would not participate in the answer set and drastically reduces the number of candidate patterns. We use this approach to efficiently pinpoint maximal patterns at the border of the frequent patterns in the lattice and collect enough information in the process to generate all subsequent patterns.
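As a small illustration of why maximal patterns are a compact summary, the sketch below extracts the maximal item sets from a given collection of frequent item sets and then regenerates every frequent item set as a subset of some maximal one. This is a brute-force illustration of the definitions, not the leap traversal itself; the function names and the toy data are hypothetical, and recovering the support of each subset would need additional counting that is not shown.

```python
from itertools import combinations


def maximal_patterns(frequent):
    """Keep the item sets that have no proper superset in `frequent`."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < t for t in sets)]


def expand(maximal):
    """Regenerate all non-empty subsets of the maximal item sets.

    By downward closure, every subset of a frequent item set is frequent.
    """
    out = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            out.update(frozenset(c) for c in combinations(sorted(m), k))
    return out


# Hypothetical collection of frequent item sets.
frequent = [{"a"}, {"b"}, {"c"}, {"d"}, {"a", "b"}, {"a", "c"}]
maximal = maximal_patterns(frequent)
print(sorted(sorted(m) for m in maximal))
print(expand(maximal) == {frozenset(s) for s in frequent})
```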

international conference on e-health networking, applications and services | 2013

Unsupervised graph-based Word Sense Disambiguation of biomedical documents

Wessam Gad El-Rab; Osmar R. Zaïane; Mohammad El-Hajj

Word Sense Disambiguation is the task of automatically identifying the correct sense of an ambiguous word. Biomedical documents, similar to other narrative documents, suffer from ambiguity, which impacts the ability to automatically extract knowledge contained in the document text. In this study, we propose a graph-based word sense disambiguation algorithm focused on biomedical text. The proposed algorithm uses the UMLS Metathesaurus as its source of knowledge. Evaluation is carried out using the MSH-WSD data set.

advances in social networks analysis and mining | 2013

Biomedical text disambiguation using UMLS

Wessam Gad El-Rab; Osmar R. Zaïane; Mohammad El-Hajj

Interest in extracting information from biomedical documents has increased significantly in recent years but has always been challenged by the ambiguity of natural language. An important source of ambiguity is the usage of polysemous words: words with multiple meanings. Word sense disambiguation algorithms attempt to solve this problem by finding the correct meaning of a polysemous word in a given context, but very few algorithms were designed to disambiguate biomedical text. In this study we propose a word sense disambiguation algorithm focused on biomedical text. The proposed algorithm does not need to be trained and uses a relatively small knowledge base.

Collaboration

Dive into Mohammad El-Hajj's collaborations.

Top Co-Authors

Osmar R. Zaïane

University of Alberta

Wessam Gad El-Rab

University of Alberta
