Tarek Hamrouni
Tunis El Manar University
Publications
Featured research published by Tarek Hamrouni.
Engineering Applications of Artificial Intelligence | 2016
Tarek Hamrouni; Sarra Slimani; F. Ben Charrada
Mining grid data is a research field that analyzes grid systems with data mining techniques in order to discover new, meaningful knowledge for enhancing grid management. In this paper, we focus on how extracted knowledge can enhance data replication and replica selection strategies, two important data management techniques commonly used in data grids. Indeed, relevant knowledge, such as file access patterns, file correlations, user or job access behavior, and predictions of future behavior or network performance, can be efficiently discovered. These findings are then used to enhance both data replication and replica selection strategies. Various works in this respect are discussed along with their merits and shortcomings. In addition, we propose new guidelines for applying data mining in the context of data replication and replica selection strategies.
International Conference on Conceptual Structures | 2015
Tarek Hamrouni; Sarra Slimani; Faouzi Ben Charrada
Replication is a common way to address the challenges of improving data management in data grids. It has attracted a lot of attention, and many replication strategies have been proposed. Most of these strategies consider a single-file granularity and do not take file access patterns or possible file correlations into account. However, file correlations are an increasingly important consideration for performance enhancement in data grids. In this regard, the knowledge about file correlations can be extracted from historical and operational data using data mining techniques, which have proved to be a powerful tool for extracting meaningful knowledge from large data sets. As a consequence of the convergence of data mining and data grids, mining grid data is a research field that analyzes grid systems with data mining techniques in order to discover new, meaningful knowledge for enhancing data management in data grids. More precisely, in this paper, the extracted knowledge is used to enhance replica management. Gaps in the current literature and opportunities for further research are presented. In addition, we propose new guidelines for applying data mining in the context of data grid replication strategies. To the best of our knowledge, this is the first survey mainly dedicated to data grid replication strategies based on data mining techniques.
Journal of Network and Computer Applications | 2015
Tarek Hamrouni; C. Hamdeni; F. Ben Charrada
Data grids provide scalable infrastructures for managing storage resources and data files, and they support data-intensive applications. These applications require efficient access to, and storage, transfer and analysis of, large amounts of data in geographically distributed locations around the world. Data replication is a key technique used in data grids to achieve these goals by creating multiple file replicas and placing them judiciously. In this context, several replication strategies have been proposed in the literature. The main idea of our work is to propose a new aspect of the evaluation of replication strategies: the quality assessment of replica placement in the data grid. This paper demonstrates the impact of the distribution of replicas on the evaluation results. We hence show the importance of evaluating the quality of the replica distribution in the data grid, and we propose processes for evaluating the quality of a given distribution. In this respect, different evaluation metrics are proposed for assessing the performance of replication strategies with respect to distribution quality. We also evaluate our metrics using the OptorSim simulator and perform extensive experiments that demonstrate the effectiveness of our contributions.
Journal of Systems and Software | 2015
Tarek Hamrouni; Sarra Slimani; Faouzi Ben Charrada
A survey on data grid replication strategies based on data mining is presented. A new replication strategy based on file correlations is designed. A new mining algorithm for maximal frequent correlated patterns is proposed. The mined sets of highly correlated files are used in a new replication process. Experiments are performed using four access patterns and five evaluation metrics.
Data grids have emerged as a useful technology for managing large amounts of distributed data in many fields, such as scientific experiments and engineering applications. In this regard, replication in data grids is an efficient technique that aims to improve response time, reduce bandwidth consumption and maintain reliability. Unfortunately, most existing replication strategies consider a single-file granularity and neglect correlations among different data files. However, the analysis of many real data-intensive applications reveals that jobs and applications request groups of correlated files. In this paper, we propose a new dynamic, periodic, decentralized data replication strategy, called RSCP (Replication Strategy based on Correlated Patterns), which considers a set of correlated files as its granularity. In order to find these correlations, a new maximal frequent correlated pattern mining algorithm from the data mining field is introduced. The data in this work is read-only, so there are no consistency issues involved. The evaluation metrics we analyze in the experiments are mean job execution time, effective network usage, total number of replications, hit ratio and percentage of storage filled. Using the OptorSim simulator, extensive experiments show that our proposed strategy performs better than other strategies under most access patterns.
2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing | 2014
C. Hamdeni; Tarek Hamrouni; F. Ben Charrada
Data grids provide distributed resources for dealing with large-scale applications that generate huge volumes of data which must be efficiently managed, shared and analyzed. Data replication is a useful technique for these tasks since it minimizes data access time by creating many replicas and storing them in appropriate locations. Several replication strategies have been proposed in the literature. The main idea of our work is to propose a new aspect of the evaluation of replication strategies: the quality assessment of replica placement in the data grid. In this paper, we demonstrate the influence of the distribution of replicas across sites on the evaluation results. We then show the importance of evaluating the quality of the replica distribution in data grids.
Sigkdd Explorations | 2009
Tarek Hamrouni
Recent years have witnessed explosive progress in networking, storage, and processing technologies, resulting in an unprecedented amount of digitalization of data. There is hence a considerable need for tools and techniques to delve into large databases and efficiently discover valuable, non-obvious information. In this situation, Knowledge Discovery in Databases offers a complete process for the non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data. Amongst its steps, data mining offers tools and techniques for such an extraction. Much research in data mining from large databases has focused on the discovery of association rules, which are used to identify relationships between sets of items in a database. The discovered association rules can be used in various tasks, such as depicting purchase dependencies, classification, medical data analysis, etc. In practice, however, the number of frequently occurring itemsets, used as a basis for rule derivation, is very large, hampering their effective exploitation by end-users. In this situation, a determined effort has focused on defining manageably-sized sets of patterns, called concise representations, from which redundant patterns can be regenerated. The purpose of such representations is to reduce the number of mined patterns to make them manageable by end-users while preserving as much as possible the hidden and interesting information about the data. Many concise representations for frequent patterns have so far been proposed in the literature, mainly exploring the conjunctive search space. In this space, itemsets are characterized by the frequency of their co-occurrence. A detailed study proposed in this thesis shows that closed itemsets and minimal generators play a key role in concisely representing both frequent itemsets and association rules.
These itemsets structure the search space into equivalence classes such that each class gathers the itemsets appearing in the same subset (aka objects or transactions) of the given data. A closed itemset includes the most specific expression describing the associated transactions, while a minimal generator includes one of the most general expressions. However, an intra-class combinatorial redundancy logically results from the inherent absence of a unique minimal generator associated to a given closed itemset. This motivated us to carry out an in-depth study aiming at retaining only irreducible minimal generators in each equivalence class, and pruning the remaining ones. In this respect, we propose lossless reductions of the minimal generator set thanks to a new substitution-based process. We then carry out a thorough study of the properties of the obtained families. Our theoretical results are then extended to the association rule framework in order to reduce as much as possible the number of retained rules without information loss. We then give a thorough formal study of the related inference mechanism allowing all redundant association rules to be derived from the retained ones. In order to validate our approach, computing means for the new pattern families are presented together with empirical evidence about their relative sizes w.r.t. the entire sets of patterns. We also lead a thorough exploration of the disjunctive search space, where itemsets are characterized by their respective disjunctive supports, instead of the conjunctive ones. Thus, an itemset verifies a portion of data if at least one of its items belongs to it. Disjunctive itemsets thus convey knowledge about complementary occurrences of items in a dataset. This exploration is motivated by the fact that, in some applications, such information, conveyed through the disjunctive support, brings richer knowledge to end-users.
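The notions above (equivalence classes, closed itemsets, minimal generators) can be made concrete with a minimal Python sketch. The transaction data and variable names here are illustrative toys, not taken from the thesis: all itemsets with the same support set form one equivalence class, its largest member is the (unique) closed itemset, and its members with no proper subset in the same class are the minimal generators.

```python
from itertools import combinations

# Toy transaction database: each transaction (object) is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
]

def support_set(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

def closure(itemset):
    """Largest itemset occurring in exactly the same transactions (the closed itemset)."""
    covered = [t for t in transactions if itemset <= t]
    return frozenset(set.intersection(*covered)) if covered else frozenset()

items = sorted(set().union(*transactions))

# Group every non-empty itemset by its support set:
# each group is one equivalence class.
classes = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = frozenset(combo)
        classes.setdefault(support_set(s), []).append(s)

# In each class, the closed itemset is the largest member; the minimal
# generators are the members with no proper subset inside the same class.
results = {}
for members in classes.values():
    closed = max(members, key=len)
    gens = [m for m in members if not any(g < m for g in members)]
    results[closed] = gens
```

For instance, with this toy data the closure of {b} is {a, b}: every transaction containing b also contains a, so b alone already pins down the same set of transactions and acts as a minimal generator of the class closed by {a, b}.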
In order to obtain a redundancy-free representation of the disjunctive search space, an interesting solution consists in selecting a unique element to represent the itemsets covering the same set of data. Two itemsets are equivalent if their respective items cover the same set of data. In this regard, we introduce a new operator dedicated to this task. In each induced equivalence class, the minimal elements are called essential itemsets, while the largest one is called the disjunctive closed itemset. The introduced operator is then at the root of new concise representations of frequent itemsets. We also exploit the disjunctive search space to derive generalized association rules. These latter rules generalize classic ones by also offering disjunction and negation connectors between items, in addition to the conjunctive one. Dedicated tools were then designed and implemented for extracting disjunctive itemsets and generalized association rules. Our experiments showed the usefulness of our exploration and highlighted interesting compactness rates.
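The contrast between the two search spaces boils down to how support is counted. A minimal sketch, on made-up toy transactions rather than the thesis's data: conjunctive support counts transactions containing all items of an itemset, while disjunctive support counts transactions containing at least one of them.

```python
transactions = [{"a", "b"}, {"b", "c"}, {"c"}, {"a", "c"}]

def conjunctive_support(itemset):
    """Number of transactions containing ALL items (co-occurrence)."""
    return sum(1 for t in transactions if itemset <= t)

def disjunctive_support(itemset):
    """Number of transactions containing AT LEAST ONE item
    (complementary occurrence)."""
    return sum(1 for t in transactions if itemset & t)

# {a, b} co-occurs in only one transaction, but a or b appears in three.
print(conjunctive_support({"a", "b"}), disjunctive_support({"a", "b"}))  # 1 3
```

The gap between the two counts (1 vs. 3 here) is exactly the complementary-occurrence information that disjunctive itemsets convey.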
International Journal of Web Engineering and Technology | 2016
Tarek Hamrouni; C. Hamdeni; Faouzi Ben Charrada
Replication is an important issue for data grid performance. Indeed, its main purposes are to improve data access efficiency, provide high availability, decrease bandwidth consumption, improve fault tolerance and enhance scalability. In this regard, where to place a given replica is an important step in a replication process. In this context, a significant number of placement strategies have been proposed in the literature. Each placement strategy produces a specific distribution of replicas across the grid sites, and this distribution directly affects the performance of the grid. In this paper, we propose a new evaluation metric, called the distribution quality and denoted DisQ. This metric quantifies, at a given point in time, the quality of a given distribution of replicas across the grid nodes. DisQ helps to know beforehand whether a given distribution will contribute to good data grid performance or degrade it. Using DisQ, we propose a correction for the evaluation metrics of replication strategies to make them more reliable. We use the OptorSim simulator to validate our theoretical contributions.
Journal of Network and Computer Applications | 2016
C. Hamdeni; Tarek Hamrouni; F. Ben Charrada
Distributed systems continue to be a promising area of research, particularly in terms of providing efficient data access and maximum data availability for large-scale applications. To improve the performance of distributed systems, several data replication strategies have been proposed to ensure reliability and data transfer speed, as well as to offer the possibility of accessing data efficiently from multiple locations. Data popularity is one of the most important parameters taken into consideration when designing data replication strategies: it assesses how much a piece of data is requested by the sites of the system. In this paper, the importance of considering the data popularity parameter in replication management is highlighted. Different strategies are identified, and how they rely on the data popularity parameter is illustrated. Different ways of calculating data popularity are then studied, allowing us to find out which factors are considered when assessing data popularity. After classifying them into four categories, this work includes a critical discussion of each category. Some important directions for future work are then discussed, towards possible solutions for a more effective data popularity assessment.
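To make the notion of a popularity calculation concrete, here is a minimal, purely illustrative sketch; it is not one of the surveyed strategies. It combines two factors the survey's framing suggests: recency (newer accesses weigh more, via exponential decay) and a sliding window (very old accesses are ignored). The log, parameters and function name are all assumptions.

```python
from collections import defaultdict

# Hypothetical access log: (timestamp, file_id) pairs (illustrative only).
access_log = [(1, "f1"), (2, "f2"), (3, "f1"), (8, "f1"), (9, "f2"), (10, "f1")]

def popularity(log, now, window, half_life):
    """Decayed access count per file: recent requests weigh more than
    old ones, and accesses older than `window` are ignored entirely."""
    score = defaultdict(float)
    for t, f in log:
        age = now - t
        if age <= window:
            score[f] += 0.5 ** (age / half_life)
    return dict(score)

scores = popularity(access_log, now=10, window=10, half_life=5)
# f1 has both more accesses and more recent ones, so it scores higher than f2.
```

A replication strategy could then rank files by these scores to decide which ones deserve additional replicas; real strategies differ mainly in which factors enter the score and how they are weighted.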
Parallel and Distributed Computing: Applications and Technologies | 2014
Sarra Slimani; Tarek Hamrouni; Faouzi Ben Charrada
Data replication in data grids is an efficient technique that aims to improve response time, reduce bandwidth consumption and maintain reliability. In this context, a lot of work has been done and many strategies have been proposed. Unfortunately, most existing replication techniques are based on a single-file granularity and neglect correlations among different data files. Indeed, file correlations are an increasingly important consideration for performance enhancement in data grids. In fact, the analysis of real data-intensive grid applications reveals that jobs request groups of correlated files, and suggests that these correlations can be exploited to improve the effectiveness of replication strategies. In this paper, we propose a new dynamic, periodic, decentralized data replication strategy, called RSBMFCP, which considers a set of correlated files as its granularity. Our strategy gathers files according to a relationship of simultaneous accesses by jobs and stores correlated files at the same site. In order to find these correlations, a maximal frequent correlated pattern mining algorithm from the data mining field is introduced. We choose all-confidence as the correlation measure. The proposed strategy consists of four steps: storing the file access history, converting the file access history into a logical history file, applying the maximal frequent correlated pattern mining algorithm, and performing replication and replacement. Experiments using the well-known data grid simulator OptorSim show that our proposed strategy performs better than other strategies in terms of job execution time and effective network usage.
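The all-confidence measure named above has a standard definition: the support of an itemset divided by the largest support of any single item in it, so it is high only when the files appear together almost whenever any of them appears. A minimal sketch on made-up job access sets (the data is illustrative, not the paper's):

```python
# Hypothetical per-job file access sets (illustrative data only).
job_accesses = [
    {"f1", "f2"},
    {"f1", "f2"},
    {"f1", "f3"},
    {"f2"},
]

def support(itemset):
    """Fraction of jobs whose access set contains every file of `itemset`."""
    return sum(1 for j in job_accesses if itemset <= j) / len(job_accesses)

def all_confidence(itemset):
    """supp(X) divided by the largest support of a single file in X."""
    return support(itemset) / max(support({f}) for f in itemset)

# f1 and f2 each appear in 3 of 4 jobs, and together in 2 of 4,
# giving all-confidence (2/4) / (3/4) = 2/3.
print(all_confidence({"f1", "f2"}))
```

A pattern whose all-confidence exceeds a chosen threshold would be treated as a correlated file group and its files replicated to the same site.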
Cluster Computing | 2016
C. Hamdeni; Tarek Hamrouni; F. Ben Charrada
Distributed systems provide geographically distributed resources for large-scale applications while managing large volumes of data. In this context, replicating data on several sites of the system is an effective solution for achieving good performance. A number of data replication strategies have been proposed in the literature. Data popularity is one of the most important parameters taken into consideration by these strategies: it analyzes the history of data access patterns and provides predictions for future data requests. However, measuring data popularity is a challenging task because several factors contribute to its evaluation. In this paper, a new adaptive measurement of data popularity in distributed systems is proposed. The proposed measurement covers all factors taken into consideration by previous work in the literature. It also takes into consideration new factors to deal with the dynamic nature of the system, so it can adapt to any access pattern. We show that exploiting our measurement improves the performance of replication strategies, while offering the possibility of using the data popularity parameter in new contexts in replication management.