Soumia Benkrid | Researchain

Publication

Featured research published by Soumia Benkrid.

data warehousing and knowledge discovery | 2010

\(\mathcal{F}\)&\(\mathcal{A}\): A Methodology for Effectively and Efficiently Designing Parallel Relational Data Warehouses on Heterogenous Database Clusters

Ladjel Bellatreche; Alfredo Cuzzocrea; Soumia Benkrid

In this paper we propose a comprehensive methodology for designing Parallel Relational Data Warehouses (PRDW) over database clusters, called \(\mathcal{F}\)ragmentation & \(\mathcal{A}\)llocation (\(\mathcal{F}\)&\(\mathcal{A}\)). \(\mathcal{F}\)&\(\mathcal{A}\) assumes that cluster nodes are heterogeneous in processing power and storage capacity, contrary to traditional design approaches that assume that cluster nodes are instead homogeneous, and fragmentation and allocation phases are performed in a simultaneous manner, contrary to traditional design approaches that instead perform these phases in an isolated manner. Also, a naive replication algorithm that takes into account the heterogeneous characteristics of our reference architecture is proposed. Finally, our proposal is experimentally assessed and validated against the widely-known data warehouse benchmark APB-1 release II.
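To make the idea of heterogeneity-aware allocation concrete, the following is a minimal sketch only, not the algorithm from the paper. It assumes each node has a relative processing power and a storage capacity, and it greedily places the largest fragment first on the node that would finish scanning earliest while still having room. The names `Node`, `allocate`, and the units are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    power: float      # relative processing power (hypothetical unit)
    capacity: int     # storage capacity in rows (hypothetical unit)
    used: int = 0
    fragments: list = field(default_factory=list)

    def finish_time(self, extra: int = 0) -> float:
        # Estimated scan time if `extra` more rows were placed on this node.
        return (self.used + extra) / self.power


def allocate(fragment_sizes: dict, nodes: list) -> dict:
    """Greedy placement: largest fragment first, onto the node that
    would finish earliest among those with enough free storage."""
    for frag, size in sorted(fragment_sizes.items(), key=lambda kv: -kv[1]):
        candidates = [n for n in nodes if n.used + size <= n.capacity]
        if not candidates:
            raise ValueError(f"no node can store fragment {frag!r}")
        best = min(candidates, key=lambda n: n.finish_time(size))
        best.fragments.append(frag)
        best.used += size
    return {n.name: n.fragments for n in nodes}


if __name__ == "__main__":
    nodes = [Node("fast", power=2.0, capacity=100),
             Node("slow", power=1.0, capacity=100)]
    print(allocate({"f1": 60, "f2": 40, "f3": 30, "f4": 20}, nodes))
```

A homogeneous design would treat both nodes alike; here the faster node receives more rows until its storage limit forces the remaining fragment onto the slower one.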

data warehousing and knowledge discovery | 2009

A Joint Design Approach of Partitioning and Allocation in Parallel Data Warehouses

Ladjel Bellatreche; Soumia Benkrid

Traditionally, designing a parallel data warehouse consists of first fragmenting its schema and then allocating the generated fragments over the nodes of the parallel machine. The main drawback of this approach is that the interdependency between the fragmentation and allocation processes is not taken into account during the design phase. This interdependency is characterized by the fact that the generated fragments are one of the inputs of the allocation problem and that both processes optimize the same set of queries. In this paper, we present a new approach for designing parallel relational data warehouses on a shared-nothing machine, where fragmentation and allocation are done simultaneously. To allocate the query workload efficiently over the nodes, a load balancing method is given. Finally, a validation of our proposals is presented.

Journal of Database Management | 2012

Effectively and Efficiently Designing and Querying Parallel Relational Data Warehouses on Heterogeneous Database Clusters: The F&A Approach

Ladjel Bellatreche; Alfredo Cuzzocrea; Soumia Benkrid

In this paper, a comprehensive methodology for designing and querying Parallel Relational Data Warehouses (PRDW) over database clusters, called Fragmentation & Allocation (F&A), is proposed. F&A assumes that cluster nodes are heterogeneous in processing power and storage capacity, contrary to traditional design approaches that assume that cluster nodes are instead homogeneous, and fragmentation and allocation phases are performed in a simultaneous manner. In classical approaches, two different cost models are used to perform fragmentation and allocation separately, whereas F&A makes use of one cost model that considers fragmentation and allocation parameters simultaneously. Therefore, according to the F&A methodology proposed, the allocation decision is made at fragmentation time. At the fragmentation phase, F&A uses two well-known algorithms, namely Hill Climbing (HC) and Genetic Algorithm (GA), which the authors adapt to the main PRDW design problem over heterogeneous database clusters, as these algorithms are capable of taking into account the heterogeneous characteristics of the reference application scenario. At the allocation phase, F&A introduces an innovative matrix-based formalism capable of capturing the interactions among fragments, input queries, and cluster node characteristics, driving the data allocation task accordingly, and a related affinity-based algorithm, called F&A-ALLOC. Finally, their proposal is experimentally assessed and validated against the widely-known data warehouse benchmark APB-1 release II.

international conference on algorithms and architectures for parallel processing | 2011

Verification of partitioning and allocation techniques on teradata DBMS

Ladjel Bellatreche; Soumia Benkrid; Ahmad Ghazal; Alain Crolotte; Alfredo Cuzzocrea

Data fragmentation and allocation in distributed and parallel Database Management Systems (DBMS) have been extensively studied in the past. Previous work tackled these two problems separately even though they depend on each other. We recently developed a combined algorithm that handles the dependency between fragmentation and allocation, using a novel genetic solution. The main issue with this solution and with previous solutions is the lack of real-life verification of these models. This paper addresses this gap by verifying the effectiveness of our previous genetic solution on the Teradata DBMS. Teradata is a shared-nothing DBMS with proven scalability and robustness in real-life user environments as big as tens of petabytes of relational data. Experiments are conducted for the genetic solution and previous work using the SSB benchmark (TPC-H like) on a Teradata appliance running TD 13.10. Results show that the genetic solution is faster than previous work by 38%.

international conference on algorithms and architectures for parallel processing | 2010

Query optimization over parallel relational data warehouses in distributed environments by simultaneous fragmentation and allocation

Ladjel Bellatreche; Alfredo Cuzzocrea; Soumia Benkrid

Parallel database technology has already shown its efficiency in supporting high-performance Online Analytical Processing (OLAP) applications. This scenario implies achieving query optimization over relational Data Warehouses (RDW) on top of which typical OLAP functionalities, such as roll-up, drill-down and aggregate query answering, can be implemented. As a result, there is an emerging need for a comprehensive methodology able to support the design of RDW over parallel and distributed environments in all the phases, including data partitioning, fragment allocation, and data replication. Existing design approaches have an important limitation: fragmentation and allocation phases are performed in an isolated manner. In order to overcome this limitation, in this paper we propose a new methodology for designing parallel RDW over distributed environments, for query optimization purposes. The methodology is illustrated on database clusters, as a noticeable case of distributed environments. Contrary to state-of-the-art approaches where allocation is performed after fragmentation, in our approach we propose allocating fragments during the partitioning phase itself. Also, a naive replication algorithm that takes into account the heterogeneous characteristics of our reference architecture is proposed.

complex, intelligent and software intensive systems | 2012

The F&A Methodology and Its Experimental Validation on a Real-Life Parallel Processing Database System

Ladjel Bellatreche; Soumia Benkrid; Alain Crolotte; Alfredo Cuzzocrea; Ahmad Ghazal

This paper complements our previous results in the context of effectively and efficiently designing Parallel Relational Data Warehouses (PRDW) over heterogeneous database clusters, which are represented by the proposal of a methodology called Fragmentation & Allocation (F&A). The main merit of F&A is that it performs the fragmentation and allocation phases simultaneously, whereas traditional approaches perform them separately. In this paper, we demonstrate the practical impact and the reliability of F&A on a real-life parallel processing database system.

database and expert systems applications | 2008

A Combined Selection of Fragmentation and Allocation Schemes in Parallel Data Warehouses

Soumia Benkrid; Ladjel Bellatreche; Habiba Drias

The process of designing a parallel data warehouse has two main steps: (1) fragmentation and (2) allocation of the generated fragments at various nodes. Usually, fragmentation and allocation tasks are used iteratively (we first split the warehouse horizontally and then allocate fragments over the nodes). The main drawback of such a design approach (called iterative) is that it does not take into account the interdependencies between fragmentation and allocation, since the generated fragments are the input of the data allocation problem. In this paper, we consider a parallel data warehouse design approach combining data fragmentation and allocation. Its main characteristic is that it decides on the quality of the allocation schema when fragmenting the warehouse. Our approach is validated using computational tests over a variety of parameter values.

Trans. Large-Scale Data- and Knowledge-Centered Systems | 2014

A Global Paradigm for Designing Parallel Relational Data Warehouses in Distributed Environments

Soumia Benkrid; Ladjel Bellatreche; Alfredo Cuzzocrea

Designing a Parallel Relational Data Warehouse (PRDW) consists of a set of tasks: (i) choosing the hardware architecture; (ii) fragmenting the data warehouse schema; (iii) allocating the generated fragments; (iv) replicating fragments in order to ensure high performance; (v) defining the strategies for load balancing and query processing. The major drawback of this life-cycle is the fact that it does not consider the inter-dependency among sub-problems related to the design of PRDW, and it makes use of heterogeneous metrics to evaluate the “quality” of the final design. In previous research efforts, we introduced an analytical cost model for parallel OLAP query processing in cluster environments. In a second effort, we took into account the inter-dependency existing between fragmentation and allocation. In this paper, we propose a novel methodology, called \(\mathcal{F}\)&\(\mathcal{A}\)&\(\mathcal{R}\), which further extends previous results, and defines an approach where the main PRDW design phases (i.e., fragmentation, allocation, and replication) are performed simultaneously, in a global fashion. In particular, our approach determines whether the fragmentation pattern currently generated is relevant to the allocation process or not. An original method of supporting data replication, based on fuzzy k-means clustering, is also proposed and successfully integrated within the whole design framework. Finally, we experimentally assessed the performance of \(\mathcal{F}\)&\(\mathcal{A}\)&\(\mathcal{R}\) against a well-known data warehouse benchmark, with very promising results.

international conference on algorithms and architectures for parallel processing | 2015

HYPAD: Hyper-Graph-Driven Approach for Parallel Data Warehouse Design

Ahcene Boukorca; Ladjel Bellatreche; Soumia Benkrid

Small, medium and large companies all face three well-identified problems: (i) the data deluge, (ii) the large number of interacting exploratory queries and (iii) the economic crisis. Hence, it becomes a real necessity to consider those problems and develop low-cost database deployment solutions. Data-parallel architectures are one of the relevant deployment platforms that may manage this deluge of data efficiently. The process of designing such an architecture has to integrate the interaction that may exist between queries. Although the state of the art on parallel data warehouses is quite rich, to the best of our knowledge query interaction has not been highlighted. Yet queries are at the core of parallel design, and ignoring their interaction may impact the quality of the final design. In this paper, we propose a new scalable hyper-graph approach, called HYPAD, for designing cluster warehouses by considering concurrent, highly interacting analytical queries. Our approach is validated through a data warehouse cluster simulator. The obtained results show the effectiveness and efficiency of our proposal.

Technique Et Science Informatiques | 2011

Une démarche conjointe de fragmentation et de placement dans le cadre des entrepôts de données parallèles

Soumia Benkrid; Ladjel Bellatreche

Abstract. Traditionally, designing a parallel data warehouse consists of first partitioning its schema and then allocating the generated fragments over the nodes of a parallel machine. The major drawback of such an approach is that it ignores the interdependency between the fragmentation and allocation processes. One of the inputs of the allocation problem is the set of fragments generated by fragmentation. Note that both processes seek to optimize the same set of queries. In this paper, we propose an approach for designing a parallel relational data warehouse on a distributed (shared-nothing) architecture that integrates the fragmentation and allocation processes. Then, a method for balancing the load across the nodes of the parallel machine is proposed. Finally, a validation of our proposals using the APB-1 release II benchmark is presented.
