Fadila Bentayeb | Researchain

Publication

Featured research published by Fadila Bentayeb.

model and data engineering | 2014

Columnar NoSQL Star Schema Benchmark

Khaled Dehdouh; Omar Boussaid; Fadila Bentayeb

Benchmarking data warehouses is a means to evaluate the performance of systems and the impact of different technical choices. Existing benchmarks such as the Star Schema Benchmark (SSB) were developed on relational models, which have for years been the most widely used to support classical data warehousing applications. SSB is designed to measure the performance of database products when executing star schema queries. As the volume of data keeps growing, the types of data generated by applications become richer than before. As a result, traditional relational databases are challenged to manage big data. Many IT companies attempt to manage big data challenges using a NoSQL (Not only SQL) database, and may use a distributed computing system. NoSQL databases are known to be non-relational, horizontally scalable, and distributed. We present in this paper a new benchmark for columnar NoSQL data warehouses, namely CNSSB (Columnar NoSQL Star Schema Benchmark). CNSSB is derived from SSB and allows generating synthetic data and a query set to evaluate column-oriented NoSQL data warehouses. We implemented CNSSB under the HBase column-oriented database management system (DBMS) and applied its query workload to compare the performance of two SQL skins, Phoenix and HQL (Hive Query Language). We observed better performance with Phoenix than with HQL.

Journal of Global Optimization | 2007

Integration and dimensional modeling approaches for complex data warehousing

Omar Boussaid; Adrian Tanasescu; Fadila Bentayeb; Jérôme Darmont

With the broad development of the World Wide Web, various kinds of heterogeneous data (including multimedia data) are now available to decision support tasks. A data warehousing approach is often adopted to prepare data for relevant analysis. Data integration and dimensional modeling indeed allow the creation of appropriate analysis contexts. However, the existing data warehousing tools are well-suited to classical, numerical data. They cannot handle complex data. In our approach, we adapt the three main phases of the data warehousing process to complex data. In this paper, we particularly focus on two main steps in complex data warehousing. The first step is data integration. We define a generic UML model that helps represent a wide range of complex data, including their possible semantic properties. Complex data are then stored in XML documents generated by a piece of software we designed. The second step we address is the preparation of data for dimensional modeling. We propose an approach that exploits data mining techniques to assist users in building relevant dimensional models.

systems, man and cybernetics | 2014

Columnar NoSQL CUBE: Agregation operator for columnar NoSQL data warehouse

Khaled Dehdouh; Fadila Bentayeb; Omar Boussaid; Nadia Kabachi

The emergence of large volumes of data, driven by the major players of the web, requires new management models and new data storage and processing architectures able to find information quickly in a large volume of data. Column-oriented NoSQL (Not Only SQL) databases provide, for big data, the model most suitable for data warehouses and for structuring multidimensional data as OLAP cubes. However, OLAP cube computation operators are lacking. We therefore propose in this paper a new aggregation operator called CN-CUBE (Columnar NoSQL CUBE), which allows data cubes to be computed from data warehouses stored in column-oriented NoSQL database management systems. We implemented the CN-CUBE operator using the Phoenix SQL interface of the HBase DBMS and conducted experiments on a public data warehouse in a distributed environment built on the Hadoop platform. The results show that the OLAP cube computation times of CN-CUBE are well suited to NoSQL warehouses.

tpc technology conference | 2013

PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud

Jaume Ferrarons; Mulu Adhana; Carlos Colmenares; Sandra Pietrowska; Fadila Bentayeb; Jérôme Darmont

In this position paper, we draw the specifications for a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim at filling several gaps in already existing cloud data processing benchmarks, which lack a real-life context for their processes, thus losing relevance when trying to assess performance for real applications. Hence, we propose a fictitious news site hosted in the cloud that is to be managed by the framework under analysis, together with several objective use case scenarios and measures for evaluating system performance. The main strengths of our benchmark definition are parallelization capabilities supporting cloud features and big data properties.

International Journal of Data Warehousing and Mining | 2015

Contextualized Text OLAP Based on Information Retrieval

Lamia Oukid; Nadjia Benblidia; Fadila Bentayeb; Ounas Asfari; Omar Boussaid

Current data warehousing and On-Line Analytical Processing (OLAP) systems are not yet particularly appropriate for textual data analysis. It is therefore crucial to develop a new data model and an OLAP system to provide the necessary analyses for textual data. To achieve this objective, this paper proposes a new approach based on information retrieval (IR) techniques. Moreover, several contextual factors may significantly affect the information relevant to a decision-maker. Thus, the paper proposes to consider contextual factors in an OLAP system to provide relevant results. It provides a generalized approach for Text OLAP analysis which consists of two parts. The first is a context-based text cube model, denoted CXT-Cube, which is characterized by several contextual dimensions. During the OLAP analysis process, CXT-Cube exploits the contextual information in order to better consider the semantics of textual data. In addition, the work associates with CXT-Cube a new text analysis measure based on an OLAP-adapted vector space model and a relevance propagation technique. The second part is an OLAP aggregation operator called ORank (OLAP-Rank), which aggregates textual data in an OLAP environment while considering relevant contextual factors. To consider the user context, this paper proposes a query expansion method based on a decision-maker profile. Based on IR metrics, it evaluates the proposed aggregation operator in different cases using several data analysis queries. The evaluation shows that the precision of the system is significantly better than that of a Text OLAP system based on classical IR, owing to the consideration of the contextual factors.

data warehousing and knowledge discovery | 2014

Towards an OLAP Environment for Column-Oriented Data Warehouses

Khaled Dehdouh; Fadila Bentayeb; Omar Boussaid; Nadia Kabachi

Column-oriented database systems offer decision-makers the most appropriate model for data warehouse storage. However, in the absence of on-line analytical operators, the only way of constructing OLAP cubes is very costly: it uses the UNION operator over GROUP BY queries to obtain all the groupings required to compute the OLAP cube. To solve this problem, in this article we propose a new aggregation operator, called C-CUBE (Columnar-CUBE), which allows data cubes to be computed using column-oriented data warehouses. We implemented the C-CUBE operator within the column-oriented DBMS MonetDB and conducted experiments on the Star Schema Benchmark (SSBM). The results show that C-CUBE reduces OLAP cube computation times by up to 60% compared with the SQL Server CUBE operator on a 1 TB warehouse.
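To make the cost of the baseline concrete, the following minimal sketch computes a cube the naive way the abstract describes: one group-by per subset of dimensions, with the results collected together. It is not the C-CUBE operator; the function name `cube`, the sample rows, and the column names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Naive cube: SUM(measure) for every subset of dims (2^n group-bys)."""
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            # One full pass over the data per grouping, like one GROUP BY query.
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical fact rows, not taken from the paper.
rows = [
    {"year": 2013, "region": "EU", "revenue": 10},
    {"year": 2013, "region": "US", "revenue": 20},
    {"year": 2014, "region": "EU", "revenue": 5},
]
c = cube(rows, ("year", "region"), "revenue")
for group, totals in c.items():
    print(group, totals)
```

With n dimensions this scans the data 2^n times, which is the cost a dedicated cube operator aims to avoid.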

database and expert systems applications | 2017

Logical Schema for Data Warehouse on Column-Oriented NoSQL Databases

Mohamed Boussahoua; Omar Boussaid; Fadila Bentayeb

Column-oriented NoSQL systems offer a flexible and highly denormalized data schema that facilitates data warehouse scalability. However, implementing data warehouses with NoSQL databases is a challenging task, as it involves a distributed data management policy on multi-node clusters. In column-oriented NoSQL systems, query performance can be improved by careful data grouping. In this paper, we present a method that uses clustering techniques, in particular k-means, to derive a better design of column families from existing fact and dimension tables. To validate our method, we adopt the TPC-DS benchmark. We conducted several experiments to examine the benefits of clustering techniques for the creation of column families in a column-oriented NoSQL HBase database on the Hadoop platform. Our experiments suggest that defining a good data grouping in an HBase database during the implementation of a data warehouse significantly increases the performance of decisional queries.
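The idea of grouping attributes into column families with k-means can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which features describe an attribute, so the sketch assumes each attribute is represented by a binary vector marking which queries of a hypothetical workload read it. The attribute names, the workload, and the first-k-points initialization are all assumptions, not the authors' method.

```python
from collections import defaultdict


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical usage matrix: attribute -> which of four queries read it.
usage = {
    "d_year": (1, 1, 0, 0),
    "c_region": (0, 0, 1, 1),
    "d_month": (1, 1, 0, 0),
    "c_nation": (0, 0, 1, 1),
}
names = list(usage)
labels = kmeans([usage[n] for n in names], k=2)

# Attributes sharing a cluster become candidates for one column family.
families = defaultdict(list)
for name, label in zip(names, labels):
    families[label].append(name)
print(dict(families))
```

Attributes that are read by the same queries end up in the same cluster, so a query touches fewer column families.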

International Journal of Decision Support System Technology | 2015

Towards Collaborative Multidimensional Query Recommendation with Triadic Association Rules

Sid Ali Selmane; Omar Boussaid; Fadila Bentayeb

This paper describes a new personalization process for decisional queries through a new approach based on triadic association rule mining. This process exploits the decision query log files of end users and follows these steps: (1) generation of a triadic context from the multidimensional query logs of an OLAP query analysis server; (2) mapping of the triadic context into a dyadic one; (3) computation of conventional dyadic association rules; (4) generation of triadic association rules, which convey richer semantics, through a factorization of the dyadic ones. The aim of this personalization approach is to recommend new decision queries to OLAP end users who share some common properties, by suggesting personalized OLAP queries that they might use in their future OLAP sessions. To validate the approach, the authors developed a software prototype called P-TRIAR (Personalization based on TRIadic Association Rules), which extracts two types of triadic association rules from decision query log files. The first type of triadic rules serves to recommend queries by taking the collaborative aspect of OLAP users into account. The second type of triadic rules enriches user queries. Preliminary experiments were conducted on both real and synthetic datasets to assess the quality of the recommendations in terms of precision and recall, as well as the performance of their on-line computation.
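The mapping from a triadic context to a dyadic one, followed by a dyadic rule measure, can be sketched as below. The sketch assumes the common flattening in which each (attribute, condition) pair becomes a single dyadic attribute; whether the paper uses exactly this mapping is not stated in the abstract. The triples, the helper names `flatten` and `confidence`, and the reading of the three components are hypothetical.

```python
from collections import defaultdict

# Hypothetical triadic context: (object, attribute, condition) triples,
# read here as (user, dimension, query).
triples = {
    ("u1", "year", "q1"),
    ("u1", "region", "q1"),
    ("u2", "year", "q1"),
    ("u2", "region", "q1"),
    ("u3", "year", "q2"),
}


def flatten(triples):
    """Map the triadic context to a dyadic one: object -> {(attribute, condition)}."""
    dyadic = defaultdict(set)
    for obj, attr, cond in triples:
        dyadic[obj].add((attr, cond))
    return dict(dyadic)


def confidence(dyadic, premise, conclusion):
    """Confidence of the dyadic rule premise -> conclusion over all objects."""
    covering = [items for items in dyadic.values() if premise <= items]
    if not covering:
        return 0.0
    return sum(1 for items in covering if conclusion <= items) / len(covering)


dyadic = flatten(triples)
conf = confidence(dyadic, {("year", "q1")}, {("region", "q1")})
print(conf)
```

Rules found on the flattened context can then be regrouped by condition, which is the role the abstract assigns to the factorization step.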

systems, man and cybernetics | 2015

Intentional Data Placement Optimization for Distributed Data Warehouses

Billel Arres; Nadia Kabachi; Omar Boussaid; Fadila Bentayeb

Parallel computing is a fundamental technique in the management of large quantities of data, as it leverages the concurrent use of multiple computing resources. One of the technologies that made big data analytics popular and accessible to enterprises of all sizes is MapReduce (and its open-source Hadoop implementation). With the ability to automatically parallelize an application on a cluster of commodity hardware, MapReduce allows enterprises to analyze terabytes and petabytes of data more conveniently than ever. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations, including indexing, grouping, aggregation and joins, can be improved by careful data placement. In this paper, we present a MapReduce data block allocation approach to improve MapReduce job execution and query performance on multi-node clusters, especially Hadoop clusters. Based on the k-means clustering method, which lets us control the number of clusters through its k parameter, we study the influence of the number of clusters on query execution, rather than only comparing query performance with and without data organization. For this, we used a well-known, large-scale data analysis benchmark: TPC-H. Our experiments suggest that defining a good data placement on a cluster during the implementation of a data warehouse significantly improves OLAP cube construction and querying performance.

International Journal of Metadata, Semantics and Ontologies | 2015

Community Cube: a semantic framework for analysing social network data

Lilia Hannachi; Nadjia Benblidia; Omar Boussaid; Fadila Bentayeb

Social network research has focused mainly on the inference of social influence, the building of user communities and prediction analysis. However, very little work has been carried out to investigate and document how online analytical processing (OLAP) tools can interactively analyse social networks from different perspectives and at multiple granularities. In this paper we study the use of data warehousing and OLAP technologies on such multidimensional social networks by proposing the Community Cube architecture. Going beyond traditional OLAP operations, our architecture proposes a new method that combines data mining and OLAP tools to navigate through the user hierarchy. Besides traditional OLAP queries, our approach introduces a new class of queries, which we named NetCuboid. These queries take into account the attributes associated with network entities, the user-generated content, and the topological structure of the networks. Experimental results showcase the effectiveness of our approach for decision-making based on social network data.
