Sofian Maabout | Researchain

Publication

Featured research published by Sofian Maabout.

extending database technology | 2009

A view selection algorithm with performance guarantee

Nicolas Hanusse; Sofian Maabout; Radu Tofan

A view selection algorithm takes as input a fact table and computes a set of views to store in order to speed up queries. The performance of a view selection algorithm is usually measured by three criteria: (1) the amount of memory needed to store the selected views, (2) the query response time and (3) the time complexity of the algorithm itself. The first two measurements deal with the output of the algorithm. No existing solution gives a good trade-off between memory usage and query cost with a small time complexity. We propose in this paper an algorithm guaranteeing a constant approximation factor of query response time with respect to the optimal solution. Moreover, the time complexity for a D-dimensional fact table is O(D · 2^D), which matches the fastest known algorithm. We provide an experimental comparison with two other well-known algorithms showing that our approach also gives good performance in terms of memory.
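To give a feel for where a factor of D times 2 to the power D can come from, the following minimal sketch enumerates the candidate views (cuboids) of a small cube and the parent-child links between them. It is not the algorithm from the paper; the function names `cuboids` and `lattice_edges` and the dimension names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions, i.e. every candidate view."""
    dims = tuple(dimensions)
    return [frozenset(c)
            for k in range(len(dims) + 1)
            for c in combinations(dims, k)]

def lattice_edges(dimensions):
    """Return (parent, child) pairs where the child drops exactly one dimension."""
    return [(c, c - {d}) for c in cuboids(dimensions) for d in c]

# Hypothetical 3-dimensional fact table with dimensions A, B and C.
dims = ["A", "B", "C"]
views = cuboids(dims)        # 2^D candidate views
edges = lattice_edges(dims)  # D * 2^(D-1) links, each cuboid has at most D children
print(len(views), len(edges))
```

A D-dimensional table has 2^D cuboids, each with at most D children, so merely visiting every link of the lattice already costs on the order of D · 2^D steps.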

data and knowledge engineering | 2014

Extending ER models to capture database transformations to build data sets for data mining

Carlos Ordonez; Sofian Maabout; David Sergio Matusevich; Wellington Cabrera

In a data mining project developed on a relational database, a significant effort is required to build a data set for analysis. The main reason is that, in general, the database has a collection of normalized tables that must be joined, aggregated and transformed in order to build the required data set. Such a scenario results in many complex SQL queries that are written independently from each other, in a disorganized manner. Therefore, the database grows with many tables and views that are not present as entities in the ER model, and similar SQL queries are written multiple times, creating problems in database evolution and software maintenance. In this paper, we classify potential database transformations, we extend an ER diagram with entities capturing database transformations and we introduce an algorithm which automates the creation of such an extended ER model. We present a case study with a public database illustrating database transformations to build a data set to compute a typical data mining model.

Annals of Mathematics and Artificial Intelligence | 2015

Functional dependencies are helpful for partial materialization of data cubes

Eve Garnaud; Sofian Maabout; Mohamed Mosbah

Functional dependencies (FD’s) are a powerful concept in data organization. They have proven very useful, e.g., in relational databases for reducing data redundancy. Little work, however, has been done so far on using them in the context of data cubes. In the present paper, we propose to characterize the parts of a data cube to be materialized with the help of the FD’s present in the underlying data. For this purpose, we consider two applications: (i) how to choose the best cuboids of a data cube to materialize in order to guarantee a fixed performance of query evaluation and (ii) how to choose the best tuples, hence partial cuboids, in order to reduce the size of the data cube without losing information. In both cases FD’s turn out to be fundamental in characterizing the solutions of these problems.
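As a small illustration of why FDs matter for cuboid sizes, the sketch below checks whether a dependency holds in a table and compares the number of groups in two cuboids. It is only an assumption-laden toy, not the method of the paper; `fd_holds`, `cuboid_size` and the sample rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

def cuboid_size(rows, dims):
    """Number of distinct groups when grouping rows by dims."""
    return len({tuple(row[d] for d in dims) for row in rows})

# Hypothetical fact table: each city belongs to exactly one country.
rows = [
    {"city": "Bordeaux", "country": "France", "sales": 10},
    {"city": "Paris", "country": "France", "sales": 20},
    {"city": "Rome", "country": "Italy", "sales": 5},
    {"city": "Paris", "country": "France", "sales": 7},
]

print(fd_holds(rows, ["city"], ["country"]))  # city -> country
print(fd_holds(rows, ["country"], ["city"]))  # country -> city
print(cuboid_size(rows, ["city"]), cuboid_size(rows, ["city", "country"]))
```

When city determines country, grouping by (city, country) produces exactly as many groups as grouping by city alone, which is the kind of redundancy an FD-aware materialization strategy can exploit.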

acm symposium on applied computing | 2013

Horizontal partitioning of very-large data warehouses under dynamically-changing query workloads via incremental algorithms

Ladjel Bellatreche; Rima Bouchakri; Alfredo Cuzzocrea; Sofian Maabout

With the explosion of the size of data warehousing applications, horizontal data partitioning is well suited to reducing the cost of complex OLAP queries and improving warehouse manageability. It is considered a non-redundant optimization technique. Selecting a fragmentation schema for a given data warehouse is an NP-hard problem. Several studies exist and propose heuristics to select near-optimal solutions. Most of these heuristics are static, since they assume the existence of an a priori known set of queries. Note that in real-life applications, queries may change dynamically and fragmentation heuristics need to integrate these changes. In this paper, we propose an incremental selection of fragmentation schemes based on genetic algorithms. Intensive experiments are conducted to validate our proposal.

conference on information and knowledge management | 2011

A parallel algorithm for computing borders

Nicolas Hanusse; Sofian Maabout

The border concept was introduced by Mannila and Toivonen in their seminal paper [20]. This concept finds many applications, e.g., maximal frequent itemsets, minimal functional dependencies, emerging patterns between consecutive database instances and materialized view selection. For large transactional and relational databases defined on n items or attributes, the running time of any border computation is mainly dominated by the time T (for standard sequential algorithms) required to test the interestingness, in general the frequencies, of sets of candidates. In this paper we propose a general parallel algorithm for computing borders whatever the application is. We prove the efficiency of our algorithm by showing that: (i) it generates exactly the same number of candidates as the standard sequential algorithm and (ii) if the interestingness test time of a candidate is bounded by Δ, then for a multi-processor shared memory machine with p cores, the total interestingness time satisfies T_p < T/p + 2Δn. We implemented our algorithm in the maximal frequent itemset (MFI) mining setting and our experiments confirm our theoretical performance guarantee.
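For readers unfamiliar with borders, here is a minimal sketch of the standard sequential levelwise (Apriori-style) search for maximal frequent itemsets, the setting mentioned in the abstract. It is not the parallel algorithm of the paper; the function names, the sample transactions and the thresholds are assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def maximal_frequent_itemsets(transactions, min_support):
    """Levelwise search; returns the frequent itemsets with no frequent superset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = set(level)
    while level:
        # Join step: merge pairs of frequent k-sets that differ in one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, len(c) - 1))}
        # Interestingness test: this is the part whose total time is called T above.
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
        frequent.update(level)
    return {f for f in frequent if not any(f < g for g in frequent)}

# Hypothetical transaction database over items a, b, c.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(maximal_frequent_itemsets(db, 2))
print(maximal_frequent_itemsets(db, 3))
```

The support tests inside the loop are independent of each other within one level, which is what makes this kind of search a natural target for parallelization.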

content-based multimedia indexing | 2014

Scalable video summarization of cultural video documents in cross-media space based on data cube approach

Karina R. Perez-Daniel; Mariko Nakano Miyatake; Jenny Benois-Pineau; Sofian Maabout; Gabriel Sargent

Video summarization has been a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should display an overview of the video content, and most existing approaches fulfil this goal. However, the information does not allow the user to get all details of interest selectively and progressively. This paper proposes a scalable video summarization approach which provides multiple views and levels of detail. Our method relies on the use of a cross-media space and a consensus clustering method. A video document is modelled as a data cube where the level of detail is refined over nonconsensual features of the space. The method is designed for weakly structured content such as cultural documentaries and was tested on the INA corpus of cultural archives.

international conference on information systems, technology and management | 2012

Uncertainty Interval Temporal Sequences Extraction

Asma Ben Zakour; Sofian Maabout; Mohamed Mosbah; Marc Sistiaga

Searching for frequent sequential patterns has been used in several domains. We note that time granularities are more or less important depending on the application domain. In this paper we propose a technique for extracting frequent interval time sequences (ITS) from discrete temporal sequences, using a sliding window approach to relax time constraints. The extracted sequences offer an interesting overview of the original data by allowing temporal leeway in the extraction process. We formalize ITS extraction under classical time and support constraints and conduct experiments on synthetic data to validate our proposal.

advances in databases and information systems | 2011

Revisiting the partial data cube materialization

Nicolas Hanusse; Sofian Maabout; Radu Tofan

The problem of selecting views and/or indexes to materialize has been extensively studied in the context of query optimization. Traditionally, the problem is formalized as follows: given a set of queries and a budget, e.g., the available memory space, find the objects to materialize (views and/or indexes) that (1) satisfy the given budget and (2) minimize the query cost. In this paper, we depart from this setting by adopting a user-centric point of view: given a constraint on query evaluation, namely a maximal query cost the user accepts, find the objects (1) whose materialization needs the minimal storage space and (2) that guarantee the query evaluation constraint. We study this problem in the data cube setting and provide exact and approximate solutions.

international conference on internet and web applications and services | 2008

Methodological Aspects of Semantics Enrichment in Model Driven Architecture

Mouhamed Diouf; Kaninda Musumbu; Sofian Maabout

The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. An ontology defines the terms used to describe and represent an area of knowledge. Ontologies are used by people, databases, and applications that need to share domain information (a domain is just a specific subject area or area of knowledge, like medicine, tool manufacturing, real estate, automobile repair, financial management, etc.). In this paper we combine these two concepts to annotate models with metadata according to the corresponding domain ontology, together with all the newly extracted information, in order to improve the performance of the entire system.

foundations of information and knowledge systems | 2012

Using functional dependencies for reducing the size of a data cube

Eve Garnaud; Sofian Maabout; Mohamed Mosbah

Functional dependencies (FDs) are a powerful concept in data organization. They have proven very useful, e.g., in relational databases for reducing data redundancy. Little work, however, has been done so far on using them in the context of data cubes. In the present paper, we propose to characterize the parts of a data cube to be materialized with the help of the FDs present in the underlying data. For this purpose, we consider two applications: (i) how to choose the best cuboids of a data cube to materialize in order to guarantee a fixed performance of queries and (ii) how to choose the best tuples, hence partial cuboids, in order to reduce the size of the data cube without losing information. In both cases we show how FDs are fundamental.
