Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Laurent d'Orazio is active.

Publication


Featured research published by Laurent d'Orazio.


Information Systems | 2015

Density-based data partitioning strategy to approximate large-scale subgraph mining

Sabeur Aridhi; Laurent d'Orazio; Mondher Maddouri; Engelbert Mephu Nguifo

Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. In this scope, one of the most challenging tasks is frequent subgraph discovery, a task motivated by the tremendously increasing size of existing graph databases. This growth has created a need for efficient and scalable approaches to frequent subgraph discovery in large clusters. However, failures are the norm rather than the exception in large clusters, and the MapReduce framework was designed so that node failures are handled automatically by the framework. In this paper, we propose a large-scale and fault-tolerant approach to subgraph mining by means of a density-based partitioning technique, using MapReduce. Our partitioning aims to balance the computational load on a collection of machines. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases.
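
As a rough illustration of the partitioning idea, the sketch below (plain Python, with a made-up graph representation and partition count, not the paper's exact algorithm) ranks the graphs of a database by edge density and spreads them round-robin over partitions, so each machine receives a comparable density mix.

def density(graph):
    """Edge density of an undirected graph given as (num_vertices, edge_list)."""
    n, edges = graph
    if n < 2:
        return 0.0
    return 2.0 * len(edges) / (n * (n - 1))

def partition_by_density(graphs, num_partitions):
    """Spread graphs over partitions so each partition gets a similar density mix."""
    ranked = sorted(graphs, key=density, reverse=True)
    partitions = [[] for _ in range(num_partitions)]
    for i, g in enumerate(ranked):
        partitions[i % num_partitions].append(g)   # round-robin over the density ranking
    return partitions

# toy graph database: (vertex count, list of edges)
db = [
    (4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]),
    (5, [(0, 1), (1, 2)]),
    (3, [(0, 1), (1, 2), (0, 2)]),
    (6, [(0, 1)]),
]
for p, part in enumerate(partition_by_density(db, 2)):
    print(f"partition {p}: densities {[round(density(g), 2) for g in part]}")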


international conference on data management in grid and p2p systems | 2010

Multidimensional arrays for warehousing data on clouds

Laurent d'Orazio; Sandro Bimonte

Data warehouses and OLAP systems are business intelligence technologies. They allow decision-makers to analyze on the fly huge volumes of data represented according to the multidimensional model. Cloud computing, driven by ICT majors such as Google, Microsoft and Amazon, has recently attracted much attention. OLAP querying and data warehousing in such a context are a major issue: the classical problems of large-scale distributed OLAP systems (querying large amounts of data, semantic and structural heterogeneity) must be revisited in light of the specificities of these architectures (the pay-as-you-go rule, elasticity, and user-friendliness). In this paper we address the pay-as-you-go rule for warehouse data storage. We propose to use multidimensional array storage techniques on clouds. First experiments validate our proposal.
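
To make the storage idea concrete, here is a minimal sketch (with assumed dimensions and a plain dict standing in for the cloud key-value store, not the system from the paper) where each cube cell is stored under the single key obtained by linearizing its dimension coordinates in row-major order.

def linearize(coords, dim_sizes):
    """Row-major offset of a cube cell, e.g. (month, product, store) -> one integer key."""
    offset = 0
    for c, size in zip(coords, dim_sizes):
        offset = offset * size + c
    return offset

dim_sizes = (12, 100, 50)                            # e.g. month, product, store
store = {}                                           # stand-in for a cloud key-value store

store[linearize((3, 42, 7), dim_sizes)] = 1999.90    # write one measure
print(store[linearize((3, 42, 7), dim_sizes)])       # read it back: 1999.9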


database and expert systems applications | 2007

Distributed semantic caching in grid middleware

Laurent d'Orazio; Fabrice Jouanot; Yves Denneulin; Cyril Labbé; Claudia Roncancio; Olivier Valentin

This paper proposes a flexible caching solution to improve query evaluation in grids. It reduces both data transfer and query computation by adopting a distributed semantic caching approach. Our proposal introduces multi-scale cache cooperation, including single-site cooperation between object caches and distributed context-aware cooperation between several query caches. Different cache miss resolution protocols are introduced for query evaluation and evaluated in a grid data management system for bioinformatics applications.
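
The sketch below gives a toy view of the semantic part of such a cache (single-attribute interval predicates and an in-memory list of entries are simplifying assumptions, not the paper's design): an entry keeps the predicate of the query that produced it, and a new query is answered locally when its predicate is contained in a cached one.

class SemanticCache:
    def __init__(self):
        self.entries = []                       # list of ((lo, hi), rows)

    def put(self, interval, rows):
        self.entries.append((interval, rows))

    def get(self, interval):
        lo, hi = interval
        for (clo, chi), rows in self.entries:
            if clo <= lo and hi <= chi:         # query region contained in a cached region
                return [r for r in rows if lo <= r <= hi]
        return None                             # cache miss: forward to another cache or the server

cache = SemanticCache()
cache.put((0, 100), [3, 17, 42, 88, 99])        # result of "value BETWEEN 0 AND 100"
print(cache.get((10, 50)))                      # answered locally -> [17, 42]
print(cache.get((50, 200)))                     # miss -> None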


international database engineering and applications symposium | 2009

Semantic caching for pervasive grids

Laurent d'Orazio; Mamadou Kaba Traoré

Recently, grids and pervasive systems have been drawing increasing attention as a way to coordinate large-scale resources and enable access for small and smart devices. In this paper, we propose a caching approach to improve querying on pervasive grids. Our proposal, called semantic pervasive dual cache, follows a semantics-oriented approach. It is based, on the one hand, on a clear separation between the analysis and the evaluation process and, on the other hand, on cooperation between client caches performing lightweight analysis and proxy caches providing evaluation capabilities. Such an approach helps load balancing, making the system more scalable. We have validated semantic pervasive dual cache using analytic models and simulations. The results show that our approach is quite promising.
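
As a sketch of the client/proxy split (the classes and the trivial evaluation below are illustrative assumptions rather than the actual semantic pervasive dual cache), the client cache only performs a lightweight exact-match lookup and delegates anything else to a proxy cache that can evaluate the query against its cached results.

class ProxyCache:
    """Heavier cache with evaluation capabilities (here: superset lookup)."""
    def __init__(self):
        self.results = {}                    # cached query (frozenset of keys) -> rows

    def get(self, wanted_keys):
        for cached_keys, rows in self.results.items():
            if wanted_keys <= cached_keys:   # answer can be evaluated from this entry
                return [r for r in rows if r["key"] in wanted_keys]
        return None

class ClientCache:
    """Light client cache: exact-match only, delegates everything else."""
    def __init__(self, proxy):
        self.exact = {}
        self.proxy = proxy

    def get(self, wanted_keys):
        key = frozenset(wanted_keys)
        if key in self.exact:
            return self.exact[key]
        answer = self.proxy.get(key)
        if answer is not None:
            self.exact[key] = answer         # keep the evaluated result locally
        return answer

proxy = ProxyCache()
proxy.results[frozenset({"a", "b", "c"})] = [{"key": "a"}, {"key": "b"}, {"key": "c"}]
client = ClientCache(proxy)
print(client.get({"a", "b"}))                # evaluated by the proxy, then cached at the client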


Proceedings of the 2nd International Workshop on Cloud Intelligence | 2013

Toward intersection filter-based optimization for joins in MapReduce

Thuong-Cang Phan; Laurent d'Orazio; Philippe Rigaux

MapReduce has become an attractive and dominant model for processing large-scale datasets. However, this model is not designed to directly support operations with multiple inputs, such as joins. Many studies on join algorithms in MapReduce, including Bloom join, have been conducted, but they still generate and transmit too much non-joining data over the network. This research addresses the problem by providing an intersection filter based on probabilistic models to remove most disjoint elements between two datasets. Specifically, three ways are proposed to build the intersection Bloom filter. To apply the filter to joins, the corresponding MapReduce job is adjusted in a consistent way without increasing related costs. We then consider two-way joins and join cascades and analyze their costs. Thanks to the high accuracy of the intersection filter, join processing can minimize disk I/O and communication costs. Finally, the approach is shown to be more effective than existing solutions through a cost-based comparison of joins using different approaches.
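
A simple way to approximate an intersection Bloom filter is to build an ordinary Bloom filter of the join keys on each side and intersect them bitwise; the sketch below (filter size, hash scheme and toy datasets are illustrative choices, and this is only one possible construction, not necessarily one of the paper's three) uses such a filter to drop non-joining records before they would be shuffled.

import hashlib

M = 1 << 12            # number of bits in each filter
K = 3                  # number of hash functions

def positions(key):
    """K bit positions for a key, derived from salted SHA-1 digests."""
    return [int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def build_filter(keys):
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits

def maybe_contains(bits, key):
    return all(bits & (1 << p) for p in positions(key))

left_keys = ["a", "b", "c", "d"]
right_keys = ["c", "d", "e"]

intersection = build_filter(left_keys) & build_filter(right_keys)   # bitwise AND of the two filters

# map-side filtering: only records whose key may appear in both inputs are emitted
print([k for k in left_keys if maybe_contains(intersection, k)])    # ['c', 'd'] (plus rare false positives)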


international conference on management of data | 2013

Medical data management in the SYSEO project

Yahia Chabane; Laurent d'Orazio; Le Gruenwald; Baraa Mohamad; Christophe Rey

The SYSEO project aims at producing a software solution suitable for endoscopic imaging in order to enable physicians to manage, manipulate and share medical images. This paper presents the two main components for data management in this system: (1) a novel hybrid row-column database for medical data storage within the cloud and (2) a system for semantic image annotation and retrieval relying on an ontology for polyps.


Proceedings of the 1st International Workshop on Cloud Intelligence | 2012

Towards a hybrid row-column database for a cloud-based medical data management system

Baraa Mohamad; Laurent d'Orazio; Le Gruenwald

Medical data management has become a real necessity. The emergence of new medical imaging techniques and the need to access medical information at any time have led to an inevitable need for new, advanced solutions for managing these critical data. Current local archiving systems are very expensive and cannot support this heterogeneous and enormous data volume. Cloud computing has attracted significant attention due to its major characteristics of elasticity, availability and pay-per-use. A good exploitation of this infrastructure constitutes an effective and promising solution for managing medical data and images. In this position paper, we highlight the challenges of integrating highly heterogeneous data, such as DICOM files, in the cloud. We then propose a novel hybrid row-column, two-level database architecture for the storage of heterogeneous data over the cloud.
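
A toy version of the two-level idea is sketched below (the two in-memory dicts stand in for the cloud stores, and the attributes are made up): small, frequently queried image metadata lives in a row-oriented store, while the bulky pixel data sits in a separate blob store under the same identifier, so metadata queries never touch the large objects.

row_store = {}       # id -> metadata record (read as a whole row)
blob_store = {}      # id -> raw pixel data (fetched only when actually needed)

def insert_image(image_id, metadata, pixel_data):
    row_store[image_id] = metadata
    blob_store[image_id] = pixel_data

def find_by_modality(modality):
    """Metadata-only query: never loads the large pixel data."""
    return [i for i, m in row_store.items() if m["modality"] == modality]

insert_image("img-1", {"patient": "P42", "modality": "endoscopy"}, b"\x00" * 1024)
insert_image("img-2", {"patient": "P17", "modality": "ct"}, b"\x00" * 2048)
print(find_by_modality("endoscopy"))     # ['img-1'] without touching any pixels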


international conference on data management in grid and p2p systems | 2010

Merging file systems and data bases to fit the grid

Yves Denneulin; Cyril Labbé; Laurent d'Orazio; Claudia Roncancio

Grids are widely used by CPU-intensive applications that need to access data both through high-level queries and in a file-based manner. Their requirements include accessing data through metadata of different kinds, whether system or application metadata. In addition, grids provide large storage capabilities and support cooperation between sites. However, such solutions are relevant only if they deliver good performance. This paper presents Gedeon, a middleware that proposes a hybrid approach to scientific data management on grid infrastructures. This hybrid approach merges distributed file system and distributed database functionalities, thus offering semantically enriched data management while preserving ease of use and deployment. Taking advantage of this hybrid approach, advanced cache strategies are deployed at different levels to provide efficiency. Gedeon has been implemented, tested and used in the bioinformatics field.
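
The sketch below illustrates the files-plus-metadata flavor of such a hybrid approach (the paths, attributes and in-memory metadata list are invented for the example, not Gedeon's actual interfaces): files stay on the file system while a small metadata database points back to them, so data can be reached either through a high-level query or directly by path.

metadata_db = [
    {"path": "/data/genomes/ecoli.fasta", "organism": "E. coli", "size_mb": 5},
    {"path": "/data/genomes/yeast.fasta", "organism": "S. cerevisiae", "size_mb": 12},
]

def query(predicate):
    """High-level access: select file paths whose metadata matches the predicate."""
    return [rec["path"] for rec in metadata_db if predicate(rec)]

print(query(lambda r: r["size_mb"] > 10))   # file-based access then proceeds via the returned path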


international conference on data management in grid and p2p systems | 2008

Context-Aware Cache Management in Grid Middleware

Fabrice Jouanot; Laurent d'Orazio; Claudia Roncancio

This paper focuses on context-aware data management services in grids with the aim of constructing self-adaptive middleware. The contribution is twofold. First, it proposes a framework to facilitate the development of context management services; the reason is that even a context management service is itself context specific, so the creation of ad hoc context managers is crucial. Second, this paper introduces context awareness in cooperative semantic caches. Preliminary results of our experiments on the Grid5000 platform are reported.


International Journal of Data Warehousing and Mining | 2014

Cost Models for Selecting Materialized Views in Public Clouds

Romain Perriot; Jérémy Pfeifer; Laurent d'Orazio; Bruno Bachelet; Sandro Bimonte; Jérôme Darmont

Data warehouse performance is usually achieved through physical data structures such as indexes or materialized views. In this context, cost models can help select a relevant set of such performance optimization structures. Nevertheless, selection becomes more complex in the cloud: the criterion to optimize is at least two-dimensional, with monetary cost balanced against overall query response time. This paper introduces new cost models that fit the pay-as-you-go paradigm of cloud computing. Based on these cost models, an optimization problem is defined to discover, among candidate views, those to be materialized so as to minimize both the overall cost of using and maintaining the database in a public cloud and the total response time of a given query workload. Experiments show that maintaining materialized views is always advantageous, both in terms of performance and cost.
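
A heavily simplified sketch of the selection problem is given below (the weighted objective, greedy search and numbers are illustrative assumptions, not the paper's cost models): each candidate view carries a monetary cost per month and a saving on the workload's response time, and views are picked as long as a weighted combination of the two keeps decreasing.

def select_views(candidates, base_time, alpha=0.5):
    """Greedily pick views while the weighted money/time objective keeps improving."""
    chosen, money, time = [], 0.0, base_time
    objective = alpha * money + (1 - alpha) * time
    improved = True
    while improved:
        improved = False
        for v in candidates:
            if v in chosen:
                continue
            new_money = money + v["monthly_cost"]      # storage plus maintenance
            new_time = time - v["time_saving"]          # workload response time with the view
            new_obj = alpha * new_money + (1 - alpha) * new_time
            if new_obj < objective:
                chosen.append(v)
                money, time, objective = new_money, new_time, new_obj
                improved = True
    return chosen, money, time

candidates = [
    {"name": "v_sales_by_month",  "monthly_cost": 4.0, "time_saving": 30.0},
    {"name": "v_sales_by_region", "monthly_cost": 9.0, "time_saving": 5.0},
]
views, cost, time = select_views(candidates, base_time=100.0)
print([v["name"] for v in views], cost, time)   # only the view whose saving outweighs its cost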

Collaboration


Dive into Laurent d'Orazio's collaborations.

Top Co-Authors

Claudia Roncancio
Centre national de la recherche scientifique

Sandro Bimonte
Centre national de la recherche scientifique

Bruno Bachelet
Blaise Pascal University

Yves Denneulin
Laboratoire d'Informatique Fondamentale de Lille

Romain Perriot
Blaise Pascal University