Jérôme Darmont | Researchain

Publication

Featured research published by Jérôme Darmont.

Intelligent Information Systems | 2009

Data mining-based materialized view and index selection in data warehouses

Kamel Aouiche; Jérôme Darmont

Materialized views and indexes are physical structures for accelerating data access that are commonly used in data warehouses. However, these data structures incur maintenance overhead, and they share the same storage space. Most existing studies on materialized view and index selection consider these structures separately. In this paper, we adopt the opposite stance and couple materialized view and index selection to take view–index interactions into account and achieve efficient storage space sharing. Candidate materialized views and indexes are selected through a data mining process. We also exploit cost models that evaluate the respective benefit of indexing and view materialization, and help select a relevant configuration of indexes and materialized views among the candidates. Experimental results show that our strategy performs better than an independent selection of materialized views and indexes.
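To make the idea of coupled selection under a shared storage budget concrete, here is a minimal sketch, not the authors' algorithm. It assumes a greedy strategy that repeatedly picks the candidate with the best benefit per unit of storage and re-evaluates benefits against the configuration chosen so far. The candidate names, sizes, budget, and the `benefit` function are all hypothetical stand-ins for the cost models mentioned in the abstract.

```python
def greedy_select(candidates, size, benefit, budget):
    """Greedily pick views and indexes that share one storage budget."""
    selected, used = [], 0
    remaining = set(candidates)
    while remaining:
        # Benefits depend on what is already selected (view-index interactions),
        # so they are recomputed at every step. sorted() keeps ties deterministic.
        best = max(sorted(remaining), key=lambda c: benefit(c, selected) / size[c])
        if benefit(best, selected) <= 0:
            break  # nothing left improves the configuration
        remaining.discard(best)
        if used + size[best] <= budget:
            selected.append(best)
            used += size[best]
    return selected


# Hypothetical sizes (in storage units).
size = {"view_sales_by_year": 40, "idx_sales_year": 10, "idx_view_city": 10}


def benefit(candidate, selected):
    """Hypothetical benefit model illustrating interactions between structures."""
    if candidate == "view_sales_by_year":
        return 60 if "idx_sales_year" in selected else 100
    if candidate == "idx_sales_year":
        return 5 if "view_sales_by_year" in selected else 50
    if candidate == "idx_view_city":
        # An index on the view is useless unless the view is materialized.
        return 30 if "view_sales_by_year" in selected else 0
    return 0


configuration = greedy_select(list(size), size, benefit, budget=55)
print(configuration)
```

With these made-up numbers, the index on the view only becomes attractive once the view is selected, which is the kind of interaction an independent selection would miss.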

International Journal of Business Intelligence and Data Mining | 2009

Fragmenting very large XML data warehouses via K-means clustering algorithm

Alfredo Cuzzocrea; Jérôme Darmont; Hadj Mahboubi

XML data sources are gaining popularity in Business Intelligence and On-Line Analytical Processing (OLAP) applications because XML is well suited to representing and managing complex, heterogeneous data. However, XML-native database systems currently suffer from limited performance, in terms of both manageable data volume and query response time. Recent research has therefore focused on horizontal fragmentation techniques, which can overcome these limitations. Classical fragmentation algorithms, however, cannot control the number of fragments they produce, which is a critical parameter in data warehouses. In this paper, we propose using the K-means clustering algorithm to fragment very large XML data warehouses effectively and efficiently. We complement our analytical contribution with a comprehensive experimental assessment in which we compare the efficiency of our proposal against existing fragmentation algorithms.
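The appeal of K-means here is that the number of clusters, and hence of fragments, is fixed in advance by k. The following sketch illustrates that property under stated assumptions: it clusters selection predicates described by 0/1 query-usage vectors, using plain Lloyd's iterations with a deterministic initialization. The predicate names and the usage matrix are hypothetical, and the paper's actual encoding may differ.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical usage matrix: one row per selection predicate,
# one column per workload query (1 = the query uses the predicate).
predicates = ["year=2008", "country=FR", "year=2009", "country=IT"]
usage = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

k = 2  # the desired number of fragments is chosen up front
assignment = kmeans(usage, k)

# Group predicates by cluster; each group would define one fragment.
fragments = {c: [p for p, a in zip(predicates, assignment) if a == c] for c in range(k)}
print(fragments)
```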

Data Warehousing and Knowledge Discovery | 2005

Automatic selection of bitmap join indexes in data warehouses

Kamel Aouiche; Jérôme Darmont; Omar Boussaid; Fadila Bentayeb

Queries defined on data warehouses are complex and use several join operations that induce an expensive computational cost. This cost becomes even more prohibitive when queries access very large volumes of data. To improve response time, data warehouse administrators generally use indexing techniques such as star join indexes or bitmap join indexes. This task is nevertheless complex and tedious. Our solution lies in the field of data warehouse auto-administration. In this framework, we propose an automatic index selection strategy. We exploit a data mining technique, more precisely frequent itemset mining, to determine a set of candidate indexes from a given workload. Then, we propose several cost models that make it possible to build an index configuration composed of the indexes providing the best profit. These models evaluate the cost of accessing data using bitmap join indexes, and the cost of updating and storing these indexes.
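As a rough illustration of how frequent itemset mining can yield candidate indexes, the sketch below treats each query as the set of attributes it restricts or groups on, and keeps every attribute combination whose support reaches a threshold. It uses brute-force enumeration for brevity rather than a dedicated mining algorithm, and the workload, attribute names, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(workload, min_support):
    """Return {attribute tuple: support} for sets used by enough queries."""
    n = len(workload)
    counts = Counter()
    for attrs in workload:
        items = sorted(set(attrs))
        # Count every non-empty subset of the query's attributes.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Hypothetical workload: each query reduced to the attributes
# appearing in its WHERE and GROUP BY clauses.
workload = [
    {"customer.city", "time.year"},
    {"customer.city", "time.year", "product.category"},
    {"customer.city"},
    {"product.category", "time.year"},
]

candidates = frequent_itemsets(workload, min_support=0.5)
for attrs, support in sorted(candidates.items()):
    print(attrs, support)
```

Each surviving attribute set is a candidate index; a cost model would then decide which candidates are worth building.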

Archive | 2006

Processing And Managing Complex Data for Decision Support

Jérôme Darmont; Omar Boussaid

A Sample of Contents: Goal-Oriented Requirement Engineering for XML Document Warehouses; Building an Active Content Warehouse; Text Warehousing: Present and Future; On the Usage of Structural Distance Metrics for Mining Hierarchical Structures; Evaluation and Applications of Structural Similarity Measures in Sources of XML Documents; Pattern Management: Practice and Challenges; Data Mining in Gene Expression Data Analysis: A Survey.

IEEE Transactions on Knowledge and Data Engineering | 2013

A Survey of XML Tree Patterns

Marouane Hachicha; Jérôme Darmont

With XML becoming a ubiquitous language for data interoperability in various domains, efficiently querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over data trees. They are matched against an input data tree to answer a query. Since the turn of the 21st century, considerable research effort has focused on tree pattern models and on matching optimization, which is a crucial issue. This paper is a comprehensive survey of these topics, in which we outline and compare the various features of tree patterns. We also review and discuss the two main families of approaches for optimizing tree pattern matching, namely pattern tree minimization and holistic matching. We finally present actual tree pattern-based developments, to provide a global overview of this significant research topic.
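To show what matching a tree pattern against a data tree means, here is a small sketch under simple assumptions: pattern edges are either child (`/`) or descendant (`//`) edges, labels match by equality or a `*` wildcard, and matching is done naively by recursion. This is neither pattern minimization nor holistic matching, and the `Node` and `PatternNode` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


@dataclass
class PatternNode:
    label: str
    # Each edge is (axis, sub-pattern); axis is "/" (child) or "//" (descendant).
    edges: list = field(default_factory=list)


def descendants(node):
    """Yield all proper descendants of a data node, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches_at(pattern, node):
    """True if the pattern can be embedded with its root mapped to `node`."""
    if pattern.label != "*" and pattern.label != node.label:
        return False
    for axis, sub in pattern.edges:
        candidates = node.children if axis == "/" else descendants(node)
        if not any(matches_at(sub, c) for c in candidates):
            return False
    return True


def match(pattern, root):
    """Return every data node at which the pattern root matches."""
    return [n for n in [root, *descendants(root)] if matches_at(pattern, n)]


# Data tree: a library with two books, only the first of which has an author.
book1 = Node("book", [Node("title"), Node("authors", [Node("author")])])
book2 = Node("book", [Node("title")])
library = Node("library", [book1, book2])

# Pattern: a book with a title child and an author somewhere below it.
pattern = PatternNode("book", [("/", PatternNode("title")), ("//", PatternNode("author"))])

result = match(pattern, library)
print(len(result))
```

The naive approach re-explores subtrees many times, which is the cost that the optimization families surveyed in the paper aim to reduce.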

International Conference on Management of Data | 2010

Business intelligence for small and middle-sized entreprises

Oksana Grabova; Jérôme Darmont; Jean-Hugues Chauchat; Iryna Zolotaryova

Data warehouses are the core of decision support systems (DSSs), which are now used by all kinds of enterprises around the world. Although many studies have been conducted on the need for DSSs in small businesses, most of them adopt existing solutions and approaches that are appropriate for large-scale enterprises but inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools) that are relevant to decision making in small and middle-sized enterprises.

International Journal of Web Engineering and Technology | 2008

Warehousing complex data from the web

Omar Boussaid; Jérôme Darmont; Fadila Bentayeb; Sabine Loudcher

Data warehousing and Online Analytical Processing (OLAP) technologies are now moving on to handling complex data that mostly originate from the web. However, integrating such data into a decision-support process requires representing them in a form that OLAP and/or data mining techniques can process. We present in this paper a complex data warehousing methodology that exploits eXtensible Markup Language (XML) as a pivot language. Our approach includes the integration of complex data in an operational data store (ODS), in the form of XML documents; their dimensional modelling and storage in an XML data warehouse; and their analysis with combined OLAP and data mining techniques. We also address the crucial issue of performance in XML warehouses.

International Journal of Business Intelligence and Data Mining | 2007

Benchmarking data warehouses

Jérôme Darmont; Fadila Bentayeb; Omar Boussaid

Database benchmarks can either help users compare the performance of different systems, or help engineers test the effect of various design choices. In the field of data warehouses, the Transaction Processing Performance Council's standard benchmarks address the first point, but they are not tunable enough to address the second one. We present in this paper the Data Warehouse Engineering Benchmark (DWEB), which allows generating various ad hoc synthetic data warehouses and workloads. We detail DWEB's full specifications, as well as the experiments we performed to illustrate how it may be used. DWEB is free software written in Java.

arXiv: Databases | 2000

Dynamic Clustering in Object-Oriented Databases: An Advocacy for Simplicity

Jérôme Darmont; Christophe Fromantin; Stephane Régnier; Le Gruenwald; Michel Schneider

We present in this paper three dynamic clustering techniques for Object-Oriented Databases (OODBs). The first two, Dynamic, Statistical & Tunable Clustering (DSTC) and StatClust, exploit both comprehensive usage statistics and the inter-object reference graph. They are quite elaborate, but they are also complex to implement and induce a high overhead. The third clustering technique, called Detection & Reclustering of Objects (DRO), is based on the same principles, but is much simpler to implement. These three clustering algorithms have been implemented in the Texas persistent object store and compared in terms of clustering efficiency (i.e., overall performance increase) and overhead using the Object Clustering Benchmark (OCB). The results obtained showed that DRO induced a lighter overhead while still achieving better overall performance.

Journal of Database Management | 2000

Benchmarking OODBS with a generic tool

Jérôme Darmont; Michel Schneider

We present in this paper a generic object-oriented benchmark (OCB: the Object Clustering Benchmark) that has been designed to evaluate the performance of Object-Oriented Databases (OODBs), and more specifically the performance of clustering policies within OODBs. OCB is generic because its sample database may be customized to fit any of the databases introduced by the main existing benchmarks, e.g., OO1 (Object Operation 1) or OO7. The first version of OCB was purposely clustering-oriented due to a clustering-oriented workload, but OCB has been thoroughly extended to be able to suit other purposes. Finally, OCB's code is compact and easily portable. OCB has been validated through two implementations: one within the O2 OODB and another one within the Texas persistent object store. The performance of a specific clustering policy called DSTC (Dynamic, Statistical, Tunable Clustering) has also been evaluated with OCB.
