Zoubida Kedad
Versailles Saint-Quentin-en-Yvelines University
Publication
Featured research published by Zoubida Kedad.
Information and Database Quality | 2002
Mokrane Bouzeghoub; Zoubida Kedad
Data warehousing is a technology which provides a software infrastructure for decision support systems and OLAP applications. Data warehouse systems collect data from heterogeneous and distributed sources, then transform and reconcile this data in order to aggregate it and customize it with respect to the business and organizational criteria required by decision makers. High-level aggregated data is organized by subject and stored as a multidimensional structure in a data mart. Data quality is very important in database applications in general and crucial in data warehousing in particular. Indeed, data warehouse systems provide aggregated data to decision makers whose actions and decisions may be strategic for the enterprise. Providing dirty, imprecise or incoherent data may lead to the rejection of the decision support system or may result in unproductive decisions. This chapter provides a general framework for data warehouse design based on quality.
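As a rough illustration of the aggregation step described above (a minimal sketch, not taken from the chapter; the table and attribute names are invented), the following Python fragment rolls reconciled source rows up into a small multidimensional structure:

```python
from collections import defaultdict

# Hypothetical fact rows collected from heterogeneous sources after
# cleaning and reconciliation (names and values are illustrative).
sales = [
    {"region": "north", "year": 2001, "amount": 120.0},
    {"region": "north", "year": 2001, "amount": 80.0},
    {"region": "south", "year": 2001, "amount": 50.0},
]

def aggregate(rows, dims, measure):
    """Group rows by the given dimension attributes and sum the measure,
    mimicking the roll-up that populates a multidimensional data mart."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        cube[key] += row[measure]
    return dict(cube)

print(aggregate(sales, ("region", "year"), "amount"))
# {('north', 2001): 200.0, ('south', 2001): 50.0}
```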
Cooperative Information Systems | 1999
Zoubida Kedad; Mokrane Bouzeghoub
A multi-source information system is composed of a set of independent data sources and a set of views defined as queries over these data sources. Defining such views is a difficult problem, and it is dramatically amplified in evolving information systems, such as data warehouses and Web systems, where several views are defined or modified daily by users who are not aware of the detailed metadata describing the data sources and their interrelationships. The purpose of this paper is to propose a design aid which helps in the definition of a user view (or query) from its schema and some integrity constraints. Our approach defines a solution space which provides the set of potential queries corresponding to the user view. The approach is based on the existence of metadata describing the individual sources, on semantic assertions describing inter-source similarities between concepts, and on heuristics which reduce the size of the solution space.
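A minimal sketch of the solution-space idea, under the simplifying assumption that the source metadata is just a set of attributes per relation (the paper's metadata and semantic assertions are richer; all names here are hypothetical):

```python
from itertools import combinations

# Illustrative source metadata: relation name -> attributes it provides.
sources = {
    "S1": {"cust_id", "name"},
    "S2": {"cust_id", "city"},
    "S3": {"name", "city"},
}

def candidate_queries(view_attrs, sources):
    """Enumerate sets of source relations whose combined attributes cover
    the view's attributes; each set stands for one potential query in the
    solution space. A simple heuristic keeps only minimal covers."""
    names = list(sources)
    solutions = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(sources[r] for r in combo))
            if view_attrs <= covered:
                # prune: skip combos that strictly contain a known solution
                if not any(set(s) < set(combo) for s in solutions):
                    solutions.append(combo)
    return solutions

print(candidate_queries({"name", "city"}, sources))
# [('S3',), ('S1', 'S2')]
```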
International Conference on Data Engineering | 2010
Carlos Eduardo S. Pires; Paulo Sousa; Zoubida Kedad; Ana Carolina Salgado
Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.
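The following sketch illustrates the centrality-and-frequency idea in a much simplified form (degree centrality over a toy concept graph and invented frequencies; the paper's relevance measure and its candidate-summary selection are more elaborate):

```python
# Hypothetical ontology, reduced to an undirected concept graph, plus
# invented usage frequencies for each concept.
edges = [("Person", "Student"), ("Person", "Professor"),
         ("Student", "Course"), ("Professor", "Course"),
         ("Course", "Room")]
frequency = {"Person": 9, "Student": 7, "Professor": 3,
             "Course": 8, "Room": 1}

def summarize(edges, frequency, k, alpha=0.5):
    """Rank concepts by a weighted mix of normalized degree centrality
    and normalized frequency, then keep the k most relevant concepts."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    max_deg = max(degree.values())
    max_freq = max(frequency.values())
    def relevance(c):
        return (alpha * degree.get(c, 0) / max_deg
                + (1 - alpha) * frequency.get(c, 0) / max_freq)
    return sorted(frequency, key=relevance, reverse=True)[:k]

print(summarize(edges, frequency, k=3))
# ['Course', 'Person', 'Student']
```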
Data Warehousing and Knowledge Discovery | 2000
Mokrane Bouzeghoub; Zoubida Kedad
A data warehouse is a software infrastructure which supports OLAP applications by providing a collection of tools for data extraction and cleaning, data integration and aggregation, and data organization into multidimensional structures. At the design level, a data warehouse is defined as a hierarchy of view expressions whose ultimate nodes are queries on the data sources. In this paper, we propose a logical model for data warehouse representation which consists of a hierarchy of views, namely the base views, the intermediate views and the user views. This schema can be used for different design purposes, such as the evolution of a data warehouse, which is also the focus of this paper.
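A minimal sketch of the hierarchy-of-views model, with invented view names; base views stand for queries on the sources, and the helper traces a user view back to its base views, the kind of dependency information that evolution analysis needs:

```python
# Each view records the views it is defined over; base views have no
# dependencies and wrap queries on the data sources. Names are invented.
hierarchy = {
    "user_sales_report": ["int_sales_by_region"],           # user view
    "int_sales_by_region": ["base_orders", "base_stores"],  # intermediate
    "base_orders": [],
    "base_stores": [],
}

def base_views(view, hierarchy):
    """Return the base views a given view transitively depends on; a
    change in a source can only impact views reachable from its base view."""
    deps = hierarchy[view]
    if not deps:
        return {view}
    return set().union(*(base_views(d, hierarchy) for d in deps))

print(base_views("user_sales_report", hierarchy))
# the set {'base_orders', 'base_stores'}
```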
Data and Knowledge Engineering | 1997
Elisabeth Métais; Zoubida Kedad; Isabelle Comyn-Wattiau; Mokrane Bouzeghoub
This paper addresses the problem of view integration in a CASE tool environment aimed at elaborating the conceptual schema of an application. Previous integration tools were mainly based on syntax and structure comparisons. A new generation of intelligent tools is now emerging, assuming that view integration algorithms must also capture the deep semantics of the objects represented in the views. Dealing with the semantics of the objects is now a realistic objective, thanks to the research results obtained in the natural language area. This paper presents the definition of a view integration algorithm enhanced by the use of linguistic knowledge. This algorithm mainly consists of a semantic unification of views described using an extended entity-relationship model. It is combined with natural language techniques such as Fillmore's semantic cases and Sowa's conceptual graphs, supported by semantic dictionaries.
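As a highly simplified illustration of semantic unification, a toy synonym dictionary stands in here for the paper's linguistic resources; all names are invented:

```python
# Toy semantic dictionary: each set groups entity names that denote
# the same concept across views.
synonyms = [{"client", "customer", "buyer"}, {"employee", "worker"}]

def unify(name_a, name_b):
    """Decide whether two entity names from different views denote the
    same concept: exact match, or co-membership in one synonym set."""
    if name_a == name_b:
        return True
    return any({name_a, name_b} <= s for s in synonyms)

# During integration, entities that unify would be merged in the
# resulting conceptual schema.
print(unify("client", "customer"))  # True
print(unify("client", "employee"))  # False
```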
International Journal of Distributed Systems and Technologies | 2012
Carlos Eduardo S. Pires; Rocir Marcos Leite Santiago; Ana Carolina Salgado; Zoubida Kedad; Mokrane Bouzeghoub
Peer Data Management Systems (PDMSs) are advanced P2P applications in which each peer represents an autonomous data source making an exported schema available to be shared with other peers. Query answering in PDMSs can be improved if peers are efficiently arranged in the overlay network according to the similarity of their content. The set of peers can be partitioned into clusters so that the semantic similarity among the peers within the same cluster is maximal. The creation and maintenance of clusters is a challenging problem at the current stage of development of PDMSs. This work proposes an incremental peer clustering process. The authors present a PDMS architecture designed to facilitate the connection of new peers according to their exported schemas, each described by an ontology, propose a clustering process and its underlying algorithm, and present and discuss experimental results on peer clustering using this approach.
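A minimal sketch of incremental peer clustering, assuming exported schemas are reduced to sets of concept labels and Jaccard overlap stands in for the ontology-based semantic similarity used in the paper (the threshold and data are invented):

```python
def jaccard(a, b):
    """Set overlap, used here as a stand-in for semantic similarity."""
    return len(a & b) / len(a | b)

def place_peer(peer_schema, clusters, threshold=0.4):
    """Incremental clustering: join the most similar existing cluster if
    the similarity reaches the threshold, otherwise start a new cluster."""
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = jaccard(peer_schema, cluster["schema"])
        if sim > best_sim:
            best, best_sim = cluster, sim
    if best is not None and best_sim >= threshold:
        best["peers"].append(peer_schema)
        best["schema"] |= peer_schema          # let the cluster schema grow
    else:
        clusters.append({"schema": set(peer_schema), "peers": [peer_schema]})

clusters = []
for schema in [{"gene", "protein"},
               {"gene", "protein", "enzyme"},
               {"movie", "actor"}]:
    place_peer(schema, clusters)
print(len(clusters))  # 2: a biology cluster and a cinema cluster
```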
Applications of Natural Language to Data Bases | 2002
Zoubida Kedad; Elisabeth Métais
Multi-source information systems, such as data warehouses, are composed of a set of heterogeneous and distributed data sources. The relevant information is extracted from these sources, cleaned, transformed and then integrated. Comparing two different data sources may reveal several kinds of heterogeneity: at the intensional level, conflicts are related to the structure of the data; at the extensional level, conflicts are related to its instances. The process of detecting and solving conflicts at the extensional level is known as data cleaning. In this paper, we focus on the problem of differences in terminology and propose a solution based on linguistic knowledge provided by a domain ontology. This approach is well suited to application domains with intensive classification of data, such as medicine or pharmacology. The main idea is to automatically generate correspondence assertions between instances of objects. The user can parameterize this generation process by defining a level of accuracy expressed using the domain ontology.
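A minimal sketch of accuracy-controlled correspondence generation, assuming the domain ontology is a simple parent hierarchy and the accuracy level is a hop count to a shared ancestor (the terms and the exact accuracy semantics are illustrative, not the paper's):

```python
# Toy domain ontology: term -> parent term (fragment of an invented
# pharmacological hierarchy).
parent = {"aspirin": "analgesic", "paracetamol": "analgesic",
          "analgesic": "drug", "antibiotic": "drug"}

def ancestors(term):
    """Return the term and all its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def correspond(a, b, accuracy):
    """Assert a correspondence between two instance values when they share
    an ancestor within `accuracy` hops; accuracy=0 demands equal terms."""
    ca, cb = ancestors(a), ancestors(b)
    for i, t in enumerate(ca):
        if t in cb and max(i, cb.index(t)) <= accuracy:
            return True
    return False

print(correspond("aspirin", "paracetamol", accuracy=1))  # True: siblings
print(correspond("aspirin", "antibiotic", accuracy=1))   # False: too far
```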
International Conference on Move to Meaningful Internet Systems | 2005
Zoubida Kedad; Xiaohui Xue
The interoperability of heterogeneous data sources is an important issue in many applications such as mediation systems or web-based systems. In these systems, each data source exports a schema and each application defines a target schema representing its needs. The way instances of the target schema are derived from the sources is described through mappings. Generating such mappings is a difficult task, especially when the schemas are semi-structured. In this paper, we propose an approach for mapping generation in an XML context. The basic idea is to decompose the target schema into subtrees and to find mappings, called partial mappings, for each of them; the mappings for the whole target schema are then produced by combining the partial mappings and checking that the structure of the target schema is preserved. We also present a tool supporting our approach and some experimental results.
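A minimal sketch of the decompose-and-combine idea, assuming the target schema is a nested dictionary and the partial mappings per leaf are already known (in the paper they are discovered by matching against the sources; all paths are invented):

```python
# Target schema as a nested dict; leaves (None) must be mapped to sources.
target = {"book": {"title": None, "author": {"name": None, "affil": None}}}

# Hypothetical partial mappings already found for each target leaf.
partial = {"title": "src1/book/t", "name": "src2/person/n",
           "affil": "src2/person/org"}

def generate_mapping(node, path=""):
    """Walk the target subtrees; combine the partial mappings of the
    children into a mapping for the whole tree, failing when a leaf of
    the target structure cannot be covered."""
    mapping = {}
    for label, child in node.items():
        here = f"{path}/{label}"
        if child is None:                      # leaf: needs a partial mapping
            if label not in partial:
                raise ValueError(f"no partial mapping for {here}")
            mapping[here] = partial[label]
        else:                                  # inner node: recurse and merge
            mapping.update(generate_mapping(child, here))
    return mapping

print(generate_mapping(target))
# {'/book/title': 'src1/book/t', '/book/author/name': 'src2/person/n',
#  '/book/author/affil': 'src2/person/org'}
```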
Cooperative Information Systems | 2003
Mokrane Bouzeghoub; Bernadette Farias Lóscio; Zoubida Kedad; Ana Carolina Salgado
Previous works on data integration can be classified according to the approach used to define objects at the mediation level. One of these approaches, called global-as-view (GAV), requires that each object be expressed as a view (a mediation query) on the data sources. One important limitation of this approach is the management of evolutions in the system: each time a change occurs at the source schema level, all the queries defining the mediation objects have to be reconsidered and possibly redefined. In this paper, we propose an approach to cope with the evolution of mediation queries. Our claim is that if the definition of mediation queries in a GAV context follows a well-defined methodology, handling the evolution of the system becomes easier. These evolution problems are considered in the context of a methodology we have previously defined for generating mediation queries. Our solution is based on the concept of relevant relations, over which propagation rules are defined.
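A minimal sketch of change propagation over the relations a mediation query uses, assuming each query simply records its source relations (names are invented; the paper's relevant relations and propagation rules are richer):

```python
# Hypothetical GAV mediation queries and the source relations they use.
mediation = {
    "Customer": {"uses": {"s1.clients", "s2.addresses"}},
    "Order":    {"uses": {"s1.orders"}},
}

def propagate(changed_relation, mediation):
    """A minimal propagation rule: a change to one source relation only
    forces reconsideration of the mediation queries defined over it,
    instead of redefining every query in the GAV mapping."""
    return [q for q, meta in mediation.items()
            if changed_relation in meta["uses"]]

print(propagate("s1.clients", mediation))  # ['Customer']
```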
International Journal of Information Quality | 2011
Laure Berti-Equille; Isabelle Comyn-Wattiau; Mireille Cosquer; Zoubida Kedad; Sylvaine Nugier; Verónika Peralta; Samira Si-Said Cherfi; Virginie Thion-Goasdoué
Information quality is a complex and multidimensional notion. In the context of information system engineering, it is also a transversal notion: to be fully understood, it must be evaluated by jointly considering the quality of the data, the quality of the underlying conceptual data model and the quality of the software system that manages these data. This paper presents a multidimensional model for exploring quality measures, which aids in their navigation, filtering and interpretation, and thus in the identification of the most appropriate actions to improve information quality. Two application scenarios are presented to illustrate and validate the multidimensional approach: the first concerns the quality of customer information at Électricité de France, the French electricity company, and the second concerns the quality of patient records at the Curie Institute, a well-known medical institute in France. The instantiation of our multidimensional model in these contexts provides initial illustrations of its applicability.
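A minimal sketch of navigating quality measures along analysis dimensions, with invented metrics and values (the paper's model has more dimensions and a full multidimensional schema):

```python
# Illustrative quality measurements, each positioned along analysis
# dimensions (metric, measured object, data source); values are invented.
measures = [
    {"metric": "completeness", "object": "patient_record",
     "source": "A", "value": 0.92},
    {"metric": "completeness", "object": "patient_record",
     "source": "B", "value": 0.75},
    {"metric": "accuracy", "object": "patient_record",
     "source": "A", "value": 0.88},
]

def slice_measures(measures, **criteria):
    """Navigate the quality 'cube' by filtering on any dimension, e.g.
    all completeness scores, or every measure taken on source B."""
    return [m for m in measures
            if all(m[k] == v for k, v in criteria.items())]

print(slice_measures(measures, metric="completeness"))
```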