Zoé Lacroix
Arizona State University
Publication
Featured research published by Zoé Lacroix.
conference on information and knowledge management | 2001
Ying Guang Li; Stéphane Bressan; Gillian Dobbie; Zoé Lacroix; Mong Li Lee; Ullas Nambiar; Bimlesh Wadhwa
If XML is to play the critical role of the lingua franca for Internet data interchange that many predict, it is necessary to start designing and adopting benchmarks allowing the comparative performance analysis of the tools being developed and proposed. The effectiveness of existing XML query languages has been studied by many, with a focus on the comparison of linguistic features, implicitly reflecting the fact that most XML tools exist only on paper. In this paper, with a focus on efficiency and concreteness, we propose a pragmatic first step toward the systematic benchmarking of XML query processing platforms, with an initial focus on the data (versus document) point of view. We propose XOO7, an XML version of the OO7 benchmark. We discuss the applicability of XOO7, its strengths, its limitations, and the extensions we are considering. We illustrate its use by presenting and discussing a performance comparison, using XOO7, of three different XML query processing platforms.
international conference of the ieee engineering in medicine and biology society | 2002
Zoé Lacroix
Scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data access, analysis, and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data generated by software. We present an approach to wrapping Web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: it first sends a query to the source to retrieve data and, second, builds the expected output with respect to the virtual structure. Our wrappers are composed of a retrieval component, based on an intermediate object view mechanism called search views that maps the source capabilities to attributes, and an Extensible Markup Language (XML) engine, which perform these two tasks respectively. The originality of the approach consists of: 1) a generic view mechanism to seamlessly access data sources with limited capabilities, and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform Object Protocol Model (OPM) interfaces.
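The two wrapper tasks described in this abstract can be illustrated with a minimal sketch; all class and attribute names here are ours, not taken from the paper, and the "source" is a toy in-memory list standing in for a flat file, database, or Web source.

```python
import xml.etree.ElementTree as ET

class SearchViewWrapper:
    """Sketch of a wrapper with a retrieval step and an XML-building step.
    Task 1 exposes the source's limited query capability as attribute
    selection; task 2 shapes the result into the virtual XML structure."""

    def __init__(self, source):
        # `source` stands in for any wrapped data source; here it is
        # just a list of attribute dictionaries.
        self.source = source

    def retrieve(self, **criteria):
        # Task 1: send a "query" to the source to retrieve matching data.
        return [rec for rec in self.source
                if all(rec.get(k) == v for k, v in criteria.items())]

    def to_xml(self, records, root_tag="results"):
        # Task 2: build the expected output w.r.t. the virtual structure.
        root = ET.Element(root_tag)
        for rec in records:
            item = ET.SubElement(root, "record")
            for key, value in rec.items():
                ET.SubElement(item, key).text = str(value)
        return ET.tostring(root, encoding="unicode")

source = [{"gene": "BRCA1", "organism": "human"},
          {"gene": "TP53", "organism": "human"}]
wrapper = SearchViewWrapper(source)
xml_out = wrapper.to_xml(wrapper.retrieve(gene="BRCA1"))
```

The separation mirrors the paper's design: the retrieval component could be swapped per source while the XML engine stays generic.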
advances in geographic information systems | 2002
Omar Boucelma; Mehdi Essid; Zoé Lacroix
The proliferation of spatial data on the Internet is beginning to allow much wider access to data currently available in various Geographic Information Systems (GIS). In order to move to a real Web-based community where geographical data can be accessed and exchanged, we need to provide flexible and powerful GIS data integration solutions. Indeed, GIS are highly heterogeneous: not only do they differ in their data representations, but they also offer radically different query languages. A GIS mediation approach should provide (1) an integrated view of the data supplied by all sources, and (2) a geographical query language to access and manipulate integrated data. In this paper we propose an approach that not only focuses on data integration, but also addresses the integration of the query capabilities available at the sources. A GIS may provide a query capability nonexistent at another GIS, or two query capabilities may be similar but with slightly different semantics. We introduce the notion of derived wrappers, which capture additional query capabilities either to compensate for capabilities lacking at a source, or to adjust an existing capability so as to make it homogeneous with similar capabilities wrapped at other sources. Finally, we describe an implementation of the presented approach that complies with the OpenGIS WFS recommendation.
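The derived-wrapper idea can be sketched in a few lines: when a source lacks a capability, a derived wrapper synthesizes it from the capabilities the source does export. This is an illustrative toy, not the paper's implementation; all names and the point-feature model are assumptions.

```python
class BasicGISWrapper:
    """Wraps a hypothetical source that only exports bounding-box selection."""

    def __init__(self, features):
        self.features = features  # {name: (x, y)} point features

    def in_bbox(self, xmin, ymin, xmax, ymax):
        # The only capability the underlying source provides.
        return {n: p for n, p in self.features.items()
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax}

class DerivedWrapper(BasicGISWrapper):
    """Adds a within-distance capability missing at the source by
    post-filtering a bounding-box pre-selection."""

    def within_distance(self, x, y, d):
        # Compensate for the missing capability using the existing one:
        # pre-select with a bbox, then filter by exact distance.
        candidates = self.in_bbox(x - d, y - d, x + d, y + d)
        return {n: p for n, p in candidates.items()
                if (p[0] - x) ** 2 + (p[1] - y) ** 2 <= d * d}

gis = DerivedWrapper({"A": (0, 0), "B": (3, 4), "C": (10, 10)})
near = gis.within_distance(0, 0, 5)  # A and B qualify; C is too far
```

A derived wrapper of this shape also lets the mediator present one homogeneous `within_distance` operation across sources whose native semantics differ.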
bioinformatics and bioengineering | 2001
Barbara A. Eckman; Zoé Lacroix; Louiqa Raschid
Today, scientific data is inevitably digitized, stored in a variety of heterogeneous formats, and accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or is generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent metadata about source contents and access costs; providing decision support to select sources and capabilities using cost-based and semantic knowledge; and generating low-cost query evaluation plans.
data integration in the life sciences | 2004
Zoé Lacroix; Louiqa Raschid; Maria-Esther Vidal
Life science data sources represent a complex link-driven federation of publicly available, Web-accessible sources. A fundamental need for scientists today is the ability to completely explore all relationships between scientific classes, e.g., genes and citations, that may be retrieved from various data sources. A challenge to such exploration is that each path between data sources potentially has different domain-specific semantics and yields a different benefit to the scientist. Thus, it is important to efficiently explore paths so as to generate those with the highest benefit. In this paper, we explore the search space of paths that satisfy queries expressed as regular expressions. We propose an algorithm, ESearch, that runs in time polynomial in the size of the graph when the graph is acyclic. We present expressions to determine the benefit of a path based on metadata (statistics). We develop a heuristic search, OnlyBestXX%, and compare it against ESearch.
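The core idea of matching inter-source paths against a regular expression can be sketched as follows. This is not the ESearch algorithm itself (which is polynomial on acyclic graphs; naive enumeration is not), just a minimal illustration of the query model; the graph, labels, and pattern are invented for the example.

```python
import re

def find_paths(graph, start, targets, pattern):
    """Enumerate paths in an acyclic labeled graph and keep those whose
    edge-label sequence matches the regular expression `pattern`.
    graph: {node: [(label, next_node), ...]}"""
    regex = re.compile(pattern)
    results = []

    def dfs(node, labels):
        # Record the path if it reaches a target and its labels match.
        if node in targets and labels and regex.fullmatch(" ".join(labels)):
            results.append(" ".join(labels))
        for label, nxt in graph.get(node, []):
            dfs(nxt, labels + [label])

    dfs(start, [])
    return results

# Hypothetical federation: genes link to citations directly or via sequences.
graph = {
    "gene":     [("annotates", "sequence"), ("cites", "citation")],
    "sequence": [("cites", "citation")],
}
paths = find_paths(graph, "gene", {"citation"}, r"(annotates )?cites")
# Two qualifying paths: "cites" and "annotates cites"
```

Each returned path would then be scored with the benefit expressions the paper derives from source statistics, so that only the highest-benefit paths are evaluated.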
IEEE Internet Computing | 2002
Ullas Nambiar; Zoé Lacroix; Stéphane Bressan; Mong Li Lee; Yingguang Li
The Extensible Markup Language has become the standard for information interchange on the Web. We study the data- and document-centric uses of XML management systems (XMLMS). We want to provide XML data users with a guideline for choosing the data management system that best meets their needs. Because the systems we test are first-generation approaches, we suggest a hypothetical design for a useful XML database that could use all the expressive power of XML and XML query languages.
very large data bases | 2003
Stéphane Bressan; Mong Li Lee; Ying Guang Li; Zoé Lacroix; Ullas Nambiar
As XML becomes the standard for electronic data interchange, benchmarks are needed to provide a comparative performance analysis of XML Management Systems (XMLMS). Typically, a benchmark should adhere to four criteria: relevance, portability, scalability, and simplicity [1]. The data structure of a benchmark for XML must be complex enough to capture the characteristics of XML data representation. Data sets should be available in various sizes. Benchmark queries should be defined using only the primitives of the language.
data integration in the life sciences | 2004
Zoé Lacroix; Hyma Murthy; Felix Naumann; Louiqa Raschid
An abundance of biological data sources contain data on classes of scientific entities, such as genes and sequences. Logical relationships between scientific objects are implemented as URLs and foreign IDs. Query processing typically involves traversing links and paths (concatenation of links) through these sources. We model the data objects in these sources and the links between objects as an object graph. Analogous to database cost models, we use samples and statistics from the object graph to develop a framework to estimate the result size for a query on the object graph.
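The cost-model analogy above can be made concrete with a minimal sketch: given per-link statistics sampled from the object graph (average out-degree per link type), the expected result size of a path query is the start cardinality multiplied by the fan-out of each traversed link. The link names and numbers below are illustrative, not taken from the paper.

```python
def estimate_result_size(start_count, path, avg_out_degree):
    """Estimate the number of objects reached by following `path`
    (a list of link types) from `start_count` starting objects,
    using average out-degrees sampled from the object graph."""
    size = float(start_count)
    for link in path:
        size *= avg_out_degree[link]  # expected fan-out of this link type
    return size

# Hypothetical statistics sampled from an object graph:
stats = {"gene->sequence": 2.5, "sequence->citation": 4.0}
estimate = estimate_result_size(
    100, ["gene->sequence", "sequence->citation"], stats)
# 100 genes * 2.5 sequences/gene * 4.0 citations/sequence = 1000.0
```

Real estimators must also account for overlap (distinct objects reached via multiple links), which is where the sampled statistics beyond simple averages come in.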
data and knowledge engineering | 2005
Stéphane Bressan; Barbara Catania; Zoé Lacroix; Ying Guang Li; Anna Maddalena
Some XML query processors operate on an internal representation of XML documents and can leverage neither the XML storage structure nor the possible access methods dedicated to this storage structure. Such query processors are often used in organizations that usually process transient XML documents received from other organizations. In this paper, we propose a different approach to accelerating query execution on XML source documents in such environments. The approach is based on the notion of query equivalence of XML documents with respect to a query. Under this equivalence, we propose two different document transformation strategies which prune parts of the documents irrelevant to the query, just before executing the query itself. The proposed transformations are implemented and evaluated using a two-level index structure: a structural directory capturing document paths and an inverted index of tag offsets.
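The pruning idea can be illustrated with a toy transformation: before running a query that only touches certain tags, drop every subtree containing none of them, yielding a smaller document that is query-equivalent for that query. The paper's transformations work through a structural directory and a tag-offset index; this sketch simply walks an in-memory tree, and the document and tag set are invented.

```python
import xml.etree.ElementTree as ET

def subtree_has(elem, keep_tags):
    """True if `elem` or any descendant carries a tag from keep_tags."""
    return elem.tag in keep_tags or any(subtree_has(c, keep_tags) for c in elem)

def prune(elem, keep_tags):
    """Remove children whose subtrees are irrelevant to the query tags."""
    for child in list(elem):
        if not subtree_has(child, keep_tags):
            elem.remove(child)  # nothing the query needs lives here
        else:
            prune(child, keep_tags)
    return elem

doc = ET.fromstring(
    "<lib><book><title>XML</title><price>10</price></book>"
    "<cd><artist>X</artist></cd></lib>")
pruned = prune(doc, {"title"})
# Only the <book>/<title> branch survives; <cd> and <price> are removed.
```

A query over `title` elements returns the same answer on `pruned` as on `doc`, which is exactly the query-equivalence property the transformations rely on.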
statistical and scientific database management | 2006
Pierre Tufféry; Zoé Lacroix; Hervé Ménager
We present a semantic map of resources for structural bioinformatics applied to proteins, i.e., various methods to predict and analyze protein structures in silico. Our map depicts resources on two levels: a logical level, which provides a high-level description of the scientific concepts using a domain ontology, and a physical level, which describes the actual resources implementing these connections. Scientists can use our system to express a query that captures their scientific aim, and are guided to identify the resources that best meet their needs. The system is intended to provide scientists with a tool to register and share knowledge about the services available in this field. Our approach addresses the problem of semantic interoperability of scientific resources publicly available on the Web.