Publication


Featured research published by Lucian Popa.


Very Large Data Bases | 2002

Translating web data

Lucian Popa; Yannis Velegrakis; Mauricio A. Hernández; Renée J. Miller; Ronald Fagin

We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, user-specified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of inter-schema correspondences, is converted into a set of mappings that capture the design choices made in the source and target schemas (including their hierarchical organization as well as their nested referential constraints). The second phase translates these mappings into queries over the source schemas that produce data satisfying the constraints and structure of the target schema, and preserving the semantic relationships of the source. Nonnull target values may need to be invented in this process. The mapping algorithm is complete in that it produces all mappings that are consistent with the schema constraints. We have implemented the translation algorithm in Clio, a schema mapping tool, and present our experience using Clio on several real schemas.
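
The two-phase idea can be pictured with a toy sketch. The schemas, field names, and Skolem-style identifier invention below are hypothetical illustrations, not Clio's actual algorithm: a correspondence between source and target fields is combined with the target's nested structure, and the resulting transformation groups flat source rows into nested target records while inventing values for target fields that no source field maps to.

```python
# Minimal sketch with hypothetical schemas (not Clio's implementation):
# compile a field correspondence plus the target's nesting into a
# transformation that also invents values for unmapped target fields.

from collections import defaultdict
from itertools import count

# Source: flat (project, employee) rows.
# Target: organizations with nested staff and an org_id that must be invented.
source_rows = [
    {"project": "Clio", "employee": "Ana"},
    {"project": "Clio", "employee": "Bob"},
    {"project": "ToMAS", "employee": "Cal"},
]

def translate(rows):
    fresh = count(1)          # supplies invented (Skolem-style) identifiers
    invented = {}             # one invented org_id per distinct org_name
    staff = defaultdict(list)
    for row in rows:
        org = row["project"]                                 # project -> org_name
        staff[org].append({"staff_name": row["employee"]})   # employee -> staff_name
        invented.setdefault(org, f"org{next(fresh)}")
    # Assemble nested target records that respect the target's structure.
    return [
        {"org_id": invented[o], "org_name": o, "staff": members}
        for o, members in staff.items()
    ]

for record in translate(source_rows):
    print(record)
```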


Symposium on Principles of Database Systems | 2003

Data exchange: getting to the core

Ronald Fagin; Phokion G. Kolaitis; Lucian Popa

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier article, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a “most general possible” solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a “best” universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, and has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. We investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange, we identify natural and fairly broad conditions under which there are polynomial-time algorithms for computing the core of a universal solution. We also analyze the computational complexity of the following decision problem that underlies the computation of cores: given two graphs G and H, is H the core of G? Earlier results imply that this problem is both NP-hard and coNP-hard. Here, we pinpoint its exact complexity by establishing that it is a DP-complete problem. Finally, we show that the core is the best among all universal solutions for answering existential queries, and we propose an alternative semantics for answering queries in data exchange settings.
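
The definition of the core can be made concrete with a brute-force sketch over a small directed graph. This is for intuition only: the search below is exponential and is not the paper's polynomial-time procedure for computing the core of a universal solution.

```python
# Brute-force illustration of the core of a small directed graph: the
# smallest induced subgraph to which the whole graph maps homomorphically
# (equivalently, onto which it retracts).

from itertools import combinations, product

def is_hom(mapping, edges, sub_edges):
    """Check that every edge is mapped onto an edge of the candidate subgraph."""
    return all((mapping[u], mapping[v]) in sub_edges for (u, v) in edges)

def core(nodes, edges):
    nodes, edges = list(nodes), set(edges)
    for size in range(1, len(nodes) + 1):          # try smallest candidates first
        for sub in combinations(nodes, size):
            keep = set(sub)
            sub_edges = {(u, v) for (u, v) in edges if u in keep and v in keep}
            # Look for a retraction: a homomorphism fixing the candidate's vertices.
            for image in product(sub, repeat=len(nodes)):
                mapping = dict(zip(nodes, image))
                if all(mapping[v] == v for v in sub) and is_hom(mapping, edges, sub_edges):
                    return keep, sub_edges
    return set(nodes), edges

# A directed triangle with a pendant edge retracts onto the triangle,
# so the core is {1, 2, 3} with its three edges.
print(core([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1), (4, 1)]))
```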


International Conference on Management of Data | 2001

The Clio project: managing heterogeneity

Renée J. Miller; Mauricio A. Hernández; Laura M. Haas; Lingling Yan; C. T. Howard Ho; Ronald Fagin; Lucian Popa

Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case study.


International Conference on Management of Data | 2005

Clio grows up: from research prototype to industrial tool

Laura M. Haas; Mauricio A. Hernández; Howard Ho; Lucian Popa; Mary Tork Roth

Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that underlies some of IBM's mapping products. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road to creating an industrial-strength tool.
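
The "compile once, serialize into several query languages" architecture can be sketched roughly as below; the intermediate structure and names are invented for illustration and are not Clio's internal query graph.

```python
# Hypothetical sketch: one intermediate mapping description rendered by
# separate back ends into SQL and XQuery. Structure and names are
# illustrative only, not Clio's query-graph representation.

from dataclasses import dataclass, field

@dataclass
class MappingPlan:
    source: str                                        # source relation / element to scan
    filters: list = field(default_factory=list)        # predicates on the source
    projections: dict = field(default_factory=dict)    # target field -> source field

def to_sql(plan: MappingPlan) -> str:
    cols = ", ".join(f"{s} AS {t}" for t, s in plan.projections.items())
    where = " AND ".join(plan.filters) if plan.filters else "1=1"
    return f"SELECT {cols} FROM {plan.source} WHERE {where}"

def to_xquery(plan: MappingPlan) -> str:
    conds = "".join(f"[{f}]" for f in plan.filters)
    fields = "".join(f"<{t}>{{$x/{s}/text()}}</{t}>"
                     for t, s in plan.projections.items())
    return (f"for $x in doc('source.xml')//{plan.source}{conds} "
            f"return <row>{fields}</row>")

plan = MappingPlan(source="employee",
                   filters=["dept = 'R&D'"],
                   projections={"name": "empname", "org": "dept"})
print(to_sql(plan))
print(to_xquery(plan))
```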


International Conference on Management of Data | 2004

Constraint-based XML query rewriting for data integration

Cong Yu; Lucian Popa

We study the problem of answering queries through a target schema, given a set of mappings between one or more source schemas and this target schema, and given that the data is at the sources. The schemas can be any combination of relational or XML schemas, and can be independently designed. In addition to the source-to-target mappings, we consider as part of the mapping scenario a set of target constraints specifying additional properties on the target schema. This becomes particularly important when integrating data from multiple data sources with overlapping data and when such constraints can express data merging rules at the target. We define the semantics of query answering in such an integration scenario, and design two novel algorithms, basic query rewrite and query resolution, to implement the semantics. The basic query rewrite algorithm reformulates target queries in terms of the source schemas, based on the mappings. The query resolution algorithm generates additional rewritings that merge related information from multiple sources and assemble a coherent view of the data, by incorporating target constraints. The algorithms are implemented and then evaluated using a comprehensive set of experiments based on both synthetic and real-life data integration scenarios.
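
The behavior of the two algorithms can be pictured with a toy sketch under hypothetical mappings (this is not the paper's rewriting machinery): the basic step rewrites a target query as a union over the sources that map into the target, and the resolution step merges records from different sources that agree on a target key, as a simple data-merging rule would require.

```python
# Toy sketch with hypothetical mappings (not the paper's algorithms):
# a target query over person(ssn, name, phone) is rewritten as a union over
# two sources, then "resolved" by merging rows that agree on the target key.

source_a = [{"ssn": "111", "name": "Ana"}]                  # source holding names
source_b = [{"ssn": "111", "phone": "555-0100"},            # source holding phones
            {"ssn": "222", "phone": "555-0101"}]

def basic_rewrite():
    """'All persons' over the target becomes the union of the mapped sources."""
    return [dict(r) for r in source_a] + [dict(r) for r in source_b]

def resolve(rows, key="ssn"):
    """Merge rows sharing the target key, per a target merging constraint."""
    merged = {}
    for row in rows:
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

print(resolve(basic_rewrite()))
# e.g. [{'ssn': '111', 'name': 'Ana', 'phone': '555-0100'},
#       {'ssn': '222', 'phone': '555-0101'}]
```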


Conceptual Modeling: Foundations and Applications | 2009

Clio: Schema Mapping Creation and Data Exchange

Ronald Fagin; Laura M. Haas; Mauricio A. Hernández; Renée J. Miller; Lucian Popa; Yannis Velegrakis

The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms both for schema mapping creation via query discovery and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.


Very Large Data Bases | 2003

Mapping adaptation under evolving schemas

Yannis Velegrakis; Renée J. Miller; Lucian Popa

To achieve interoperability, modern information systems and e-commerce applications use mappings to translate data from one representation to another. In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. Such changes must be reflected in the mappings. Mappings left inconsistent by a schema change have to be detected and updated. As large, complicated schemas become more prevalent, and as data is reused in more applications, manually maintaining mappings (even simple mappings like view definitions) is becoming impractical. We present a novel framework and a tool (ToMAS) for automatically adapting mappings as schemas evolve. Our approach considers not only local changes to a schema, but also changes that may affect and transform many components of a schema. We consider a comprehensive class of mappings for relational and XML schemas with choice types and (nested) constraints. Our algorithm detects mappings affected by a structural or constraint change and generates all the rewritings that are consistent with the semantics of the mapped schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. We describe an implementation of a mapping management and adaptation tool based on these ideas and compare it with a mapping generation tool.
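
A minimal sketch of the detect-and-rewrite step, restricted to invented mappings and a single kind of change (a rename), whereas ToMAS handles a much broader class of structural and constraint changes:

```python
# Minimal sketch (invented mappings; renames only): find the mappings that
# reference a changed schema element and rewrite those references, leaving
# unaffected mappings untouched.

mappings = [
    {"id": "m1", "source_paths": ["emp.name", "emp.dept"], "target_path": "staff.fullname"},
    {"id": "m2", "source_paths": ["proj.title"],           "target_path": "project.name"},
]

def adapt(mappings, change):
    """change: ('rename', old_path, new_path) applied to the source schema."""
    kind, old, new = change
    assert kind == "rename"
    adapted = []
    for m in mappings:
        if old in m["source_paths"]:                            # affected mapping
            paths = [new if p == old else p for p in m["source_paths"]]
            adapted.append({**m, "source_paths": paths})
        else:                                                   # still consistent
            adapted.append(m)
    return adapted

print(adapt(mappings, ("rename", "emp.dept", "emp.department")))
```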


International Conference on Management of Data | 2006

Query reformulation with constraints

Alin Deutsch; Lucian Popa; Val Tannen

Let Σ1, Σ2 be two schemas, which may overlap, C be a set of constraints on the joint schema Σ1 ∪ Σ2, and q1 be a Σ1-query. An (equivalent) reformulation of q1 in the presence of C is a Σ2-query, q2, such that q2 gives the same answers as q1 on any Σ1 ∪ Σ2-database instance that satisfies C. In general, there may exist multiple such reformulations and choosing among them may require, for example, a cost model.
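
As a small made-up illustration (not an example from the paper), suppose the constraints state that a Σ1-relation R and a Σ2-relation S always contain the same pairs; then a query asking for the first components of R can be reformulated over S, and the two return identical answers on every instance satisfying the constraint.

```python
# Made-up illustration: under the constraint "R(x,y) <-> S(x,y)", the
# Sigma1-query q1 and its Sigma2-reformulation q2 give the same answers
# on any instance that satisfies the constraint.

R = {("a", "b"), ("c", "d")}
S = set(R)                        # this instance satisfies the constraint

q1 = {x for (x, y) in R}          # q1(x) = exists y. R(x,y), over Sigma1
q2 = {x for (x, y) in S}          # q2(x) = exists y. S(x,y), over Sigma2

assert q1 == q2                   # identical answers, as required
print(sorted(q1))                 # ['a', 'c']
```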


Very Large Data Bases | 2004

Preserving mapping consistency under schema changes

Yannis Velegrakis; Renée J. Miller; Lucian Popa

In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. When a mapping is left inconsistent by a schema change, it has to be detected and updated. We present a novel framework and a tool (ToMAS) for automatically adapting (rewriting) mappings as schemas evolve. Our approach considers not only local changes to a schema but also changes that may affect and transform many components of a schema. Our algorithm detects mappings affected by structural or constraint changes and generates all the rewritings that are consistent with the semantics of the changed schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. When there is more than one candidate rewriting, the algorithm may rank them based on how close they are to the semantics of the existing mappings.


International Conference on Management of Data | 2008

Interactive generation of integrated schemas

Laura Chiticariu; Phokion G. Kolaitis; Lucian Popa

Schema integration is the problem of creating a unified target schema based on a set of existing source schemas that relate to each other via specified correspondences. The unified schema gives a standard representation of the data, thus offering a way to deal with the heterogeneity in the sources. In this paper, we develop a method and a design tool that provide: 1) adaptive enumeration of multiple interesting integrated schemas, and 2) easy-to-use capabilities for refining the enumerated schemas via user interaction. Our method is a departure from previous approaches to schema integration, which do not offer a systematic exploration of the possible integrated schemas. The method operates at a logical level, where we recast each source schema into a graph of concepts with Has-A relationships. We then identify matching concepts in different graphs by taking into account the correspondences between their attributes. For every pair of matching concepts, we have two choices: merge them into one integrated concept or keep them as separate concepts. We develop an algorithm that can systematically output, without duplication, all possible integrated schemas resulting from the previous choices. For each integrated schema, the algorithm also generates a mapping from the source schemas to the integrated schema that has precise information-preserving properties. Furthermore, we avoid a full enumeration, by allowing users to specify constraints on the merging process, based on the schemas produced so far. These constraints are then incorporated in the enumeration of the subsequent schemas. The result is an adaptive and interactive enumeration method that significantly reduces the space of alternative schemas, and facilitates the selection of the final integrated schema.
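
The enumeration step can be pictured with a small sketch; the concept names and constraint are hypothetical, and the paper's algorithm additionally generates information-preserving mappings and avoids duplicate schemas, which this sketch does not attempt.

```python
# Small sketch with hypothetical concepts (not the paper's full algorithm):
# each pair of matching concepts contributes a merge/keep-separate choice,
# user constraints prune choices, and each surviving assignment yields one
# candidate integrated schema.

from itertools import product

matching_pairs = [("S1.Customer", "S2.Client"), ("S1.Order", "S2.Purchase")]

# A constraint gathered interactively: do not merge the second pair.
constraints = [lambda choice: choice[("S1.Order", "S2.Purchase")] == "separate"]

def enumerate_schemas():
    for decisions in product(["merge", "separate"], repeat=len(matching_pairs)):
        choice = dict(zip(matching_pairs, decisions))
        if not all(ok(choice) for ok in constraints):
            continue                                  # pruned by the constraint
        concepts = []
        for (a, b), d in choice.items():
            if d == "merge":
                concepts.append(f"{a}+{b}")           # one integrated concept
            else:
                concepts.extend([a, b])               # kept as separate concepts
        yield concepts

for schema in enumerate_schemas():
    print(schema)
```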
