Igor Tatarinov | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Igor Tatarinov is active.

Explore More

Publication

Featured researches published by Igor Tatarinov.

international conference on data engineering | 2003

Schema mediation in peer data management systems

Alon Y. Halevy; Zachary G. Ives; Dan Suciu; Igor Tatarinov

Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backwards compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers individual schemas. We consider the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas, which extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Finally, we describe several methods for optimizing the reformulation algorithm, and an initial set of experiments studying its performance.

international world wide web conferences | 2003

Piazza: data management infrastructure for semantic web applications

Alon Y. Halevy; Zachary G. Ives; Peter Mork; Igor Tatarinov

The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the worlds data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies.This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.

international conference on management of data | 2001

Updating XML

Igor Tatarinov; Zachary G. Ives; Alon Y. Halevy; Daniel S. Weld

As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now also the de facto format for interchanging data between heterogeneous systems. Data sources expert XML “views” over their data, and other system can directly import or query these views. As a result, there has been great interest in languages and systems for expressing queries over XML data, whether the XML is stored in a repository or generated as a view over some other data storage format. Clearly, in order to fully evolve XML into a universal data representation and sharing format, we must allow users to specify updates to XML documents and must develop techniques to process them efficiently. Update capabilities are important not only for modifying XML documents, but also for propagating changes through XML view and for expressing and transmitting changes to documents. This paper begins by proposing a set of basic update operations for both ordered and unordered XML data. We next describe extensions to the proposed standard XML query language, XQuery, to incorporate the update operations. We then consider alternative methods for implementing update operations when the XML data is mapped into a relational database. Finally, we describe an experimental evaluation of the alternative techniques for implementing our extensions.

international conference on management of data | 2004

Efficient query reformulation in peer data management systems

Igor Tatarinov; Alon Y. Halevy

Peer data management systems (PDMS) offer a flexible architecture for decentralized data sharing. In a PDMS, every peer is associated with a schema that represents the peers domain of interest, and semantic relationships between peers are provided locally between pairs (or small sets) of peers. By traversing semantic paths of mappings, a query over one peer can obtain relevant data from any reachable peer in the network. Semantic paths are traversed by reformulating queries at a peer into queries on its neighbors.Naively following semantic paths is highly inefficient in practice. We describe several techniques for optimizing the reformulation process in a PDMS and validate their effectiveness using real-life data sets. In particular, we develop techniques for pruning paths in the reformulation process and for minimizing the reformulated queries as they are created. In addition, we consider the effect of the strategy we use to search through the space of reformulations. Finally, we show that pre-computing semantic paths in a PDMS can greatly improve the efficiency of the reformulation process. Together, all of these techniques form a basis for scalable query reformulation in PDMS.To enable our optimizations, we developed practical algorithms, of independent interest, for checking containment and minimization of XML queries, and for composing XML mappings.

international conference on management of data | 2003

The Piazza peer data management project

Igor Tatarinov; Zachary G. Ives; Jayant Madhavan; Alon Y. Halevy; Dan Suciu; Nilesh N. Dalvi; Xin Dong; Yana Kadiyska; Gerome Miklau; Peter Mork

A major problem in todays information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing heterogeneous data in a distributed and scalable way. Piazza assumes the participants to be interested in sharing data, and willing to define pairwise mappings between their schemas. Then, users formulate queries over their preferred schema, and a query answering system expands recursively any mappings relevant to the query, retrieving data from other peers. In this paper, we provide a brief overview of the Piazza project including our work on developing mapping languages and query reformulation algorithms, assisting the users in defining mappings, indexing, and enforcing access control over shared data.

international conference on management of data | 2001

A general technique for querying XML documents using a relational database system

Jayavel Shanmugasundaram; Eugene J. Shekita; Jerry Kiernan; Rajasekar Krishnamurthy; Efstratios Viglas; Jeffrey F. Naughton; Igor Tatarinov

There has been recent interest in using relational database systems to store and query XML documents. Each of the techniques proposed in this context works by (a) creating tables for the purpose of storing XML documents (also called relational schema generation), (b) storing XML documents by shredding them into rows in the created tables, and (c) converting queries over XML documents into SQL queries over the created tables. Since relational schema generation is a physical database design issue -- dependent on factors such as the nature of the data, the query workload and availability of schemas -- there have been many techniques proposed for this purpose. Currently, each relational schema generation technique requires its own query processor to efficiently convert queries over XML documents into SQL queries over the created tables. In this paper, we present an efficient technique whereby the same query-processor can be used for all such relational schema generation techniques. This greatly simplifies the task of relational schema generation by eliminating the need to write a special-purpose query processor for each new solution to the problem. In addition, our proposed technique enables users to query seamlessly across relational data and XML documents. This provides users with unified access to both relational and XML data without them having to deal with separate databases.

very large data bases | 2005

Schema mediation for large-scale semantic data sharing

Y. Halevy; G. Ives; Dan Suciu; Igor Tatarinov

Abstract.Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers’ schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers’ individual schemas.This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.

Journal of Web Semantics | 2004

Piazza: Mediation and Integration Infrastructure for Semantic Web Data

Zachary G. Ives; Alon Y. Halevy; Peter Mork; Igor Tatarinov

The SemanticWeb envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to provide interoperability both between sites with different terminologies and with existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the worlds data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the system, which addresses these challenges. Piazza offers a language for mediating between data sources on the SemanticWeb, and it maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.

very large data bases | 2004

Containment of nested XML queries

Xin Dong; Alon Y. Halevy; Igor Tatarinov

Query containment is the most fundamental relationship between a pair of database queries: a query Q is said to be contained in a query Q′ if the answer for Q is always a subset of the answer for Q′, independent of the current state of the database. Query containment is an important problem in a wide variety of data management applications, including verification of integrity constraints, reasoning about contents of data sources in data integration, semantic caching, verification of knowledge bases, determining queries independent of updates, and most recently, in query reformulation for peer data management systems. Query containment has been studied extensively in the relational context and for XPath queries, but not for XML queries with nesting. We consider the theoretical aspects of the problem of query containment for XML queries with nesting. We begin by considering conjunctive XML queries (c-XQueries), and show that containment is in polynomial time if we restrict the fanout (number of sibling sub-blocks) to be 1. We prove that for arbitrary fanout, containment is coNP-hard already for queries with nesting depth 2, even if the query does not include variables in the return clauses. We then show that for queries with fixed nesting depth, containment is coNP-complete. Next, we establish the computational complexity of query containment for several practical extensions of c-XQueries, including queries with union and arithmetic comparisons, and queries where the XPath expressions may include descendant edges and negation. Finally, we describe a few heuristics for speeding up query containment checking in practice by exploiting properties of the queries and the underlying schema.

international conference on management of data | 2000

Self-organizing data sharing communities with SAGRES

Zachary G. Ives; Alon Y. Levy; Jayant Madhavan; Rachel Pottinger; Stefan Saroiu; Igor Tatarinov; Shiori Betzler; Qiong Chen; Ewa Jaslikowska; Jing Su; Wai Tak Theodora Yeung

Permission to make digital or hard copies of part or all of this work or personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

Explore More