Zijing Tan
Fudan University
Publication
Featured research published by Zijing Tan.
Web-Age Information Management | 2007
Zijing Tan; Zijun Zhang; Wei Wang; Baile Shi
An XML document is inconsistent if it violates predefined integrity constraints. In this paper, we consider how to compute repairs for an inconsistent XML document. Here a repair is defined as data that is consistent with the integrity constraints and differs minimally from the original document. Building on a repair framework that introduces a chase method, we discuss the repair computation problem and implement a prototype. First, we discuss some key points about mend generation and repair chasing. Next, we give a cost model for this repair framework, which can be used to evaluate the cost of each repair. Finally, we implement prototypes of our method and evaluate our framework and algorithms experimentally.
Computer and Information Technology | 2005
Zijing Tan; Jianjun Xu; Wei Wang; Baile Shi
This paper studies the storage of XML in relations. Unlike traditional techniques, it considers the semantics expressed by functional dependencies. We propose an algorithm for mapping a DTD to a relational schema, which preserves not only the content and structure but also the semantics of the original XML documents. To tackle the problem of expressing constraints, we introduce a way to define functional dependencies and normalization for DTDs. In a normalized DTD, every constraint expressed by functional dependencies can be reduced to keys. We therefore use the key definitions for XML as the foundation for relation generation, and maintain the keys in relations. After investigating the relationship between functional dependencies in XML documents and the corresponding ones in relations, we further prove that, if the original DTD is normalized, the generated relations will be in BCNF. Our method thus keeps the good properties of a normalized DTD and can fully leverage relational technology.
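The BCNF property mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's DTD-to-relation mapping algorithm; it only shows the standard relational test that every non-trivial functional dependency has a superkey on its left-hand side. The function names and the book/publisher schema are hypothetical, chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(schema, fds):
    """True if every non-trivial FD in `fds` has a superkey of `schema` as its LHS."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not set(schema) <= closure(lhs, fds):
            return False  # LHS does not determine the whole schema
    return True


# Hypothetical example: publisher -> city makes the flat relation violate BCNF.
flat = ("isbn", "title", "publisher", "city")
flat_fds = [(("isbn",), ("title", "publisher")), (("publisher",), ("city",))]

# After splitting off (publisher, city), each relation has only key-based FDs.
book = ("isbn", "title", "publisher")
book_fds = [(("isbn",), ("title", "publisher"))]
```

Here `is_bcnf(flat, flat_fds)` is false because `publisher` is not a superkey, while `is_bcnf(book, book_fds)` is true.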
Database Systems for Advanced Applications | 2011
Zijing Tan; Liyong Zhang
We study the problem of repairing XML functional dependency violations by making the smallest value modifications in terms of repair cost. Our cost model assigns a weight to each leaf node in the XML document, and the cost of a repair is measured by the total weight of the modified nodes. We show that it is beyond reach in practice to find optimum repairs: this problem is already NP-complete for a setting with a fixed DTD, a fixed set of functional dependencies, and equal weights for all the nodes in the XML document. To this end we provide an efficient two-step heuristic method to repair XML functional dependency violations. First, the initial violations are captured and fixed by leveraging the conflict hypergraph. Second, the remaining conflicts are resolved by modifying the violating nodes and their related nodes called determinants, in a way that guarantees no new violations. The experimental results demonstrate that our algorithm scales well and is effective in improving data quality.
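The cost model described in this abstract (a weight per leaf node, with repair cost equal to the total weight of the modified nodes) can be sketched as follows. This is an illustrative sketch only: representing a document as a mapping from leaf paths to values, the function name, the sample paths, and the default weight of 1.0 are all assumptions, not details taken from the paper.

```python
def repair_cost(original, repaired, weights, default_weight=1.0):
    """Total weight of the leaf nodes whose value differs between two documents.

    `original` and `repaired` map a leaf identifier (here a path string) to its
    value. Value-modification repairs never add or remove leaves, so both
    documents must contain the same leaves.
    """
    if original.keys() != repaired.keys():
        raise ValueError("value-modification repairs must keep the same leaves")
    return sum(
        weights.get(leaf, default_weight)
        for leaf in original
        if original[leaf] != repaired[leaf]
    )


# Hypothetical document: two book entries that disagree on the publisher city.
original = {
    "/db/book[1]/publisher": "ACM",
    "/db/book[1]/city": "New York",
    "/db/book[2]/publisher": "ACM",
    "/db/book[2]/city": "Boston",
}
repaired = dict(original, **{"/db/book[2]/city": "New York"})
weights = {"/db/book[1]/city": 0.9, "/db/book[2]/city": 0.4}
```

With these inputs, only `/db/book[2]/city` is modified, so the cost is its weight, 0.4. Changing the more trusted first entry instead would cost 0.9, which is why a cost-based repair prefers the former.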
International Multi-Symposiums on Computer and Computational Sciences | 2006
Zijing Tan; Wei Wang; Baile Shi
Data may contain inconsistencies that violate integrity constraints; the consistent query answering problem attempts to find the answers common to every possible repair. In this paper, we study how to handle an inconsistent XML document, which conforms to its DTD but violates constraints. We consider three types of constraints: functional dependencies, keys, and foreign keys. We provide a repair framework for inconsistent XML documents with three basic update operations: node insertion, node deletion, and value modification. Following this approach, we introduce the concept of a labelled XML document, which extends the traditional tree model of XML with a function that indicates the unreliable nodes. We then define the minimal labelled XML document, which represents all possible repairs and can be used for consistent query answering. We provide a method for building an extended bottom-up tree automaton. In a single pass, the automaton can not only check the validity of an XML document w.r.t. the DTD and integrity constraints, but also generate the corresponding minimal labelled XML document.
Journal of Systems and Software | 2010
Zijing Tan; Chengfei Liu; Wei Wang; Baile Shi
When data sources are virtually integrated, there is no common and centralized method to maintain global consistency, so inconsistencies with regard to global integrity constraints are very likely to occur. In this paper, we consider the problem of defining and computing consistent query answers when queries are posed to virtual XML data integration systems, which are specified following the local-as-view approach. We propose a powerful XML constraint model to define global constraints, which can express keys and functional dependencies, and which also extends the recently introduced conditional functional dependencies to XML. We provide an approach to defining XML views, which supports not only edge-path mappings but also data-value bindings to express the join operator. We give formal definitions of repairs and consistent query answers in the XML data integration setting. Given a query on the global system, we present a two-step method to compute consistent query answers. First, the given query is transformed using the global constraints, such that running the transformed query on the original global system generates exactly the consistent query answers. Second, because the global instance is not materialized, the query on the global instance is rewritten as queries on the underlying data sources by reversing the rules in the view definitions. We illustrate that the XPath query transformations can be implemented in XQuery. Finally, we implement prototypes of our method and evaluate our algorithms experimentally.
Information Processing and Management | 2013
Zijing Tan; Liyong Zhang; Wei Wang; Baile Shi
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema, by following a mapping between the two schemas. There is a rich literature on problems related to data exchange, e.g., the design of a schema mapping language, the consistency of schema mappings, operations on mappings, and query answering over mappings. Data exchange has been extensively studied for the relational model, and has recently also been discussed for XML data. This article investigates the construction of target instances for XML data exchange, which has received far less attention. We first present a rich language for the definition of schema mappings, which allows one to use various forms of document navigation and specify conditions on data values. Given a schema mapping, we then provide an algorithm to construct a canonical target instance. The schema mapping alone is not adequate for expressing target semantics, and hence the canonical instance is in general not optimal. We recognize that target constraints play a crucial role in the generation of good solutions. In light of this, we employ a general XML constraint model to define target constraints. Structural constraints and keys are used to identify a certain entity, as rules for data merging. Moreover, we develop techniques to enforce non-key constraints on the canonical target instance, by providing a chase method to reason about data. Experimental results show that our algorithms scale well and are effective in producing target instances of good quality.
Advanced Information Networking and Applications | 2008
Zijing Tan; Wei Wang; Baile Shi
In this paper, we study the problem of using target constraints to integrate XML data from different sources under a target schema. We recognize that target constraints are necessary in data integration, as constraints are an essential part of data semantics and should be satisfied by the integrated data. When integrating data from multiple data sources with overlapping data, constraints can also express data merging rules at the target. We give a general constraint model for XML to express target constraints, which extends the relational equality-generating and tuple-generating dependencies. We provide a chase method to reason about data in the integrated XML document based on target constraints, by inferring data values not given explicitly and inserting new subtrees as necessary. Singleton and key constraints are used to uniquely specify a certain entity, as a rule for data merging in the integration.
Database Systems for Advanced Applications | 2014
Chu He; Zijing Tan; Qing Chen; Chaofeng Sha; Zhihui Wang; Wei Wang
In practice, data are often found to violate functional dependencies, and are hence inconsistent. To resolve such violations, data are to be restored to a consistent state, known as “repair”, while the number of possible repairs may be exponential. Previous works either consider optimal repair computation, to find one single repair that is (nearly) optimal w.r.t. some cost model, or discuss repair sampling, to randomly generate a repair from the space of all possible repairs.
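What it means for data to violate a functional dependency can be shown with a small sketch. It only detects violations of a single dependency and does not compute or sample repairs as discussed in the abstract. The function name, the row representation as dictionaries, and the zip/city sample data are assumptions made for illustration.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Group rows by their `lhs` values and report groups with conflicting `rhs` values.

    Returns a dict mapping each offending lhs value tuple to the set of
    distinct rhs value tuples found for it. An empty dict means the
    dependency lhs -> rhs holds on `rows`.
    """
    groups = defaultdict(set)
    for row in rows:
        lhs_value = tuple(row[a] for a in lhs)
        rhs_value = tuple(row[a] for a in rhs)
        groups[lhs_value].add(rhs_value)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical data for the dependency zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "02101", "city": "Boston"},
]
conflicts = fd_violations(rows, lhs=("zip",), rhs=("city",))
```

Each reported group admits several repairs (for example, setting both cities to either value), which is one way to see why the number of possible repairs can grow exponentially with the number of conflicting groups.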
Database Systems for Advanced Applications | 2009
Zijing Tan; Chengfei Liu; Wei Wang; Baile Shi
We consider the problem of using query transformation to compute consistent answers when queries are posed to virtual XML data integration systems, which are specified following the local-as-view approach. This is achieved in two steps. First, the given query is transformed into a new query that takes the global constraints into account. Then, the new query is rewritten as queries on the underlying data sources by reversing the rules in the view definitions. The transformation of XPath queries on the global system can be implemented in XQuery. We implement prototypes of our method and evaluate our framework and algorithms experimentally.
Database Systems for Advanced Applications | 2018
Yu Qiu; Zijing Tan; Kejia Yang; Weidong Yang; Xiangdong Zhou; Naiwang Guo
Lexicographical order dependencies (ODs) have been proposed to describe the relationship between two lexicographical ordering specifications with respect to lists of attributes, and have proven useful in query optimizations concerning ordered attributes. To take full advantage of ODs, the data instance is supposed to satisfy the OD specifications. In practice, data are often found to violate given ODs, as demonstrated in recent studies on the discovery of ODs. This highlights the need for data repairing techniques for ODs, to restore consistency of the data with respect to ODs. New challenges arise since ODs convey order semantics beyond functional dependencies, and are specified on lists of attributes. In this paper, we make a first effort to develop techniques for repairing data that violate ODs. (1) We formalize the data repairing problem for ODs, and prove that it is NP-hard in the size of the data. (2) Despite the intractability, we develop effective heuristic algorithms to address the problem. (3) We experimentally evaluate the effectiveness and efficiency of our algorithms, using both real-life and synthetic data.
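The notion of an OD violation can be made concrete with a short sketch. It assumes the usual reading of a lexicographical OD from list X to list Y: whenever tuple s precedes or equals tuple t on X, s must also precede or equal t on Y. The code is a naive quadratic checker, not one of the repair algorithms from the paper; the function name and the salary/tax sample data are hypothetical.

```python
from itertools import permutations


def od_violations(rows, lhs, rhs):
    """Return index pairs (i, j) where rows[i] <= rows[j] on `lhs` but not on `rhs`.

    `lhs` and `rhs` are attribute lists; tuples of attribute values are
    compared lexicographically, which is how Python compares tuples.
    """
    def project(row, attrs):
        return tuple(row[a] for a in attrs)

    return [
        (i, j)
        for (i, s), (j, t) in permutations(enumerate(rows), 2)
        if project(s, lhs) <= project(t, lhs) and project(s, rhs) > project(t, rhs)
    ]


# Hypothetical data for the OD [salary] -> [tax]: higher salary, no lower tax.
rows = [
    {"salary": 100, "tax": 10},
    {"salary": 200, "tax": 30},
    {"salary": 300, "tax": 20},
]
violations = od_violations(rows, lhs=["salary"], rhs=["tax"])
```

On this data the only violating pair is rows 1 and 2, since salary increases from 200 to 300 while tax drops from 30 to 20. The same check also catches FD-style conflicts, where two rows agree on X but differ on Y, which reflects the remark that ODs carry order semantics beyond functional dependencies.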