Towards declarative comparabilities: application to functional dependencies
Possible/Certain Functional Dependencies
Lhouari Nourine ∗ Jean Marc Petit † April 6, 2020
Abstract
Incomplete information allows one to deal with data with errors, uncertainties or inconsistencies and has been studied in different application areas such as query answering or data integration. In this paper, we investigate classical functional dependencies in presence of incomplete information. To do so, we associate each attribute with a comparability function which maps every pair of domain values to abstract values, assumed to be organized in a distributive lattice. Thus, every relation schema has an associated product lattice from which we define abstract functional dependencies over abstract tuples, leading to reasoning in a multi-valued logic. In this setting, we revisit classical notions like soundness and completeness of Armstrong axioms, attribute set closure and the implication problem, and give associated results. We also focus on the interpretations of abstract values in true/false logic to define the notion of reality, which corresponds to a $\{0,1\}$-embedding of the product lattice. Based on this semantics, we introduce the notions of possible (there exists one reality in which the given FD holds) and certain (for every reality, the given FD holds) functional dependencies. We show that the problem of checking whether a functional dependency is certain can be solved in polynomial time, whereas the problem of checking whether an FD is possible is NP-complete. We also identify tractable cases depending on lattice properties.

Dealing with incomplete information is one of the oldest database research topics [IL84], which appears in many important application domains such as query answering [AHV95, GPW14], data integration [Len02], inconsistent databases [Ber11] or probabilistic data [SORK11]. For instance, query answering in presence of incomplete information has led to the notion of certain answers, i.e. answers belonging to every possible minimal database repair, i.e. a database where incomplete information has been removed.
Whenever large amounts of heterogeneous data have to be handled, incomplete information may appear under different forms, for example as null values [Lib16], aberrant values, values with uncertainty, orderings, etc.

Data dependency theory turns out to be another application area of incomplete information, since data dependencies allow one to deal with data with errors, uncertainty or inconsistencies (see for example [CDP16] for a survey). In this paper, we investigate classical functional dependencies in presence of incomplete information. To the best of our knowledge, this cross-fertilization between data dependencies and incomplete information has not received much attention.

∗ Université Clermont-Auvergne, France, LIMOS, UMR 6158 CNRS
† Université de Lyon, INSA Lyon, France, LIRIS, UMR 5205 CNRS

A simple yet fundamental database question underlying incomplete information and data dependencies is the following: given two values $u$ and $v$, how to decide whether or not $u = v$, or at least under which assumptions those values are close enough to be considered as equal. Clearly, there is no unique answer to this question, many interpretations being possible. This problem can be studied by defining two functions $f$ and $h$: $f(u, v)$ evaluates the proximity of $u$ and $v$ with a new value (e.g. a categorical label, a percentage, a belief, ...) and $h(f(u, v))$ says whether $f(u, v)$ is true (considered as equal) or false. For instance with numerical values, $f$ can be defined as $f(u, v) = |u - v|$ and $h(f(u, v)) = 1$ if $f(u, v) \le \epsilon$, where $\epsilon$ is a given threshold value (see other examples in [CDP16]).

In this setting, we propose a lattice-based approach to deal with incomplete information, from which we define abstract functional dependencies.
Their meaning relies on different interpretations of equality, leading to possible and certain functional dependencies, in the same spirit as the definition of certain answers in query answering or data integration.

We proceed in the following steps. First, we associate each attribute with a comparability function which maps every pair of domain values to abstract values, assumed to be organized in a distributive lattice. In other words, $f$ maps every pair of domain values to a truth value belonging to a distributive lattice $\mathcal{L}$ such that for every $a, b \in \mathcal{L}$, $a \le b$ means that the truth of $b$ is "stronger" than or equal to the truth of $a$.

Such a couple (comparability function, lattice structure) has to be defined with domain experts for each attribute, to catch as finely as possible the intended meaning of possible comparisons.

Example 1. Let us consider the relation $r$ given in Figure 1.(a), which will be reused throughout the paper. Let us assume that a domain expert has provided, from background knowledge, the comparability functions $f_A$, $f_B$, $f_C$ defined in Table 1. For each attribute, the lattice of its abstract values is represented in Figure 1.(b) with the following abbreviations of abstract values: G stands for Good, B for Bad, GB for Good or Bad, U for Unknown, C for Correct, I for Incorrect, T for True, and D for Different.

Second, it follows that every relation schema has an associated lattice from which we define abstract functional dependencies, leading to reasoning in a multi-valued logic. It is worth noting that implications can be defined on the lattice associated with a relation. We call such implications abstract functional dependencies. Moreover, given a finite relation, there exists a finite set of abstract tuples, obtained by applying the comparability function of each attribute to every pair of tuples of the relation.
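The functions $f$ and $h$ introduced above, and the construction of an abstract tuple from a pair of tuples, can be sketched in Python as follows. This is only an illustrative sketch: the names `f_num`, `h_threshold`, `f_b`, `abstract_tuple` and the threshold `EPS` are assumptions made for this example and do not come from the paper; only the definitions $f(u,v) = |u - v|$, $h = 1$ iff $f(u,v) \le \epsilon$, and the True/Different function $f_B$ are taken from the text.

```python
def f_num(u: float, v: float) -> float:
    """Proximity of two numerical values: f(u, v) = |u - v|."""
    return abs(u - v)


def h_threshold(d: float, eps: float) -> int:
    """Interpretation h: 1 (considered equal) if d <= eps, else 0."""
    return 1 if d <= eps else 0


def f_b(u, v) -> str:
    """Two-valued comparability function, in the style of f_B in Table 1."""
    return "True" if u == v else "Different"


def abstract_tuple(t1, t2, fs):
    """Apply one comparability function per attribute to a pair of tuples."""
    return tuple(f(a, b) for f, a, b in zip(fs, t1, t2))


EPS = 0.5  # hypothetical threshold, chosen only for illustration

close = h_threshold(f_num(1.0, 1.3), EPS)  # values within EPS of each other
far = h_threshold(f_num(1.0, 4.0), EPS)    # values further apart than EPS
pair = abstract_tuple((1.0, "f"), (4.0, "g"), [f_num, f_b])
```

Here the abstract values of the numerical attribute are plain distances; in the paper's setting they would instead be elements of a distributive lattice chosen with a domain expert.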
Example 2. From Example 1, the tuples $\langle \ldots, f, \ldots \rangle$ and $\langle \ldots, f, null \rangle$ of $r$ give rise to the abstract tuple $\langle f_A(\ldots), f_B(f, f), f_C(\ldots, null) \rangle = \langle G, T, U \rangle$. In Annexe 1, the set of abstract tuples from $r$ is given.

For the relation $r$, Figure 1.(c) gives a graphical representation of the product lattice associated with the lattices of each attribute (for the sake of clarity, delimiters of elements are sometimes omitted, e.g. $\langle U, T, C \rangle$ is denoted by $U, T, C$). In this setting, $\langle GB, T, U \rangle \to \langle G, T, U \rangle$ is an example of abstract functional dependency.

Third, to make sense of such abstract functional dependencies, we define their semantics with interpretations of abstract values in $\{0,1\}$, leading to attribute sets from abstract tuples (characteristic vectors) and to functional dependencies from abstract functional dependencies. We point out that whenever an interpretation is a homomorphism, a so-called reality, the embedding of abstract domains into the boolean domain still defines a lattice. The lattices of the attributes must be distributive to guarantee the existence of at least one homomorphism when we interpret them in $\{0,1\}$. This ensures both that there are realities and that abstract FDs can be interpreted.

Figure 1: Running example

Example 3. Figure 1.(d1) and Figure 1.(d2) give two illustrations of three interpretations for
$A$, $B$ and $C$, represented by $(h_A, h_B, h_C)$ and $(h'_A, h'_B, h'_C)$ respectively. They define two realities, allowing one to decide whether or not a given abstract value is true. Figure 1.(e) is the lattice obtained from Figure 1.(c) by applying the reality $(h_A, h_B, h_C)$. For example, $\langle U, D, C \rangle$ is interpreted as $\langle 1, 0, 1 \rangle$, represented with $AC$ in Figure 1.(e).

Since many realities exist, each of them defines a semantics of abstract functional dependencies. As a consequence, we define possible and certain functional dependencies, i.e. those dependencies that hold in at least one reality and in all realities, respectively.

Example 4. The functional dependency $A \to C$ is possible in $r$ since the reality given in Figure 1.(d1) makes $A \to C$ satisfied in $r$. Nevertheless $A \to C$ is not certain in $r$ since, with the reality given in Figure 1.(d2), two tuples of $r$ form a counterexample of $A \to C$.

Table 1: Examples of comparability functions for Example 1

$$f_A(u, v) = \begin{cases} \text{Good} & \text{if } (u = v) \text{ or } (u, v \in [0, \ldots]) \\ \text{Good or Bad} & \text{if } u, v \in [4, \ldots],\ u \neq v \\ \text{Unknown} & \text{if } (u, v) \text{ or } (v, u) \in [0, \ldots] \times [4, \ldots] \\ \text{Bad} & \text{otherwise} \end{cases}$$

$$f_B(u, v) = \begin{cases} \text{True} & \text{if } u = v \\ \text{Different} & \text{if } u \neq v \end{cases}$$

$$f_C(u, v) = \begin{cases} \text{Correct} & \text{if } (u = v \neq NULL) \text{ or } (u, v \in [4, \ldots] \text{ or } u, v \in [10, \ldots]) \\ \text{Unknown} & \text{if } u = NULL \text{ or } v = NULL \\ \text{Incorrect} & \text{otherwise} \end{cases}$$

To sum up, we propose a simple and elegant formalism for defining possible and certain functional dependencies in presence of incomplete information. We make the following contributions:
• We deal with incomplete information by associating to every attribute a comparability function and by requiring a distributive lattice structure over the abstract values given by the comparability function.
• In the induced lattice of a relation, we propose abstract functional dependencies over abstract tuples, leading to reasoning in a multi-valued logic. Moreover, we revisit classical notions like soundness and completeness of Armstrong axioms, attribute set closure and the implication problem, and give associated results for abstract functional dependencies.
• We propose the interpretation of abstract values into $\{0,1\}$ with the notion of reality, defining a new lattice if the interpretations satisfy the homomorphism property.
• Given the exponential number of realities, we make a connection with certain answers in presence of incomplete information: we define possible functional dependencies (there exists one reality in which the given FD holds) and certain functional dependencies (for every reality, the given FD holds).
• We study two associated problems and show that checking whether a given FD is possible is NP-complete, whereas checking whether a given FD is certain is in PTIME. We also identify tractable cases for possible functional dependencies.
Paper organisation
Section 2 gives the preliminaries and introduces the way we deal with incomplete information with a lattice-based approach. Then abstract functional dependencies are defined and a new set of three axioms, called Extended Armstrong axioms, is proposed. The implication problem for abstract functional dependencies is then studied. Section 3 gives a logical interpretation of abstract functional dependencies through the notion of reality. The homomorphism property is shown to be required to preserve the lattice structure. Section 4 defines possible and certain functional dependencies and shows the main results of the paper. Section 5 is devoted to related work, and Section 6 concludes the paper and opens some perspectives of this work.
Abstract functional dependencies
A relation schema $R$ is defined over $n$ attributes, denoted by $R = \{A_1, \ldots, A_n\}$. Each attribute $A \in R$ has a domain, denoted by $dom(A)$. Without loss of generality, we suppose that all attributes are defined over the same domain $D$, i.e. $dom(A) = D$ for every $A \in R$. Given a relation schema $R$ with $n$ attributes, $D^n$ is the set of possible tuples over $R$. A relation $r$ over $R$ is a set of tuples over $R$; each tuple is written $\langle v_1, \ldots, v_n \rangle$ with $v_i \in D$, $i \in 1..n$.

A partial order on a set $X$ is a binary relation $\le$ on $X$ which is reflexive, anti-symmetric and transitive; the resulting poset is denoted by $P = (X, \le)$. Two elements $x$ and $y$ of $P$ are said to be comparable if $x \le y$ or $y \le x$, and incomparable otherwise. An element $y$ covers an element $x$ (or $x$ is covered by $y$) if $x < y$ and there is no other element $z$ for which $x < z < y$. If an element $u$ of $P$ is such that both $x \le u$ and $y \le u$ then $u$ is called an upper bound of $x$ and $y$; it is called the least upper bound of $x$ and $y$ if moreover $u \le v$ for every upper bound $v$ of $x$ and $y$. Note that two elements of a poset may or may not have a least upper bound. The least upper bound (also known as supremum or join) of $x$ and $y$, if it exists, is denoted by $x \vee y$. Dually, an element $u$ such that both $u \le x$ and $u \le y$ is called a lower bound of $x$ and $y$; it is called the greatest lower bound of $x$ and $y$ if moreover $v \le u$ for every lower bound $v$ of $x$ and $y$. The greatest lower bound (also known as infimum or meet) of $x$ and $y$, if it exists, is denoted by $x \wedge y$. A lattice is a poset in which every two elements have a join and a meet; see [DP02, Grä11]. A lattice is denoted indifferently as $(\mathcal{L}, \le)$ or $\mathcal{L}(\wedge, \vee)$. The top (resp. the bottom) of a lattice is denoted by $\top$ (resp. $\bot$). A lattice is distributive if for all $x, y, z$ in $\mathcal{L}$: $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$.

Let $(\mathcal{L}, \le)$ be a lattice, $v \in \mathcal{L}$ and $X \subseteq \mathcal{L}$. An implication $X \to v$ in $\mathcal{L}$ is defined as usual: $X \to v$ holds in $\mathcal{L}$ iff $v \le \bigvee X$. Note that if $v \le u$ with $u \in X$ then $X \to v$ holds in $\mathcal{L}$ (trivial implication).

Let $\mathcal{L}(\wedge, \vee)$ and $\mathcal{K}(\wedge', \vee')$ be two lattices and let $g : \mathcal{L} \to \mathcal{K}$. The mapping $g$ is a homomorphism provided that for any $x, y \in \mathcal{L}$, $g(x \vee y) = g(x) \vee' g(y)$ and $g(x \wedge y) = g(x) \wedge' g(y)$; $g$ is then said to preserve join and meet.

We first define the domain of abstract values of a given attribute $A$. The abstract domain of $A$, denoted by $D_{\mathcal{A}}(A)$, is a finite set of abstract values, disjoint from the domain $D$, i.e. $D \cap D_{\mathcal{A}}(A) = \emptyset$, representing comparabilities between every pair of domain values. Abstract domains of attributes are supposed to be disjoint, i.e. two different attributes have different abstract values, even if they have the same domain.

Moreover, we assume that a partial order $\le_A$ exists over $D_{\mathcal{A}}(A)$ and that the poset $(D_{\mathcal{A}}(A), \le_A)$ is a distributive lattice, denoted by $\mathcal{L}_A$.

We now define the explicit mapping of pairs of domain values to abstract values through a comparability function. A comparability function for an attribute $A$, denoted by $f_A$, is a map from $D \times D$ to $D_{\mathcal{A}}(A)$ which subsumes equality (i.e. $f_A(u, u) = \top$) and is commutative (i.e. $f_A(u, v) = f_A(v, u)$). It is worth noting that $f_A$ is neither associative ($f_A(f_A(u, v), w)$ is not defined) nor idempotent (i.e. $f_A(u, u) \neq u$).

Comparability functions extend naturally to tuples as follows. Let $R = \{A_1, \ldots, A_n\}$ be a relation schema and $D_{\mathcal{A}}(A_i)$ the abstract domain of $A_i$, $i = 1..n$. The abstract domain of $R$, denoted by $D_{\mathcal{A}}(R)$, is defined by $D_{\mathcal{A}}(R) = \prod_{A \in R} D_{\mathcal{A}}(A)$.

Let $t_1, t_2$ be two tuples over $R$. A comparability function for $R$, denoted by $f_R$, is defined from $D^n \times D^n$ to $D_{\mathcal{A}}(R)$ as follows:

$$f_R(t_1, t_2) = \langle f_{A_1}(t_1[A_1], t_2[A_1]), \ldots, f_{A_n}(t_1[A_n], t_2[A_n]) \rangle$$

Note that $f_R(t_1, t_2)$ can be seen as a generalization of agree sets for FDs between two tuples [BDFS84].

The product of lattices of attributes is the lattice $\mathcal{L}_R = (D_{\mathcal{A}}(R), \le_R)$ where $\mathcal{L}_R = \prod_{A \in R} \mathcal{L}_A$ and $\le_R = \prod_{A \in R} \le_A$.

We now introduce attribute context and schema context to summarize our notations.

Definition 2.1. An attribute context for an attribute $A$, denoted by $\mathcal{C}(A)$, is represented by the pair $\{f_A, \mathcal{L}_A\}$. A schema context for a relation schema $R$, denoted by $\mathcal{C}(R)$ or $\mathcal{C}_R$, is defined by $\mathcal{C}_R = \{\mathcal{C}(A), A \in R\}$.

Note that from a schema context $\mathcal{C}(R)$, we have both its associated comparability function $f_R$ and its abstract domain $D_{\mathcal{A}}(R)$. When clear from context, the subscript $R$ will be omitted, i.e. we shall use $\le, \wedge, \vee$ instead of $\le_R, \wedge_R, \vee_R$ respectively. In the sequel, a relation $r$ will be defined over a schema context $\mathcal{C}_R$, instead of $R$.

Consequently, every relation $r$ over $\mathcal{C}_R$ is associated with a sublattice $\mathcal{L}_r$ of $\mathcal{L}_R$, defined by $\mathcal{L}_r = \{\bigwedge W \mid W \subseteq F_r\}$ where $F_r = \{f_R(t_1, t_2) \mid t_1, t_2 \in r\}$ is a generating family of $\mathcal{L}_r$. Hence $\mathcal{L}_r$ is the closure of $F_r$ under the meet operator $\wedge$ of $\mathcal{L}_R$.

From well known notions in lattice theory, e.g. implications and fix-point closure, we define abstract functional dependencies based on the underlying product of lattices.

Definition 2.2. (Syntax) Let $R$ be a relation schema, $\mathcal{C}(R)$ a schema context and $D_{\mathcal{A}}(R)$ its co-domain. An abstract functional dependency over $\mathcal{C}_R$ is an expression of the form $\mathcal{C}_R : s \to u$ where $s, u \in D_{\mathcal{A}}(R)$ (or simply $s \to u$ whenever $\mathcal{C}_R$ is clear from context).

The satisfaction of an abstract functional dependency in a relation is defined as follows:
Definition 2.3. (Satisfaction) Let $r$ be a relation over $\mathcal{C}_R$ and $s, u \in D_{\mathcal{A}}(R)$. $r$ satisfies $s \to u$ with respect to $\mathcal{C}_R$, denoted by $r \models_{\mathcal{C}_R} s \to u$, iff for all $t, t' \in r$, if $s \le f_R(t, t')$ then $u \le f_R(t, t')$.

Example 2.4. Continuing previous examples, let us consider three abstract functional dependencies: $\langle B, T, C \rangle \to \langle U, T, C \rangle$, $\langle GB, T, C \rangle \to \langle G, T, C \rangle$ and $\langle GB, T, U \rangle \to \langle G, T, C \rangle$. One can easily verify that the first two are satisfied in $r$ while the third one is not, as witnessed by a counterexample pair of tuples of $r$.

Interestingly, the well known Armstrong axioms can be extended to this setting quite directly, with the same three properties (reflexivity, augmentation and transitivity), which are no longer expressed over attribute sets.

Let $\Sigma$ be a set of abstract functional dependencies over $\mathcal{C}_R$, and $u, v, w \in D_{\mathcal{A}}(R)$. We consider the three following extended Armstrong axioms:
1. Let $v \le u$. Then $\Sigma \vdash_{\mathcal{C}_R} u \to v$.
2. If $\Sigma \vdash_{\mathcal{C}_R} u \to v$ then $\Sigma \vdash_{\mathcal{C}_R} u \vee w \to v \vee w$.
3. If $\Sigma \vdash_{\mathcal{C}_R} u \to v$ and $\Sigma \vdash_{\mathcal{C}_R} v \to w$ then $\Sigma \vdash_{\mathcal{C}_R} u \to w$.

From these axioms it follows that if $\Sigma \vdash_{\mathcal{C}_R} u \to v$ and $\Sigma \vdash_{\mathcal{C}_R} u \to v'$ then $\Sigma \vdash_{\mathcal{C}_R} u \to v \vee v'$, and that if $\Sigma \vdash_{\mathcal{C}_R} u \to v \vee v'$ then $\Sigma \vdash_{\mathcal{C}_R} u \to v$.

Definition 2.5.
The closure of $s$ with respect to $\Sigma$, denoted by $s^+_\Sigma$, is equal to:

$$s^+_\Sigma = \bigvee \{u \mid \Sigma \vdash_{\mathcal{C}_R} s \to u\}$$

Algorithm 1: Closure algorithm
Input: $\Sigma$ a set of implications over $\mathcal{C}_R$, $u \in D_{\mathcal{A}}(R)$
Output: The closure $u_\Sigma$ of $u$.
begin
    $u_\Sigma = u$;
    while there exists $w \to z \in \Sigma$ such that $w \le u_\Sigma$ do
        $u_\Sigma = u_\Sigma \vee z$;
        $\Sigma = \Sigma \setminus \{w \to z\}$;

The major difference between Algorithm 1 and classic algorithms for computing attribute closure lies in the elementary operations, which here correspond to $\le$ and $\vee$. We suppose that each distributive lattice $\mathcal{L}_A$ is encoded by sets such that the join operation $\vee$ is preserved by union [Mar80] and its cost is bounded by $O(k)$, where $k$ is the size of the encoding.

Theorem 2.6.
Algorithm 1 computes the closure of a tuple $u \in D_{\mathcal{A}}(R)$ in $O(k \cdot n \cdot |\Sigma|)$ time, where $k$ is the complexity of the join operation in lattice $\mathcal{L}_A$ for each attribute $A \in R$.

Since the proofs of this section differ only slightly from the proofs related to functional dependencies, they are omitted here and given in Annexe 2.

We consider the classical implication problem, defined as usual: given a set of abstract functional dependencies $\Sigma$ and an abstract functional dependency $u \to v$, does $\Sigma$ imply $u \to v$? We define validity and completeness as follows.

Definition 2.7. $\Sigma \vdash_{\mathcal{C}_R} u \to v$ if there exists a derivation (or a proof) of $u \to v$ using the extended Armstrong axioms.

Definition 2.8. $\Sigma \models_{\mathcal{C}_R} u \to v$ if for every relation $r$, if $r \models_{\mathcal{C}_R} \Sigma$ then $r \models_{\mathcal{C}_R} u \to v$.

Then we provide a connection between the validity of an abstract FD and the comparability of its right-hand side with the closure of its left-hand side.
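Since the closure is the key ingredient of this connection, a minimal Python sketch of the loop of Algorithm 1 may help. It assumes, as the text suggests, that each abstract value is encoded by a set so that $\le$ becomes set inclusion and $\vee$ becomes union, componentwise over the attributes. The names `Elem`, `leq`, `join`, `closure` and the toy three-element chain encoding are assumptions of this sketch, not the paper's notation.

```python
from typing import FrozenSet, List, Tuple

Elem = Tuple[FrozenSet[str], ...]  # one set-encoded abstract value per attribute
AFD = Tuple[Elem, Elem]            # abstract FD (w, z), read as w -> z


def leq(x: Elem, y: Elem) -> bool:
    """Componentwise order of the product lattice (inclusion of encodings)."""
    return all(a <= b for a, b in zip(x, y))


def join(x: Elem, y: Elem) -> Elem:
    """Componentwise join of the product lattice (union of encodings)."""
    return tuple(a | b for a, b in zip(x, y))


def closure(u: Elem, sigma: List[AFD]) -> Elem:
    """Closure of u with respect to sigma, following the loop of Algorithm 1."""
    result = u
    remaining = list(sigma)  # work on a copy, each dependency fires at most once
    while True:
        fired = next((fd for fd in remaining if leq(fd[0], result)), None)
        if fired is None:
            return result
        result = join(result, fired[1])
        remaining.remove(fired)


# Hypothetical chain bottom < mid < top, encoded by nested sets.
BOT, MID, TOP = frozenset(), frozenset({"1"}), frozenset({"1", "2"})
SIGMA = [((MID, BOT), (BOT, MID)), ((MID, MID), (TOP, BOT))]
RESULT = closure((MID, BOT), SIGMA)
```

With this encoding, testing whether $v$ lies below the closure of $u$ is simply `leq(v, closure(u, sigma))`.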
Proposition 2.9. $\Sigma \models_{\mathcal{C}_R} s \to u$ iff $u \le s^+_\Sigma$.

The following theorem shows that the extended Armstrong axioms are valid and complete.
Theorem 2.10. $\Sigma \models_{\mathcal{C}_R} u \to v$ iff $\Sigma \vdash_{\mathcal{C}_R} u \to v$.

In this section, we point out how to interpret abstract values, abstract tuples and abstract functional dependencies in the classical true/false logic. To do that, we introduce the notion of interpretation as follows:

Definition 3.1.
Let $R$ be a relation schema, $A \in R$ and $\mathcal{C}(A) = \{f_A, \mathcal{L}_A\}$ an attribute context. An attribute context interpretation of $A$ is a map $h : \mathcal{L}_A \to \{0,1\}$ satisfying the following properties:
1. $x \le y$ in $\mathcal{L}_A$ implies $h(x) \le h(y)$, i.e. $h$ is an increasing map.
2. $h(\top) \neq h(\bot)$

Definition 3.1 ensures that every interpretation must contain the false value (0) and the true value (1) ($h(\top) = 1$ and $h(\bot) = 0$). Moreover, the truth interpretation has to be increasing.

Example 3.2. Consider the attribute $A$ whose context $\{f_A, \mathcal{L}_A\}$ is given in Table 1 and in Figure 1.(b). Consider the interpretation $h_A$ for abstract values of $A$ (see also Figure 1.(d1)):
• $h_A(G) = h_A(U) = 1$ and $h_A(GB) = h_A(B) = 0$
Now we can decide whether two $A$-values are similar or not: two values $u, v$ with $f_A(u, v) = U$ are considered equal since $h_A(U) = 1$, whereas two values with $f_A(u, v) = GB$ are considered different since $h_A(GB) = 0$.

We extend attribute context interpretation to schema context interpretation as follows.

Definition 3.3.
Let $R = \{A_1, \ldots, A_n\}$ be a relation schema, $\mathcal{C}(R)$ a schema context and $h_i$ an attribute context interpretation for $A_i$, $i = 1..n$. A schema context interpretation for $R$, denoted by $g_{h_1, \ldots, h_n}$ (or simply $g$ whenever $(h_i)_{i=1..n}$ are clear from the context), is a map $g : \mathcal{L}_R \to \{0,1\}^n$ defined by

$$g(\langle x_1, \ldots, x_n \rangle) = \langle h_1(x_1), \ldots, h_n(x_n) \rangle$$

In the sequel, $g|_A$ will denote the restriction of $g$ to an attribute $A \in R$. Note that the map $g$ is an embedding of the lattice $\mathcal{L}_R$ into the boolean lattice, i.e. the lattice of subsets of the set $R$ of attributes. Then each image of $g$ is a characteristic vector of a set of attributes in $R$. In the sequel, we will use $g$ interchangeably to denote a set or its characteristic vector. Clearly, this opens the possibility to come back to the classical interpretation of functional dependencies in databases, by handling attribute sets explicitly.

Example 3.4. Continuing previous examples, consider the reality $g_{h_A, h_B, h_C}$ given in Figure 1.(d1). We have $g_{h_A, h_B, h_C}(\langle B, D, I \rangle) = \langle 0, 0, 0 \rangle = \emptyset$, $g_{h_A, h_B, h_C}(\langle G, T, C \rangle) = \langle 1, 1, 1 \rangle = \{A, B, C\}$, $g_{h_A, h_B, h_C}(\langle GB, D, U \rangle) = \langle 0, 0, 1 \rangle = \{C\}$.

Now the main question is the following: which property do attribute context interpretations have to satisfy in order to preserve the lattice structure? The intuition is that given two incomparable elements $u$ and $v$ whose interpretations are both false (resp. true), their join $u \vee v$ (resp. their meet $u \wedge v$) has to be false (resp. true).

The following results ensure that if every attribute context interpretation $h_i$, $i = 1..n$, is a homomorphism, then $g_{h_1, \ldots, h_n}$ is also a homomorphism and consequently, for any relation, the lattice structure is preserved by $g_{h_1, \ldots, h_n}$.

Theorem 3.5.
Let $R$ be a relation schema, $\mathcal{C}(R)$ a schema context and $g_{h_1, \ldots, h_n}$ a schema context interpretation. If $h_1, \ldots, h_n$ are homomorphisms, then $g : \mathcal{L}_r \to \{0,1\}^n$ is a homomorphism.

Proof. We prove that $g$ is a join-homomorphism. The same idea can be used to prove that $g$ is a meet-homomorphism. Let $x = \langle x_1, \ldots, x_n \rangle$, $y = \langle y_1, \ldots, y_n \rangle$ be two elements of $\mathcal{L}_r$.

$$\begin{aligned} g(x \vee y) &= g(\langle x_1, \ldots, x_n \rangle \vee \langle y_1, \ldots, y_n \rangle) \\ &= g(\langle x_1 \vee y_1, \ldots, x_n \vee y_n \rangle) \\ &= \langle h_1(x_1 \vee y_1), \ldots, h_n(x_n \vee y_n) \rangle \\ &= \langle h_1(x_1) \vee h_1(y_1), \ldots, h_n(x_n) \vee h_n(y_n) \rangle \quad \text{since } h_1, \ldots, h_n \text{ are homomorphisms} \\ &= \langle h_1(x_1), \ldots, h_n(x_n) \rangle \vee \langle h_1(y_1), \ldots, h_n(y_n) \rangle \\ &= g(x) \vee g(y). \end{aligned}$$

We now define the notion of reality.

Definition 3.6.
Let $R$ be a relation schema, $\mathcal{C}(R)$ a schema context and $g$ a schema context interpretation. $g$ is a reality if $g$ is a homomorphism.

For every relation, the structure induced by a reality still defines a lattice, as shown in the following proposition.
Proposition 3.7.
Let $r$ be a relation over $\mathcal{C}_R$, $\mathcal{L}_r$ the lattice associated with $r$ and $g$ a reality. Then $g(\mathcal{L}_r)$ is a lattice.

Proof. Since $g$ is a homomorphism, the image of the join (resp. meet) of two elements of $\mathcal{L}_r$ is the join (resp. meet) of their images. Hence join and meet exist for every pair of elements of $g(\mathcal{L}_r)$, and thus $g(\mathcal{L}_r)$ is a lattice.

Clearly, every reality $g$ gives a semantics to any abstract functional dependency. Consider an abstract functional dependency $u \to v$ that holds in a relation $r$, and some reality $g$. From a semantic point of view, $g(u) \to g(v)$ also holds in $r$, capturing the intuition that some form of functional dependency $g(u) \to g(v)$ is satisfied in $r$, since $g(u)$ and $g(v)$ are subsets of $R$. More precisely, we have:

Definition 3.8.
Let $R$ be a relation schema, $\mathcal{C}(R)$ a schema context, $g$ a reality and $u \to v$ an abstract functional dependency. $g(u) \to g(v)$ is satisfied in $r$, denoted by $r \models_{\mathcal{C}_R, g} g(u) \to g(v)$, iff for all $t, t' \in r$, if for all $A \in g(u)$, $g|_A(f_A(t[A], t'[A])) = 1$, then for all $B \in g(v)$, $g|_B(f_B(t[B], t'[B])) = 1$.

In the following proposition, we point out that whenever a given abstract FD holds in a relation, then for every reality, the corresponding FD over attribute sets also holds in the relation.
Proposition 3.9.
Let $r$ be a relation over $\mathcal{C}(R)$, $u, v \in D_{\mathcal{A}}(R)$ such that $r \models_{\mathcal{C}_R} u \to v$, and $g$ a reality. Then $r \models_{\mathcal{C}_R, g} g(u) \to g(v)$.

Proof.
Since $r \models_{\mathcal{C}_R} u \to v$, we have $u \le v$. Then $v = u \vee v$ and $g(v) = g(u \vee v)$. Since $g$ is a reality, $g(u \vee v) = g(u) \vee g(v)$. It follows that $g(v) = g(u) \vee g(v)$, hence $g(u) \le g(v)$, i.e. $r \models_{\mathcal{C}_R, g} g(u) \to g(v)$.

Example 3.10. Continuing previous examples, we have $r \models_{\mathcal{C}_R} UDI \to UDU$. With the reality $g_{h_A, h_B, h_C}$ given in Figure 1.(d1), we have $g(UDI) = A$ and $g(UDU) = AC$. Thus we obtain $r \models_{\mathcal{C}(R), g_{h_A, h_B, h_C}} A \to AC$.

Possible and certain functional dependencies
Whenever a reality is provided, we have a semantics associated with any abstract functional dependency. This allows us to define precisely the intended meaning of a given FD expressed over attribute sets. To do so, we rephrase Definition 3.8 with attribute sets instead of abstract tuples.
Definition 4.1.
Let $r$ be a relation over $\mathcal{C}_R$, $g$ a reality, $X \subseteq R$ and $A \in R$. $X \to A$ is satisfied in $r$ with respect to $\mathcal{C}_R$ and $g$, denoted $r \models_{\mathcal{C}_R, g} X \to A$, iff for all $t, t' \in r$, if for all $B \in X$, $g|_B(f_B(t[B], t'[B])) = 1$ then $g|_A(f_A(t[A], t'[A])) = 1$.

Example 4.2. Continuing previous examples, according to the interpretations given in Figure 1.(d1), it is easy to verify that $g_{h_A, h_B, h_C}$ is a reality. Moreover, $r \models_{\mathcal{C}(R), g_{h_A, h_B, h_C}} A \to C$.

In this section, we introduce the notion of possible/certain functional dependencies, as functional dependencies satisfied in the relation via realities. We show that checking whether a given functional dependency is possible is an NP-complete problem, whereas checking whether it is certain can be done in polynomial time.
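Both checks mentioned above, verifying that an interpretation is a homomorphism and testing Definition 4.1 on a relation, can be sketched in Python. The sketch below is a toy under stated assumptions: the four-element lattice, its set encoding, the comparability functions `f_a` and `f_c`, the relation `R_TOY` and all function names are hypothetical and are not the running example of Figure 1.

```python
from itertools import combinations

# Hypothetical four-element lattice bottom < a, abar < top, encoded by sets
# so that join is union and meet is intersection.
BOT, A_, ABAR, TOP = frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})
DIAMOND = [BOT, A_, ABAR, TOP]


def is_homomorphism(elems, h):
    """Check that h: lattice -> {0, 1} preserves join (max) and meet (min)."""
    return all(
        h[x | y] == max(h[x], h[y]) and h[x & y] == min(h[x], h[y])
        for x in elems
        for y in elems
    )


def f_a(x, y):
    """Toy comparability function into the diamond lattice."""
    return {0: TOP, 1: A_, 2: ABAR}.get(abs(x - y), BOT)


def f_c(x, y):
    """Toy two-valued comparability function."""
    return "T" if x == y else "D"


def satisfies(r, X, A, f, h):
    """Definition 4.1: r satisfies X -> A under the interpretations h."""
    for t, t2 in combinations(r, 2):
        agree_on_x = all(h[B][f[B](t[B], t2[B])] == 1 for B in X)
        if agree_on_x and h[A][f[A](t[A], t2[A])] == 0:
            return False
    return True


H1 = {BOT: 0, A_: 1, ABAR: 0, TOP: 1}     # homomorphism: a is "true"
H2 = {BOT: 0, A_: 0, ABAR: 1, TOP: 1}     # homomorphism: abar is "true"
H_BAD = {BOT: 0, A_: 0, ABAR: 0, TOP: 1}  # increasing, but not a homomorphism
H_C = {"T": 1, "D": 0}
F = {"A": f_a, "C": f_c}
R_TOY = [{"A": 1, "C": 10}, {"A": 2, "C": 20}]
```

In this toy, `satisfies(R_TOY, ["A"], "C", F, {"A": H2, "C": H_C})` holds while the same call with `H1` does not, so the dependency holds in one reality and fails in another.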
Definition 4.3.
Let $r$ be a relation over $\mathcal{C}_R$. The set of possible functional dependencies satisfied in $r$ with respect to $\mathcal{C}_R$, denoted by $Poss_\Sigma(r, \mathcal{C}_R)$, is defined by

$$Poss_\Sigma(r, \mathcal{C}_R) = \{X \to Y \mid \text{there exists a reality } g \text{ such that } r \models_{\mathcal{C}_R, g} X \to Y\}$$

Definition 4.4.
Let $r$ be a relation over $\mathcal{C}_R$. The set of certain functional dependencies satisfied in $r$ with respect to $\mathcal{C}_R$, denoted by $Cert_\Sigma(r, \mathcal{C}_R)$, is defined by

$$Cert_\Sigma(r, \mathcal{C}_R) = \{X \to Y \mid r \models_{\mathcal{C}_R, g} X \to Y \text{ for every reality } g\}$$

In the rest of the paper, we consider the following problems PFD and CFD, which ask whether a given functional dependency $X \to A$ is a possible (resp. certain) functional dependency.

Possible Functional Dependency (PFD)
Input: A relation $r$ over $\mathcal{C}_R$; $X \subseteq R$, $A \in R$.
Question: Is $X \to A \in Poss_\Sigma(r, \mathcal{C}_R)$?

Certain Functional Dependency (CFD)
Input: A relation $r$ over $\mathcal{C}_R$; $X \subseteq R$, $A \in R$.
Question: Is $X \to A \in Cert_\Sigma(r, \mathcal{C}_R)$?

The following proposition shows that the number of realities can be exponential.

Proposition 4.5.
Let $R$ be a relation schema and $\mathcal{C}_R$ a schema context. Then the number of realities can be exponential in the number of attributes in $R$.

Proof. Consider the schema context $\mathcal{C}_R = \{(f_A, \mathcal{L}_A), A \in R\}$ where $\mathcal{L}_A = \{\bot \le unknown \le \top\}$ for every $A \in R$. By definition, for each attribute $A \in R$, every reality $g$ has to satisfy $g|_A(\bot) = 0$, $g|_A(\top) = 1$ and $g|_A(unknown) = 0$ or $1$. So if $|R| = n$, we have $2^n$ realities.

The main result of the paper is that PFD is NP-complete, as shown in the next theorem.
Theorem 4.6.
PFD is NP-complete.

Proof. A certificate is a reality $g : \mathcal{L}_R \to \{0,1\}^n$ of polynomial size. Moreover, the satisfaction of a functional dependency given the reality can be checked in polynomial time. It follows that PFD belongs to NP.

To show NP-hardness, we reduce 3SAT to PFD. Let $\varphi = \{C_1, \ldots, C_m\}$ be a 3CNF on $n$ variables $V = \{x_1, \ldots, x_n\}$. We construct a schema context $\mathcal{C}_R$, a relation $r_\varphi$ and a functional dependency $X \to A$ such that $\varphi$ is satisfiable iff there exists a reality $g$ such that $r_\varphi \models_{\mathcal{C}_R, g} X \to A$.

We first build a relation schema $R$ and then a schema context $\mathcal{C}_R$ as follows. For each variable $x_i$, $i = 1..n$, we create an attribute $A_i$. Then we create an extra attribute $A_{n+1}$ that corresponds to the attribute $A$. Thus we obtain the relation schema $R = \{A_1, \ldots, A_n, A_{n+1}\}$. The domain of each attribute is the natural numbers. We then define the schema context $\mathcal{C}(R)$ as follows:
• For each attribute $A_i \in R$, $i = 1..n$, its abstract domain is $D_{\mathcal{A}}(A_i) = \{\top, a_i, \bar{a}_i, \bot\}$ and for $A_{n+1}$, $D_{\mathcal{A}}(A_{n+1}) = \{\top, \bot\}$.
• For each attribute $A_i \in R$, $i = 1..n$, its comparability function is defined by:
$$f_{A_i}(x, y) = \begin{cases} \top & \text{if } x = y \\ a_i & \text{if } |x - y| = 1 \\ \bar{a}_i & \text{if } |x - y| = 2 \\ \bot & \text{otherwise} \end{cases}$$
and for $A_{n+1}$:
$$f_{A_{n+1}}(x, y) = \begin{cases} \top & \text{if } |x - y| \neq 1 \\ \bot & \text{if } |x - y| = 1 \end{cases}$$
• For each attribute $A_i \in R$, $i = 1..n$, its distributive lattice is defined by $\mathcal{L}_{A_i} = \{\bot \le a_i \le \top;\ \bot \le \bar{a}_i \le \top\}$, and for $A_{n+1}$, $\mathcal{L}_{A_{n+1}} = \{\bot \le \top\}$.
Then $\mathcal{C}_R = \{(f_{A_i}, \mathcal{L}_{A_i}), i = 1..n+1\}$ is the obtained schema context.

Now, we construct a relation instance $r_\varphi$ over $\mathcal{C}_R$. For each clause $C_j \in \varphi$, $j = 1..m$, let $C = C_j$. We associate a subrelation $r_C$ with two tuples $t$ and $t'$ such that:
• For each attribute $A_i$, $i \in 1..n$:
  – if $x_i \in C$, $t[A_i] = 3j - 2$ and $t'[A_i] = 3j$
  – if $\bar{x}_i \in C$, $t[A_i] = 3j - \ldots$ and $t'[A_i] = 3j - \ldots$, chosen so that $|t[A_i] - t'[A_i]| = 1$
  – if $x_i \notin C$ and $\bar{x}_i \notin C$, $t[A_i] = t'[A_i] = 3j$
• For the attribute $A_{n+1}$: $t[A_{n+1}] = 3j - 1$ and $t'[A_{n+1}] = 3j$
The obtained relation is $r_\varphi = \bigcup_{C \in \varphi} r_C$. We take the functional dependency $X \to A$ with $X = \{A_1, \ldots, A_n\}$ and $A = A_{n+1}$.
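The construction above can be sketched in Python and checked by brute force on small formulas. This is an illustrative sketch under stated assumptions: clauses are encoded as lists of signed integers (`i` for $x_i$, `-i` for $\bar{x}_i$), each variable occurs at most once per clause, and for a negated literal the values $3j - 2$ and $3j - 1$ are assumed (the text only requires a difference of 1). A reality is represented by a bit vector `bits` with $h_i(a_i)$ equal to `bits[i]` and $h_i(\bar{a}_i)$ equal to `1 - bits[i]`, the only homomorphisms of the four-element lattices. All function names are hypothetical.

```python
from itertools import product


def build_relation(clauses, n):
    """Build r_phi: two tuples per clause over attributes A_1..A_{n+1}."""
    rows = []
    for j, clause in enumerate(clauses, start=1):
        t, t2 = [3 * j] * (n + 1), [3 * j] * (n + 1)
        for lit in clause:
            i = abs(lit) - 1
            if lit > 0:
                t[i], t2[i] = 3 * j - 2, 3 * j      # difference 2: abstract value abar_i
            else:
                t[i], t2[i] = 3 * j - 2, 3 * j - 1  # difference 1: a_i (assumed values)
        t[n], t2[n] = 3 * j - 1, 3 * j              # difference 1: bottom on A_{n+1}
        rows += [tuple(t), tuple(t2)]
    return rows


def interp(i, d, bits, n):
    """Truth value of the abstract value obtained for difference d on attribute i."""
    if i == n:
        return 0 if d == 1 else 1
    return {0: 1, 1: bits[i], 2: 1 - bits[i]}.get(d, 0)


def holds(rows, bits, n):
    """Does X -> A hold in the reality described by bits?"""
    for p in range(len(rows)):
        for q in range(p + 1, len(rows)):
            vals = [interp(i, abs(rows[p][i] - rows[q][i]), bits, n)
                    for i in range(n + 1)]
            if all(vals[:n]) and vals[n] == 0:
                return False
    return True


def fd_is_possible(clauses, n):
    """Brute-force PFD: try every reality (exponential, for illustration only)."""
    rows = build_relation(clauses, n)
    return any(holds(rows, bits, n) for bits in product((0, 1), repeat=n))


def satisfiable(clauses, n):
    """Brute-force SAT check used to compare against fd_is_possible."""
    return any(
        all(any((lit > 0) == bool(mu[abs(lit) - 1]) for lit in c) for c in clauses)
        for mu in product((0, 1), repeat=n)
    )
```

On small inputs, `fd_is_possible(clauses, n)` and `satisfiable(clauses, n)` should agree, which is the equivalence the proof establishes.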
Clearly, the size of the reduction is polynomial in the size of the 3CNF ϕ.

Note that an example of the construction of r_ϕ over C_R from a 3CNF formula ϕ is given right after the proof (see Example 4.7).

Given a reality g, r_ϕ ⊨_{C_R,g} X → A iff r_C ⊨_{C_R,g} X → A for every subrelation r_C ⊆ r_ϕ, C ∈ ϕ. Indeed, for any two tuples t and t′ of different subrelations, we have |t[A_{n+1}] − t′[A_{n+1}]| ≥ 2, so f_{A_{n+1}}(t[A_{n+1}], t′[A_{n+1}]) = ⊤ and thus g|_{A_{n+1}}(f_{A_{n+1}}(t[A_{n+1}], t′[A_{n+1}])) = 1 for every reality g. Thus, for every reality, whenever two tuples t and t′ disagree on the right-hand side of X → A, we have r_C = {t, t′} for some C ∈ ϕ, i.e. |t[A_{n+1}] − t′[A_{n+1}]| = 1 and f_{A_{n+1}}(t[A_{n+1}], t′[A_{n+1}]) = ⊥.

Now, we show that ϕ is satisfiable iff the functional dependency X → A is possible.

Suppose ϕ is satisfiable and let µ : V → {0, 1} be a satisfying truth assignment of ϕ. We construct a reality g such that r_ϕ ⊨_{C_R,g} X → A.

First, for every attribute context (f_{A_i}, L_{A_i}), i = 1..n, we define the following attribute context interpretation h_i:

• h_i(⊥) = 0 and h_i(⊤) = 1
• h_i(a_i) = µ(x_i) and h_i(ā_i) = 1 − µ(x_i).

Then for A_{n+1}, we define h_{n+1}(⊥) = 0 and h_{n+1}(⊤) = 1.

Again, Example 4.7 gives an illustration of the obtained interpretations.

Next, we define the schema context interpretation g : L_R → {0, 1}^{n+1} by g(⟨x_1, ..., x_{n+1}⟩) = ⟨h_1(x_1), ..., h_{n+1}(x_{n+1})⟩. g is a homomorphism since each h_i, i = 1..n+1, is a homomorphism; in particular h_i(a_i) ≠ h_i(ā_i) for i = 1..n. Thus g is a reality.

Now, let t, t′ ∈ r_ϕ. Then we have two cases:

1. t ∈ r_C and t′ ∈ r_{C′} with C ≠ C′. Then |t[A_{n+1}] − t′[A_{n+1}]| ≠ 1 and by construction of f_{A_{n+1}}, we have f_{A_{n+1}}(t[A_{n+1}], t′[A_{n+1}]) = ⊤, and then g|_{A_{n+1}}(f_{A_{n+1}}(t[A_{n+1}], t′[A_{n+1}])) = 1. Thus {t, t′} ⊨_{C_R,g} X → A.

2. t, t′ ∈ r_{C_j} for some C_j = (l_1 ∨ l_2 ∨ l_3) ∈ ϕ. Then at least one literal in C_j is true under the assignment µ. Without loss of generality, suppose µ(l_1) = 1 and l_1 ∈ {x_i, x̄_i} for some A_i ∈ R. We have two cases:
   (a) l_1 = x_i: Then t[A_i] = 3j − 2 and t′[A_i] = 3j, so f_{A_i}(t[A_i], t′[A_i]) = ā_i. Thus g|_{A_i}(ā_i) = 1 − µ(x_i) = 0 since µ(x_i) = 1. It follows that t and t′ disagree on A_i ∈ X, and hence {t, t′} ⊨_{C_R,g} X → A.
   (b) l_1 = x̄_i: Then t[A_i] = 3j − 2 and t′[A_i] = 3j − 1, so f_{A_i}(t[A_i], t′[A_i]) = a_i. Thus g|_{A_i}(a_i) = µ(x_i) = 0 since µ(x̄_i) = 1. It follows that {t, t′} ⊨_{C_R,g} X → A with the same reasoning.

Thus the two tuples of every subrelation r_C cannot agree on X. We conclude that X → A is satisfied in the reality g and thus X → A is possible.

Conversely, suppose that r_ϕ ⊨_{C_R,g} X → A for some reality g. We construct a truth assignment µ for ϕ. According to the reduction, if two tuples t, t′ ∈ r_ϕ agree on the set of attributes X then t and t′ belong to different subrelations. Hence, for any t, t′ ∈ r_C ⊆ r_ϕ, we have g|_{A_i}(f_{A_i}(t[A_i], t′[A_i])) = 0 for some A_i ∈ X. We distinguish three cases:

1. g|_{A_i}(a_i) = g|_{A_i}(ā_i) = 0: This is impossible since g|_{A_i} must be a homomorphism, i.e. we would have g|_{A_i}(a_i ∨ ā_i) = g|_{A_i}(⊤) = 0, which contradicts the definition of a reality.

2. g|_{A_i}(a_i) = g|_{A_i}(ā_i) = 1: This is impossible since f_{A_i}(t[A_i], t′[A_i]) = a_i or ā_i, and g|_{A_i}(f_{A_i}(t[A_i], t′[A_i])) = 0 implies either (g|_{A_i}(a_i) = 0 and g|_{A_i}(ā_i) = 1) or (g|_{A_i}(ā_i) = 0 and g|_{A_i}(a_i) = 1), which contradicts the hypothesis.

3. g|_{A_i}(a_i) ≠ g|_{A_i}(ā_i): Without loss of generality, suppose that g|_{A_i}(a_i) = 1 and g|_{A_i}(ā_i) = 0. By construction, we have x_i ∈ C. Put µ(x_i) = 1. Then every clause containing x_i is satisfied. Consider now a clause C′ not containing x_i and let r_{C′} = {t_1, t_2}. We have by construction g|_{A_i}(f_{A_i}(t_1[A_i], t_2[A_i])) = 1. Since X → A is satisfied and g|_{A_{n+1}}(f_{A_{n+1}}(t_1[A_{n+1}], t_2[A_{n+1}])) = 0, there exists another attribute A_k ≠ A_i, k = 1..n, such that g|_{A_k}(f_{A_k}(t_1[A_k], t_2[A_k])) = 0. We use the same construction for the variable x_k as we have done for the variable x_i. The same construction is repeated until all clauses are satisfied. Note also that the assignment µ satisfies µ(x_i) ≠ µ(x̄_i). We conclude that ϕ is satisfiable.

The following example illustrates the construction used in the proof.

Example 4.7. Let us consider the following 3CNF formula: ϕ = {C_1, C_2, C_3} with C_1 = (x ∨ x ∨ x̄), C_2 = (x̄ ∨ x ∨ x̄) and C_3 = (x̄ ∨ x ∨ x). The relation r_ϕ obtained by the previous construction is given in Figure 2.

[Figure 2: the relation r_ϕ over the attributes A_1, ..., A_5, made of the subrelations r_{C_1}, r_{C_2} and r_{C_3}.]

Let µ be a truth assignment for ϕ, with µ(x) = µ(x) = 0 and µ(x) = µ(x) = 1. Figure 2 shows the reality corresponding to the truth assignment µ for ϕ.

Now we identify tractable cases for a class of lattices and show that PFD is PTIME. For a lattice L, an element a is called a coatom if a is covered by the top of the lattice L, and a is an atom if it covers the bottom of the lattice.
For example, a chain is a lattice having a unique coatom. We show that if, for every attribute A ∈ R, L_A has a unique atom and a unique coatom, then PFD is PTIME. This encompasses the case in which attribute lattices are chains, and thus fuzzy logic whenever the truth values are totally ordered [JCE17].

For lattices with a unique coatom and a unique atom, a reality g is said to be almost always false (resp. almost always true) wrt. an attribute A if g|_A(⊤) = 1 and g|_A(a) = 0 for every a ∈ L_A with a ≠ ⊤ (resp. g|_A(⊥) = 0 and g|_A(a) = 1 for every a ∈ L_A with a ≠ ⊥).

Proposition 4.8.
Let r be a relation over C_R, X ⊆ R and A ∈ R. If for every attribute B ∈ R, L_B has a unique coatom and a unique atom, then there exists a polynomial time algorithm to check if X → A is possible in r, i.e. PFD is PTIME.
Proof.
Let g be a reality such that for every attribute B ∈ R \ {A}, g|_B is almost always false, and for A ∈ R, g|_A is almost always true. For attribute lattices with a unique atom and a unique coatom, the reality g is the “best possible reality” to make X → A satisfied, since almost every comparison is false for the left-hand side X and true for the right-hand side. We show that X → A is possible iff r ⊨_{C_R,g} X → A.

Clearly, if r ⊨_{C_R,g} X → A then X → A is possible by definition. Now suppose that X → A is possible. Then there exists a reality g′ such that r ⊨_{C_R,g′} X → A. By contradiction, let us suppose that r ⊭_{C_R,g} X → A. Then there exist t, t′ ∈ r such that g|_B(f_B(t[B], t′[B])) = 1 for every B ∈ X and g|_A(f_A(t[A], t′[A])) = 0. By definition of the reality g, it follows that f_B(t[B], t′[B]) = ⊤ for every B ∈ X and f_A(t[A], t′[A]) = ⊥. Since every reality maps ⊤ to 1 and ⊥ to 0, we get g′|_B(f_B(t[B], t′[B])) = 1 for every B ∈ X and g′|_A(f_A(t[A], t′[A])) = 0. This contradicts r ⊨_{C_R,g′} X → A. Thus, to know whether or not X → A is possible, it is sufficient to consider the reality g described above and to check if r ⊨_{C_R,g} X → A, which can be done in polynomial time.

Note also that the proof of Proposition 4.8 still applies whenever the number of lattices that do not have a unique coatom and a unique atom is bounded (not detailed here).

Now we show that problem CFD can be solved in polynomial time. First we give a characterization of certain functional dependencies.
Proposition 4.9.
Let r be a relation over C_R, X ⊆ R and A ∈ R. Then X → A is certain in r iff for all t, t′ ∈ r, f_B(t[B], t′[B]) = ⊥ for some B ∈ X or f_A(t[A], t′[A]) = ⊤.
Proof. (⇐) Let r be a relation such that for all t, t′ ∈ r, f_B(t[B], t′[B]) = ⊥ for some B ∈ X or f_A(t[A], t′[A]) = ⊤. We distinguish two cases:

• f_B(t[B], t′[B]) = ⊥ for some B ∈ X. Then for any reality g, we have g|_B(f_B(t[B], t′[B])) = g|_B(⊥) = 0 by definition of a reality. Thus t and t′ disagree on X, and {t, t′} ⊨_{C_R,g} X → A.
• f_A(t[A], t′[A]) = ⊤. Then for every reality g, we have g|_A(f_A(t[A], t′[A])) = g|_A(⊤) = 1 by definition of a reality. Thus t and t′ agree on A, and {t, t′} ⊨_{C_R,g} X → A.

It follows that X → A is certain in r.

(⇒) Now suppose that there exist t, t′ ∈ r such that f_B(t[B], t′[B]) ≠ ⊥ for all B ∈ X and f_A(t[A], t′[A]) ≠ ⊤. We show that r ⊭_{C_R,g} X → A for the following reality g. For each A′ ∈ R, we distinguish two cases:

• Assume that A′ ≠ A and let a be an atom such that a ≤ f_{A′}(t[A′], t′[A′]). Then for all x ∈ L_{A′} we put g|_{A′}(x) = 1 if x ≥ a and 0 otherwise.
• Assume now that A′ = A and let a be a coatom such that f_A(t[A], t′[A]) ≤ a. Then for all x ∈ L_A we put g|_A(x) = 0 if x ≤ a and 1 otherwise.

Clearly g is a reality since attribute lattices are disjoint and distributive, i.e. g|_{A′} and g|_A defined previously are homomorphisms. Moreover r ⊭_{C_R,g} X → A, since g|_B(f_B(t[B], t′[B])) = 1 for all B ∈ X and g|_A(f_A(t[A], t′[A])) = 0. Thus X → A is not certain.

From the previous proposition, we point out that CFD is polynomial.
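Both Proposition 4.8 and Proposition 4.9 reduce their decision problem to a scan over pairs of tuples. The following Python sketch illustrates the two tests side by side. It is a minimal illustration, not the authors' implementation: tuples are assumed to be dictionaries from attribute names to values, `f` is a hypothetical dictionary mapping each attribute to its comparability function, `TOP` and `BOT` are hypothetical encodings of ⊤ and ⊥, and the dependency is treated as trivially satisfied when A ∈ X.

```python
from itertools import product

TOP, BOT = "top", "bot"  # hypothetical encodings of the lattice extremes


def is_certain(rows, X, A, f):
    """Test of Proposition 4.9: every pair of tuples is compared to BOT on
    some attribute of X, or compared to TOP on A."""
    if A in X:
        return True  # trivial dependency (assumption of this sketch)
    for t, u in product(rows, repeat=2):
        lhs_has_bot = any(f[B](t[B], u[B]) == BOT for B in X)
        rhs_is_top = f[A](t[A], u[A]) == TOP
        if not (lhs_has_bot or rhs_is_top):
            return False
    return True


def is_possible_unique_atom_coatom(rows, X, A, f):
    """Test from the proof of Proposition 4.8. Only meaningful when every
    attribute lattice has a unique atom and a unique coatom: in the best
    reality, a pair violates X -> A iff it is TOP on all of X and BOT on A."""
    if A in X:
        return True  # trivial dependency (assumption of this sketch)
    for t, u in product(rows, repeat=2):
        lhs_all_top = all(f[B](t[B], u[B]) == TOP for B in X)
        rhs_is_bot = f[A](t[A], u[A]) == BOT
        if lhs_all_top and rhs_is_bot:
            return False
    return True


def chain_cmp(x, y):
    """Toy comparability function into the chain bot < mid < top."""
    d = abs(x - y)
    if d == 0:
        return TOP
    return "mid" if d == 1 else BOT


f = {"B": chain_cmp, "A": chain_cmp}
rows = [{"B": 1, "A": 10}, {"B": 2, "A": 20}]
print(is_certain(rows, ["B"], "A", f))                      # False
print(is_possible_unique_atom_coatom(rows, ["B"], "A", f))  # True
```

In this toy relation the two tuples are compared to the intermediate value on B and to ⊥ on A, so B → A fails in the reality that maps the intermediate value to 1 and holds in the one that maps it to 0: the dependency is possible but not certain.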
Theorem 4.10.
CFD is PTIME.
Proof. Consider a relation r over C_R, X ⊆ R and A ∈ R. According to Proposition 4.9, checking if a functional dependency is not certain is equivalent to finding two tuples t, t′ ∈ r such that f_B(t[B], t′[B]) ≠ ⊥ for every B ∈ X and f_A(t[A], t′[A]) ≠ ⊤. So the number of tests to be performed is bounded by |r|² and each verification takes O(|R|), which is polynomial in the size of the input whenever each comparability function can be evaluated in constant time.

Data dependencies have been heavily studied over the last years, leading to a plethora of proposals, from seminal functional dependencies to more elaborate forms of dependencies, among which we quote [Gog67, DLM92, Ng01, BKL13, CDP16, BCKN18, LP19].

Many papers have studied lattice representations of functional dependencies, for instance [DLM92]. However, they do not consider incomplete information as we do in this paper.

W. Ng [Ng01] defined ordered domains in the relational data model, i.e. a partial order over every attribute domain is permitted. The paper studies the consequences on both functional dependencies and relational languages (algebra and SQL). His work does not consider incomplete information as we do: our partial order is not defined on the attribute domain, but on the abstract domain of attributes, and is required to form a lattice. It offers a new point of view on functional dependencies in presence of incomplete information.

In [BCKN18], order dependencies are based on a transitive relation, and approximate dependencies on a symmetric relation, leading to approximate-matching dependencies. In [BKL13], the authors study matching dependencies using boolean similarity functions (reflexive and symmetric) and matching functions (idempotent, commutative, and associative, and thus forming a semilattice). Matching functions are used to chase the relation instance to obtain a clean relation.
The way we deal with incomplete information is completely different though.

[Lib16] studies the semantics of SQL query answering in presence of incomplete information. It defines a multi-valued logic similar to our contribution, but it does not consider data dependencies.

A hierarchy of data dependencies is proposed in [SGS15] with respect to the complexity of the inference problem. Many of them allow different similarity relations for the same attribute, which is not possible with the proposals made in this paper.
In presence of incomplete information, a basic task is to decide whether or not two values are close enough to be considered as equal. In this paper, we introduced a lattice-based formalism to deal with incomplete information, leading to new abstract domains for attributes.

Based on this, a product lattice can be built over a relation schema, allowing to reason on tuples and relations. We then showed that functional dependencies can be generalized to abstract functional dependencies, while keeping the same Armstrong-like axiomatization for reasoning. The axiomatization of abstract functional dependencies exploits a many-valued lattice instead of the attribute lattice used for functional dependencies.

We then introduced interpretations of those abstract functional dependencies using the notion of realities, i.e. the interpretations preserving the lattice structure. Since an exponential number of such realities exists, it opens the possibility to define two key notions: possible and certain functional dependencies.

Several associated problems were studied: whereas the problem of deciding whether or not a given FD is certain is PTIME, we have pointed out that for possible FDs, the problem is NP-Complete. We also identified tractable cases.

This paper opens new directions to deal with incomplete information in different application areas. For example, other types of possible and certain data dependencies could be studied, such as inclusion dependencies or multivalued dependencies. The approximation of a certain dependency with respect to the number of realities satisfying the dependency could also be investigated. Certain query answering could also be investigated: for instance, SQL operations like select and join could be revisited in presence of realities, leading to many different notions of abstract equality and certain answers.
Another question is to characterize which classes of attribute lattices and realities have the same properties as the distributive lattices and homomorphisms considered in this paper.
Acknowledgements:
The authors acknowledge the support received from the Agence Nationale de la Recherche of the French government through the program "Investissements d’Avenir" IDEX-ISITE initiative CAP 20-25 (ANR-16-IDEX-0001), the CNRS Mastodons projects (QualiSky 2016-2018) and the Datavalor initiative (INSA).

The authors also thank Simon Vilmin for pointing out an error in the previous version of this paper.
References

[AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
[BB79] Catriel Beeri and Philip A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30–59, March 1979.
[BCKN18] Jaume Baixeries, Victor Codocedo, Mehdi Kaytoue, and Amedeo Napoli. Characterizing approximate-matching dependencies in formal concept analysis with pattern structures. Discrete Applied Mathematics, 249:18–27, 2018. Concept Lattices and Applications: Recent Advances and New Opportunities.
[BDFS84] Catriel Beeri, Martin Dowd, Ronald Fagin, and Richard Statman. On the structure of Armstrong relations for functional dependencies. Journal of the ACM, 31(1):30–46, 1984.
[Ber11] Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.
[BKL13] Leopoldo E. Bertossi, Solmaz Kolahi, and Laks V. S. Lakshmanan. Data cleaning and query answering with matching dependencies and matching functions. Theory Comput. Syst., 52(3):441–482, 2013.
[CDP16] Loredana Caruccio, Vincenzo Deufemia, and Giuseppe Polese. Relaxed functional dependencies - A survey of approaches. IEEE Trans. Knowl. Data Eng., 28(1):147–165, 2016.
[DLM92] János Demetrovics, Leonid Libkin, and Ilya B. Muchnik. Functional dependencies in relational databases: A lattice point of view. Discrete Applied Mathematics, 40(2):155–185, 1992.
[DP02] Brian A. Davey and Hilary A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2002.
[Gog67] J. A. Goguen. L-fuzzy sets. Journal of Mathematical Analysis and Applications, 18(1):145–174, 1967.
[GPW14] Sergio Greco, Fabian Pijcke, and Jef Wijsen. Certain query answering in partially consistent databases. PVLDB, 7(5):353–364, 2014.
[Grä11] George Grätzer. Lattice Theory: Foundation. Springer Science & Business Media, 2011.
[IL84] Tomasz Imielinski and Witold Lipski. Incomplete information in relational databases. J. ACM, 31(4), September 1984.
[JCE17] L. Jezková, Pablo Cordero, and Manuel Enciso. Fuzzy functional dependencies: A comparative survey. Fuzzy Sets and Systems, 317:88–120, 2017.
[Len02] Maurizio Lenzerini. Data integration: A theoretical perspective. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, Madison, Wisconsin, USA, pages 233–246, 2002.
[Lib16] Leonid Libkin. SQL’s three-valued logic and certain answers. ACM Trans. Database Syst., 41(1):1:1–1:28, 2016.
[LP19] Sebastian Link and Henri Prade. Relational database schema design for uncertain data. Inf. Syst., 84:88–110, 2019.
[Mar80] George Markowsky. The representation of posets and lattices by sets. Algebra Universalis, 11(1):173–192, Dec 1980.
[Ng01] Wilfred Ng. An extension of the relational data model to incorporate ordered domains. ACM Trans. Database Syst., 26(3):344–383, 2001.
[SGS15] Jaroslaw Szlichta, Lukasz Golab, and Divesh Srivastava. On axiomatization and inference complexity over a hierarchy of functional dependencies. In Proceedings of the 9th Alberto Mendelzon International Workshop on Foundations of Data Management, Lima, Peru, May 6 - 8, 2015, 2015.
[SORK11] Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.
A Annexe 1: Running example
From the relation r of the introduction, the set of abstract tuples is given below.

tuples   A    B   C
t, t     G    T   C
t, t     G    T   U
t, t     U    D   C
t, t     U    T   C
t, t     B    T   I
t, t     G    T   U
t, t     U    D   U
t, t     U    T   U
t, t     B    T   U
t, t     G    T   C
t, t     GB   D   C
t, t     B    D   I
t, t     G    T   C
t, t     B    T   I
t, t     G    T   C
B Annexe 2: Proofs of section 2.3
For the sake of completeness, we provide in this appendix all proofs of the results omitted in the paper.
Theorem B.1.
Algorithm 1 computes the closure of a tuple u ∈ D_A(R) in O(k·n·|Σ|) time, where k is the complexity of the join operation in the lattice L_A for each attribute A ∈ R.
Proof. The total complexity of Algorithm 1 depends on the cost of checking the condition of the while loop and on the cost of the join operation. Using the set encoding of the lattices L_A for each attribute A ∈ R, testing w ≤ u_Σ and computing u_Σ ∨ z can be implemented in O(k·n). We use the same techniques as in [BB79] to select the implication w → z ∈ Σ that satisfies the condition of the while loop.

Proposition B.2. Σ ⊨_{C_R} s → u iff u ≤ s⁺_Σ.
Proof.
Suppose that Σ ⊨_{C_R} u → v. First we show that for all t, t′ ∈ r, u ≤ f_R(t, t′) implies u_Σ ≤ f_R(t, t′), where r is a relation instance. We use induction on the number of steps of the closure Algorithm 1. Let u_i be the closure of u at step i, and at step i + 1 we choose w → z ∈ Σ with w ≤ u_i. By the induction hypothesis, we have u_i ≤ f_R(t, t′) and then w ≤ u_i ≤ f_R(t, t′), and since Σ ⊨_{C_R} w → z we have z ≤ f_R(t, t′). Thus u_i ∨ z = u_{i+1} ≤ f_R(t, t′). So we conclude that u_Σ ≤ f_R(t, t′).

Now, let r = {u_Σ} be an instance. Then r ⊨_{C_R} Σ by the construction of the closure u_Σ. Since Σ ⊨_{C_R} u → v, we have u ≤ u_Σ = f_R(t, t′) and then v ≤ u_Σ.

Conversely, suppose v ≤ u_Σ, and let t, t′ ∈ r, where r is a relation instance such that u ≤ f_R(t, t′). Using induction on the number of steps of Algorithm 1, we show that Σ ⊨_{C_R} u → u_Σ, i.e. u_Σ ≤ f_R(t, t′), and since v ≤ u_Σ, we deduce that v ≤ f_R(t, t′). By the induction hypothesis, suppose that Σ ⊨_{C_R} u → u_i, where u_i is the value of u_Σ at step i. At step i + 1, we have u_i ≠ u_Σ, otherwise the proof is finished, and we choose an abstract dependency w → z ∈ Σ such that w ≤ u_i. Then by the hypothesis, u_i ≤ f_R(t, t′) and thus w ≤ f_R(t, t′). Since w → z ∈ Σ, we have z ≤ f_R(t, t′), and thus u_{i+1} ≤ f_R(t, t′). Finally we deduce that Σ ⊨_{C_R} u → u_Σ.

Theorem B.3. F ⊨_{C_R} u → v iff F ⊢_{C_R} u → v.

We show both the soundness and the completeness of the extended Armstrong axioms.
Lemma B.4. (Soundness) Σ ⊢_{C_R} u → v ⇒ Σ ⊨_{C_R} u → v.
Proof.
Let r be a relation such that r ⊨_{C_R} Σ and u, v ∈ D_A(R). We show that the three extended Armstrong axioms are sound.

(reflexivity) Let v ≤ u. Then Σ ⊨_{C_R} u → v.
Let t, t′ ∈ r such that u ≤ f_R(t, t′). It follows that v ≤ u ≤ f_R(t, t′). Then r ⊨_{C_R} u → v and Σ ⊨_{C_R} u → v.

(augmentation) If Σ ⊨_{C_R} u → v and w ∈ D_A(R), then Σ ⊨_{C_R} u ∨ w → v ∨ w.
Let t, t′ ∈ r such that u ∨ w ≤ f_R(t, t′). Then u ≤ f_R(t, t′) and w ≤ f_R(t, t′) (lattice properties). Since Σ ⊨_{C_R} u → v, we have v ≤ f_R(t, t′). It follows that v ≤ f_R(t, t′) and w ≤ f_R(t, t′), hence v ∨ w ≤ f_R(t, t′). Then r ⊨_{C_R} u ∨ w → v ∨ w and the result follows.

(transitivity) If Σ ⊨_{C_R} u → v and Σ ⊨_{C_R} v → w, then Σ ⊨_{C_R} u → w.
Let t, t′ ∈ r such that u ≤ f_R(t, t′). Since Σ ⊨_{C_R} u → v, v ≤ f_R(t, t′). Since Σ ⊨_{C_R} v → w, w ≤ f_R(t, t′). The result follows.

Lemma B.5. (Completeness) Σ ⊨_{C_R} u → v ⇒ Σ ⊢_{C_R} u → v.
Proof.
Suppose that F ⊨_{C_R} u → v. We first show by induction on the number of iterations of Algorithm 1 that F ⊢_{C_R} u → u⁺. Let u_i be the value of u⁺ at step i.

For i = 0, we have u_0 = u and, by reflexivity, F ⊢_{C_R} u → u_0.

Suppose that F ⊢_{C_R} u → u_i. At step i + 1, we choose an abstract dependency w → z ∈ F such that w ≤ u_i. Then

1. w → z, since w → z ∈ F.
2. u_i → u_i ∨ z = u_{i+1}, by augmentation (recall that w ∨ u_i = u_i since w ≤ u_i).
3. u → u_{i+1}, by transitivity.

Thus F ⊢_{C_R} u → u⁺. By Proposition B.2, we have v ≤ u⁺, and thus F ⊢_{C_R} u⁺ → v by reflexivity. Finally, using transitivity, we obtain F ⊢_{C_R} u → v.
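The closure computation that the proofs of this appendix iterate over can be sketched as follows. This is a naive illustration and not a transcription of Algorithm 1, whose pseudocode is not reproduced here: it assumes a set encoding of each attribute lattice in which the order is set inclusion and the join is set union, so that an abstract tuple is a Python tuple of frozensets, and it does not use the dependency selection technique of [BB79], so it does not achieve the bound of Theorem B.1. The names `leq`, `join`, `closure` and `implies` are hypothetical.

```python
def leq(u, v):
    """Componentwise order on abstract tuples (set inclusion per attribute)."""
    return all(a <= b for a, b in zip(u, v))


def join(u, v):
    """Componentwise join on abstract tuples (set union per attribute)."""
    return tuple(a | b for a, b in zip(u, v))


def closure(u, sigma):
    """Repeatedly apply dependencies w -> z with w <= current closure
    until no dependency adds anything new."""
    closed = u
    changed = True
    while changed:
        changed = False
        for w, z in sigma:
            if leq(w, closed) and not leq(z, closed):
                closed = join(closed, z)
                changed = True
    return closed


def implies(sigma, u, v):
    """Implication test in the spirit of Proposition B.2: v <= closure of u."""
    return leq(v, closure(u, sigma))


# Two attributes; E encodes the bottom element of each lattice.
E = frozenset()
a, b, c = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
sigma = [((a, b), (a | c, E)),   # <a, b> -> <a or c, bottom>
         ((a, E), (E, b))]       # <a, bottom> -> <bottom, b>

print(closure((a, E), sigma))
print(implies(sigma, (a, E), (c, E)))
```

Each pass of the outer loop either enlarges the closure or stops, so the loop terminates on finite lattices; a production version would index the dependencies to avoid rescanning Σ on every pass.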