On the Interaction of Functional and Inclusion Dependencies with Independence Atoms

Abstract

Infamously, the finite and unrestricted implication problems for the classes of i) functional and inclusion dependencies together, and ii) embedded multivalued dependencies alone are each undecidable. Famously, the restriction of i) to functional and unary inclusion dependencies in combination with the restriction of ii) to multivalued dependencies yields implication problems that still differ in the finite and unrestricted cases, but each is finitely axiomatizable and decidable in low-degree polynomial time. An important tractable embedded fragment of embedded multivalued dependencies is the class of independence atoms, which stipulate independence between two attribute sets. We establish a series of results for implication problems over subclasses of the combined class of functional and inclusion dependencies as well as independence atoms. One of our main results is that both the finite and unrestricted implication problems for the combined class of independence atoms, unary functional and unary inclusion dependencies are axiomatizable and decidable in low-degree polynomial time.


Miika Hannula (a), Juha Kontinen (a), Sebastian Link (b)

(a) Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
(b) School of Computer Science, University of Auckland, New Zealand


1. Introduction

Databases represent information about some domain of the real world. For this purpose, data dependencies provide the main mechanism for enforcing the semantics of the given application domain within a database system. As such, data dependencies are essential for most data management tasks, including conceptual, logical and physical database design, query and update processing, transaction management, as well as data cleaning, exchange, and integration. The usability of a class C of data dependencies for these tasks depends critically on the computational properties of its associated implication problem. The implication problem for C is to decide whether for a given finite set Σ ∪ {ϕ} of data dependencies from C, Σ implies ϕ, that is, whether every database that satisfies all the elements of Σ also satisfies ϕ. If we require databases to be finite, then we speak of the finite implication problem, and otherwise of the unrestricted implication problem. While the importance of data dependencies continues to hold for new data models, the focus of this article is on the finite and unrestricted implication problems for important classes of data dependencies in the relational model of data. In this context, data dependency theory is deep and rich, and dedicated books exist [27, 56].

Some of our results were presented at the 23rd International Conference on Database Systems for Advanced Applications (DASFAA 2018) and the 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR 2017).

Functional and inclusion dependencies constitute the most commonly used classes of data dependencies in practice. In particular, functional dependencies (FDs) are more expressive than keys, and inclusion dependencies (INDs) are more expressive than foreign keys, thereby capturing Codd's principles of entity and referential integrity, respectively, on the logical level.
An FD R : X → Y with attribute subsets X, Y on relation schema R expresses that the values on attributes in Y are uniquely determined by the values on attributes in X. In particular, R : X → R expresses that X is a key for R. For example, on schema Patient = {p_id, p_name} the FD Patient : p_id → p_name expresses that the id of a patient uniquely determines the name of the patient, and on schema Test = {t_id, t_desc} the FD Test : t_id → t_desc expresses that the id of a medical test uniquely determines the description of a test. An inclusion dependency (IND) R[A_1, ..., A_n] ⊆ R′[B_1, ..., B_n], with attribute sequences A_1, ..., A_n on R and B_1, ..., B_n on R′, expresses that for each tuple t over R there is some tuple t′ over R′ such that for all i = 1, ..., n, t(A_i) = t′(B_i) holds. If n = 1 we call the IND unary (UIND). For example, on schema Heart = {p_id, p_name, t_id} the unary IND Heart[t_id] ⊆ Test[t_id] expresses that each id of a test that is performed on patients to diagnose a heart disorder must reference the (unique) id of a medical test on schema Test.

A fundamental result in dependency theory is that the unrestricted and finite implication problems for the combined class of FDs and INDs differ and each is undecidable [14, 48, 49]. Interestingly, for the expressive subclass of FDs and UINDs, the unrestricted and finite implication problems still differ but each is axiomatizable and decidable in low-degree polynomial time [16].

Another important expressive class of data dependencies are embedded multivalued dependencies (EMVDs). An EMVD R : X → Y ⊥ Z with attribute subsets X, Y, Z of R expresses that the projection r[XYZ] of a relation r over R on the set union XYZ is the join r[XY] ⋈ r[XZ] of its projections on XY and XZ.
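The FD and IND semantics described above can be checked directly on small relations. The following is a minimal sketch, assuming relations are represented as lists of Python dictionaries; the helper names `satisfies_fd` and `satisfies_ind` and the sample rows are illustrative assumptions, not part of the article.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True iff all rows that agree on lhs also agree on rhs."""
    seen = {}  # lhs-value -> first rhs-value observed for it
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies_ind(src_rows, src_attrs, dst_rows, dst_attrs):
    """Return True iff src[src_attrs] is contained in dst[dst_attrs]."""
    referenced = {tuple(t[a] for a in dst_attrs) for t in dst_rows}
    return all(tuple(t[a] for a in src_attrs) in referenced for t in src_rows)


# Hypothetical sample data for the schemata Test and Heart.
test_rel = [
    {"t_id": 1, "t_desc": "ECG"},
    {"t_id": 2, "t_desc": "angiogram"},
]
heart_rel = [
    {"p_id": 10, "p_name": "Jack", "t_id": 1},
    {"p_id": 10, "p_name": "Jack", "t_id": 2},
    {"p_id": 11, "p_name": "Jill", "t_id": 1},
]

fd_holds = satisfies_fd(test_rel, ["t_id"], ["t_desc"])               # Test : t_id -> t_desc
uind_holds = satisfies_ind(heart_rel, ["t_id"], test_rel, ["t_id"])   # Heart[t_id] ⊆ Test[t_id]
```

Both checks only test satisfaction on one given database; they say nothing about implication, which quantifies over all (finite) databases.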
Another fundamental result in dependency theory is that the unrestricted and finite implication problems for EMVDs differ, each is not finitely axiomatizable [55], and each is undecidable [34, 35].

An important fragment of EMVDs are multivalued dependencies (MVDs), which are a class of full dependencies in which XYZ covers the full underlying set R of attributes. In fact, MVDs are the basis for Fagin's fourth normal form [19, 57]. For the combined class of FDs, MVDs, and UINDs, finite implication is axiomatizable and decidable in cubic time, while unrestricted implication is also axiomatizable and decidable in almost linear time [16, 38].

Another expressive known fragment of EMVDs that is computationally friendly is the class of independence atoms (IAs). IAs are EMVDs R : X → Y ⊥ Z where X = ∅, i.e., they express that r[YZ] = r[Y] ⋈ r[Z] holds. IAs are denoted by Y ⊥ Z. In our example, the IA p_id ⊥ t_id on schema Heart expresses that all patients that are tested for a heart disorder undergo all tests for this disorder. For the class of IAs, the finite and unrestricted implication problems coincide, and they are finitely axiomatizable and decidable in low-degree polynomial time [40]. Besides their attractive computational features, IAs are interesting for a variety of other reasons:

(i) Database researchers studied them as early as 1976 [10], with continued interest over the years [17, 33, 40, 52].

(ii) Geiger, Paz, and Pearl studied IAs in a probabilistic setting [23] where they constitute an important fragment of conditional independencies, which form the foundation for Markov and Bayesian networks.

(iii) IAs occur naturally in database practice. For example, the cross product between various tables is computed by the FROM clause in SQL. Naturally, a variety of IAs hold on the resulting table. The choice of the best query plan typically depends on the correct estimation of cardinalities for intermediate results. For efficiency purposes, the estimates typically assume independence between attribute columns [54]. Knowing which independencies actually hold could replace cardinality estimation by exact cardinalities and therefore result in better query plans. We acknowledge that the independence statements required for these types of optimizations are mostly restricted to hold for specific combinations of values on the given attributes. However, to understand such expressive independence atoms, we first need to understand the more basic ones. This is therefore an exciting area of future research that will be influenced by the results we derive in the current article. More recently, Olteanu and Zavodny [50] studied succinct representations of relational data by employing algebraic factorizations using distributivity of Cartesian products over unions. Not surprisingly, one of the core enabling notions of the factorizations is that of independence.

(iv) In fact, the concept of independence is fundamental to areas as diverse as causality, bound variables in logic, random variables in statistics, patterns in data, the theory of social choice, Mendelian genetics, and even some quantum physics [3, 4]. In a recent response, the study of logics with IAs as atoms of the language has been initiated [26].

Given the usefulness of EMVDs, FDs, and INDs for data management, given their computational barriers, and given the attractiveness of IAs as a tractable fragment of EMVDs, it is a natural question to ask how IAs, FDs, and INDs interact. We aim at helping address this current gap in the existing rich theory of relational data dependencies. Adding further to the challenge, it is important to note that IAs still form an embedded fragment of EMVDs, in contrast to MVDs, which are a class of full dependencies. Somewhat surprisingly, already the interaction of IAs with just keys is intricate [31, 33]. For example, unrestricted implication is finitely axiomatizable but finite implication is not for keys and unary IAs (those with singleton attribute sets), while the finite and unrestricted implication problems coincide and enjoy a finite axiomatization for IAs and unary keys (those with a singleton attribute set).

In this article we make the following contributions.

1. For the combined class of IAs and INDs, finite and unrestricted implication coincide and we establish a finite axiomatization. We further show that implication for this class is PSPACE-complete and fixed-parameter tractable in the maximum arity of the given dependencies. As these results already hold for INDs [12, 13], adding IAs to INDs adds significant expressivity without penalties in terms of computational properties. This is in sharp contrast to adding IAs to keys [31, 33].

2. For the combined class of FDs and IAs, finite and unrestricted implication differ [31, 33]. We show that finite implication is not finitely axiomatizable, already for binary FDs (those with a two-element attribute set on the left-hand side) and unary IAs. For the combined class of IAs and unary FDs, we show that finite and unrestricted implication coincide and establish a finite axiomatization. Hence, the situation for the combined class of FDs and IAs is more intricate than for the combined class of FDs and MVDs, where finite and unrestricted implication coincide, which enjoy an elegant finite axiomatization [7], and for which implication can be decided in almost linear time [22].

3. For the combined class of IAs, unary FDs, and UINDs, we prove existence of finite Armstrong relations, establish axiomatizations for their finite and unrestricted implication problems, and show that both are decidable in low-degree polynomial time. This is analogous to the results for the combined class of FDs, MVDs, and UINDs. To the best of our knowledge, the class of IAs, unary FDs, and UINDs is only the second known class for which the finite and unrestricted implication problems differ but both are decidable in low-degree polynomial time. The class is practically relevant as it covers arbitrary independence atoms on top of unary keys and unary foreign keys, and already unary keys and unary foreign keys occur readily in practice [16]. The significant difference to FDs, MVDs, and UINDs is the more intricate interaction between FDs and IAs in comparison to FDs and MVDs. Since unary FDs and INDs frequently occur in database practice in the form of surrogate keys and foreign keys that reference them, the ability to reason efficiently about IAs, UFDs, and UINDs is good news for data management. Finally, trading in restrictions of the arity on INDs and FDs for restrictions on the arity of IAs cannot be successful: finite implication for unary IAs and binary FDs is not finitely axiomatizable, see 2).

4. For the combined class of IAs and INDs, and the combined class of IAs and FDs, we establish tractable conditions sufficient for non-interaction, in both the finite and unrestricted cases. For the class of IAs and INDs, the condition ensures that we can apply known algorithms for deciding implication of the individual classes of IAs and INDs, respectively, to decide implication for an input that combines both individual classes. For the general class of IAs and FDs the decidability of finite and unrestricted implication are both still open. Instances of the finite or unrestricted implication problems that meet the non-interaction conditions can therefore be decided efficiently by using already known algorithms for the sole class of IAs and the sole class of FDs.

Table 1: Subclasses of FD+IND+IA. We write "ui" and "fi" for unrestricted and finite implication, respectively.

class | ui = fi | complexity: ui / fi | finite axiomatization: ui / fi
FD | yes [5] | linear time [6] | yes (2-ary) [5]
IND | yes [13] | PSPACE-complete [13] | yes (2-ary) [13]
IA | yes [23, 40, 52] | cubic time [23, 40] | yes (2-ary) [23, 40, 52]
IND+IA | yes | PSPACE-complete | yes (3-ary)
FD+IA, FD+UIA | no [33] | ? / ? | ? / no
FD+IND | no [14, 48] | undecidable / undecidable [14, 48] | no / no [14, 48]
FD+UIND | no [16] | cubic time / cubic time [16] | yes / no (infinite) [16]
UFD+UIND | no [16] | linear time / linear time [16] | yes / no (infinite) [16]
UFD+UIND+IA | no | cubic time / cubic time | yes / no (infinite)

Organization. We illustrate some use cases for our work in Section 2. In Section 3 we present all the necessary definitions for the article. Section 4 examines the axiomatic characterization of the combined class of INDs and IAs, and Section 5 addresses the combined class of FDs and IAs. In Section 6 we focus on the combination of UFDs, UINDs, and IAs, and establish axiomatizations for their finite and unrestricted implication problems. Section 7 identifies polynomial-time criteria for the non-interaction between INDs and IAs, and also between FDs and IAs. Finally, in Section 8 we discuss the computational complexity of the implication problems considered. We conclude in Section 9, where we also list some directions for future work.

2. Motivating Showcases

We use a few showcases to illustrate how knowledge about independence atoms can advance data management. FDs and INDs do not require further motivation, but the more we know about the interaction of IAs with FDs and INDs, the more we can advance data management.

We use a simplified example to illustrate how IAs, FDs and INDs can be used to manage data integrity in databases. For this purpose, consider the four relation schemata

• Patient = {p_id, p_name},
• Test = {t_id, t_desc},
• Heart = {p_id, p_name, t_id}, and
• Disorder = {p_id, t_id, confirmed},

in which basic information about patients and medical tests is stored. In particular, Heart stores which medical tests for a specific heart disorder were performed on which patients, and Disorder stores all those tests performed on patients which have been diagnosed with the disorder. In addition, the following set Σ of FDs, INDs, and IAs has been specified:

• σ_1 = Patient : p_id → p_name,
• σ_2 = Test : t_id → t_desc,
• σ_3 = Heart[p_id, p_name] ⊆ Patient[p_id, p_name],
• σ_4 = Heart[t_id] ⊆ Test[t_id],
• σ_5 = Heart : p_id ⊥ t_id,
• σ_6 = Disorder[p_id] ⊆ Heart[p_id],
• σ_7 = Disorder[t_id] ⊆ Heart[t_id],
• σ_8 = Disorder : confirmed ⊥ confirmed.

Note that not all constraints need to be enforced strictly. For example, violations of σ_5 may issue alerts about patients that still have to undergo remaining tests. The IA confirmed ⊥ confirmed expresses that all tuples have the same value on attribute confirmed. There are a number of interesting dependencies that are implied by Σ. Firstly, the IND σ_3 together with the FD σ_1 finitely imply the FD σ_9 = Heart : p_id → p_name. In turn, the FD σ_9 and the IA σ_5 together finitely imply the IA σ_10 = Heart : p_id, p_name ⊥ t_id and thus also σ_11 = Heart : p_name ⊥ t_id. Finally, the INDs σ_6 and σ_7 and the IA σ_5 together finitely imply the IND σ_12 = Disorder[p_id, t_id] ⊆ Heart[p_id, t_id].

In particular, the last interaction is very relevant in practice. While the two UINDs σ_6 and σ_7 do not together imply the IND σ_12, knowing that the IA σ_5 holds on the referenced schema tells us that σ_12 also holds on the referencing schema. It may be more natural to specify σ_12 in the first place, instead of specifying σ_6 and σ_7, but enforcing these two unary INDs and the IA σ_5 is more efficient than enforcing the binary IND σ_12 and the IA σ_5 [47].

2.2. Query optimization

As another example for the usefulness of understanding the interaction between IAs, FDs, and INDs, we consider query optimization.
The famous division operator π_XY(R) ÷ π_Y(R) returns all those X-values x such that for every Y-value y there is some tuple t with t(X) = x and t(Y) = y [15]. The ability of the division operator to express universal quantification makes it very powerful. The validity of independence is intrinsically linked to the optimization of the division operator, as our following result suggests.

Theorem 1. For all relations r over R, π_XY(R)(r) ÷ π_Y(R)(r) = π_X(R)(r) if and only if r satisfies X ⊥ Y.

Proof. The division operator is defined as follows:

π_XY(R)(r) ÷ π_Y(R)(r) = π_X(R)(r) − π_X((π_X(R)(r) × π_Y(R)(r)) − π_XY(R)(r)),

and r satisfies X ⊥ Y if and only if π_X(R)(r) × π_Y(R)(r) = π_XY(R)(r). The result follows directly.

In particular, the validity of an IA reduces the quadratic complexity of the division operator to the linear complexity of a simple projection [43], as illustrated next on our running example. Suppose we would like to return the p_id of people that have undergone all tests listed for the specific heart disorder we consider. We can express this query by a division operator as follows: π_{p_id,t_id}(Heart) ÷ π_{t_id}(Heart). In SQL, the query would have to use double negation as in:

    SELECT H0.p_id FROM Heart H0
    WHERE NOT EXISTS (
        SELECT * FROM Heart H1
        WHERE NOT EXISTS (
            SELECT * FROM Heart H2
            WHERE H2.t_id = H1.t_id AND
                  H2.p_id = H0.p_id ) );

However, if a query optimizer can notice that the IA σ_5 is implied by the enforced set Σ given above, then the query can be rewritten into

    SELECT p_id
    FROM Heart;

While the set of our constraints is weakly acyclic [21], our query is not "path-conjunctive" and the chase & backchase algorithm from [18] cannot be applied.
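The equivalence stated in Theorem 1 can be observed on small instances. The following is a minimal sketch, assuming relations are lists of dictionaries and attribute sets are given as lists of attribute names; the helper names `project`, `divide`, and `satisfies_ia` and the sample Heart rows are illustrative assumptions, not taken from the article.

```python
def project(rows, attrs):
    """Projection onto attrs, as a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in rows}


def divide(rows, x_attrs, y_attrs):
    """X-values that occur together with every Y-value of the relation."""
    xy = project(rows, x_attrs + y_attrs)
    ys = project(rows, y_attrs)
    return {x for x in project(rows, x_attrs) if all(x + y in xy for y in ys)}


def satisfies_ia(rows, x_attrs, y_attrs):
    """X ⊥ Y: every X-value and every Y-value occur together in some row."""
    xy = project(rows, x_attrs + y_attrs)
    return all(x + y in xy
               for x in project(rows, x_attrs)
               for y in project(rows, y_attrs))


# Hypothetical Heart instances, restricted to p_id and t_id.
heart_full = [{"p_id": p, "t_id": t} for p in (10, 11) for t in (1, 2)]
heart_partial = heart_full[:3]  # patient 11 has not undergone test 2

X, Y = ["p_id"], ["t_id"]
full_equal = divide(heart_full, X, Y) == project(heart_full, X)
partial_equal = divide(heart_partial, X, Y) == project(heart_partial, X)
```

On `heart_full` the division coincides with the projection and the IA holds; on `heart_partial` both fail, in line with the theorem.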

Our second example is database security. More specifically, the aim of inference control is to protect private data under inferences that clever attacks may use to circumvent access limitations [9]. For example, the combination of a particular patient name (say Jack) together with a particular medical examination (say angiogram) may be considered a secret, while access to the patient name and access to the medical examination in isolation may not be a secret. However, in some given context such as a procedure to diagnose some condition, all patients may need to undergo all examinations. That is, the information about the patient is independent of the information about the examination. Now, if the secret (Jack, angiogram) must not be revealed to an unauthorized user that can query the data source, then this user must not learn both: that Jack is a patient undergoing the diagnosis of the condition, and that angiogram is a medical examination that is part of the process for diagnosing the condition. Being able to understand the interaction of independence atoms with other database constraints can therefore help us to protect secrets under clever inference attacks.

Our final example is data profiling. Here we would like to demonstrate that independence atoms do occur in real-world data sets. For that purpose, we have mined some well-known publicly available data sets that have been used for the mining of other classes of data dependencies before [51]. We report the basic characteristics of these data sets in the form of their numbers of rows and columns, and list the number of maximal IAs and the maximum arity of those found. Here, an IA X ⊥ Y is maximal in a given set of IAs if there is no other IA V ⊥ W in the set such that X ⊆ V and Y ⊆ W holds. The arity of an IA is defined as the total number of attribute occurrences.

Data set | Number of columns | Number of rows | Number of IAs | Maximum arity
bridges | 13 | 108 | 4 | 3
echocardiogram | 13 | 132 | 5 | 4
adult | 14 | 48,842 | 9 | 3
hepatitis | 20 | 155 | 855 | 6
horse | 27 | 368 | 112 | 3

It should be stressed that the usefulness of these IAs is not restricted to those that are semantically meaningful. For example, the optimizations for the division operator also apply to IAs that "accidentally" hold on a given data set. For the future, we envision that data profiling tools can also keep profiles of sophisticated notions of independence atoms. For example, in the data set hepatitis one desires non-bias and therefore an independence atom age ⊥ sex to hold. Indeed, if this independence held we would know that the number of distinct tuples in the projection of hepatitis onto age and sex would be the product of the numbers of distinct tuples in the projection onto age and in the projection onto sex. However, we have

|hepatitis[age, sex]| = 0.… × |hepatitis[age]| × |hepatitis[sex]|.

This is valuable information that could be profiled.
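The cardinality comparison above suggests a simple profiling measure. The following is a minimal sketch, assuming a non-empty relation given as a list of dictionaries and disjoint attribute lists; the function names and the sample rows are hypothetical and do not reproduce the hepatitis data. For disjoint X and Y, the ratio equals 1 exactly when X ⊥ Y holds, because r[XY] is always a subset of r[X] × r[Y].

```python
def distinct_count(rows, attrs):
    """Number of distinct tuples in the projection onto attrs."""
    return len({tuple(t[a] for a in attrs) for t in rows})


def independence_ratio(rows, x_attrs, y_attrs):
    """|r[XY]| / (|r[X]| * |r[Y]|) for disjoint attribute lists X and Y."""
    joint = distinct_count(rows, x_attrs + y_attrs)
    product = distinct_count(rows, x_attrs) * distinct_count(rows, y_attrs)
    return joint / product


# Hypothetical sample rows with two attributes.
balanced = [{"age_group": a, "sex": s}
            for a in ("young", "old") for s in ("f", "m")]
skewed = balanced[:3]  # the combination ("old", "m") never occurs

ratio_balanced = independence_ratio(balanced, ["age_group"], ["sex"])
ratio_skewed = independence_ratio(skewed, ["age_group"], ["sex"])
```

A profiling tool could store such ratios per attribute pair, so that values below 1 quantify how far a data set is from satisfying the corresponding IA.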

3. Preliminaries

We usually write A, B, C, ... for attributes, and X, Y, Z, ... for either sets or sequences of attributes, depending on the context. For two sets (sequences) X and Y, we write XY for their union (concatenation). Similarly, we may write A instead of the single element set or sequence that consists of A. The size of a set (or length of a sequence) X is denoted by |X|.

A relation schema is a set of attributes A, each with a domain Dom(A), and by a database schema we denote a pairwise disjoint sequence of relation schemata. A tuple over a relation schema R is a function that maps each A ∈ R to Dom(A). A relation r over R is a non-empty set of tuples over R. To emphasize that r is a relation over R, we sometimes write r[R]. A database over a database schema R_1, ..., R_n is a sequence of relations (r_1[R_1], ..., r_n[R_n]). A finite relation over R is a non-empty, finite set of tuples over R, and a finite database is a sequence of finite relations. For a tuple t and a relation r over R and X ⊆ R, t(X) is the restriction of t to X, and r(X) is the set of all restrictions t(X) where t ∈ r. If X = (A_1, ..., A_n) is a sequence of attributes, then we write t(X) for (t(A_1), ..., t(A_n)).

We exclude empty relations from our definition. This is a practical assumption with no effect when single relation schemata are considered only. However, on multiple relations it has an effect, e.g., the rule UI [...]

Let d = (r_1[R_1], ..., r_n[R_n]) be a database. For two sequences of distinct attributes A_1, ..., A_n ∈ R_i and B_1, ..., B_n ∈ R_j, R_i[A_1 ... A_n] ⊆ R_j[B_1 ... B_n] is an inclusion dependency with semantics defined by d ⊨ R_i[A_1 ... A_n] ⊆ R_j[B_1 ... B_n] if for all t ∈ r_i there is some t′ ∈ r_j such that t(A_1) = t′(B_1), ..., t(A_n) = t′(B_n). For two (not necessarily disjoint) sets of attributes X, Y ⊆ R_i, R_i : X ⊥ Y is an independence atom with semantics: d ⊨ R_i : X ⊥ Y if for all t, t′ ∈ r_i there exists t″ ∈ r_i such that t″(X) = t(X) and t″(Y) = t′(Y). For two sets of attributes X, Y ⊆ R_i, R_i : X → Y is a functional dependency with semantics: d ⊨ R_i : X → Y if for all t, t′ ∈ r_i, t(X) = t′(X) implies t(Y) = t′(Y). We may exclude relation schemata from the notation if they are clear from the context (e.g., write X ⊥ Y instead of R_i : X ⊥ Y). A disjoint independence atom (DIA) is an IA X ⊥ Y where X ∩ Y is empty. We say that an IND is k-ary if it is of the form A_1 ... A_k ⊆ B_1 ... B_k. An IA X ⊥ Y and an FD X → Y are called k-ary if max{|X|, |Y|} = k. A class of dependencies is called k-ary if it contains at most k-ary dependencies. We add "U" to a class name to denote its unary subclass, e.g., UIND denotes the class of all unary INDs. Similarly, for a number k we add "k" to a class name to denote its k-ary subclass. We use "+" to denote unions of classes, e.g., IND+IA denotes the class of all inclusion dependencies and independence atoms. Note that the semantics of IAs implies:

d ⊨ R_i : X ⊥ X if and only if for all s, s′ ∈ r_i it holds that s(X) = s′(X).

Hence, unary FDs of the form ∅ → A and unary IAs of the form A ⊥ A are also called constancy atoms (CAs).

The restriction of a dependency σ to a set of attributes R, written σ↾R, is X ∩ R → Y ∩ R for an FD σ of the form X → Y, and X ∩ R ⊥ Y ∩ R for an IA σ of the form X ⊥ Y. If σ is an IND of the form A_1 ... A_n ⊆ B_1 ... B_n and i_1, ..., i_k lists {i = 1, ..., n : A_i ∈ R and B_i ∈ R}, then σ↾R = A_{i_1} ... A_{i_k} ⊆ B_{i_1} ... B_{i_k}. For a set of dependencies Σ, the restriction of Σ to R, written Σ↾R, is the set of all σ↾R where σ ∈ Σ. Let A and B be attributes from R. By σ(R : A ↦ B) we denote dependencies obtained from σ by replacing any number of occurrences of A with B.

A set R of rules of the form σ_1, ..., σ_n ⇒ σ is called an axiomatization. A rule of the previous form is called n-ary, and an axiomatization consisting of at most n-ary rules is called n-ary. A deduction from a set of dependencies Σ by an axiomatization R is a sequence of dependencies (σ_1, ..., σ_n) where each σ_i is either an element of Σ or follows from σ_1, ..., σ_{i−1} by an application of a rule in R. In such an occasion we write Σ ⊢_R σ_n, or simply Σ ⊢ σ_n if R is known.

Given a finite set of database dependencies Σ ∪ {σ}, the unrestricted (finite) implication problem is to decide whether all (finite) databases that satisfy Σ also satisfy σ, written Σ ⊨ σ (Σ ⊨_fin σ). An axiomatization R is sound for the unrestricted implication problem of a class of dependencies C if for all finite sets Σ ∪ {σ} of dependencies from C, Σ ⊢_R σ ⇒ Σ ⊨ σ; it is complete if Σ ⊨ σ ⇒ Σ ⊢_R σ. Soundness and completeness for finite implication are defined analogously.

We assume that all our axiomatizations are attribute-bounded. A sound and complete axiomatization is said to be attribute-bounded if it does not introduce new attributes, i.e., any implication of σ by Σ can be verified by a deduction in which only attributes from Σ or σ appear [14]. It is easy to see that a finite (attribute-bounded) axiomatization gives rise to a decision procedure for the associated implication problem. The converse is not necessarily true; join dependencies constitute a class that is associated with a decidable implication problem, yet they lack a finite axiomatization [53].

Consider then the class FD+IND+IA. Clearly, both sets {(Σ, σ) | Σ ⊨ σ} and {(Σ, σ) | Σ ⊭_fin σ} are recursively enumerable; the first via reduction to the validity problem of first-order logic, and the second by checking through whether some finite relation satisfies Σ ∪ {¬σ}. Consequently, given a subclass C of FD+IND+IA, the unrestricted and finite implication problems for C are decidable whenever these two problems coincide.

Many of our completeness proofs utilize the chase technique (see, e.g., [2]). The chase provides a general tool for reasoning about various dependencies as well as for optimizing conjunctive queries. Given an implication problem for σ by Σ, the starting point of the chase is a simple database falsifying σ. For instance, the chase for independence atoms starts with a unirelational database consisting of two rows that disagree on all attributes. Using some dedicated set of chase rules, this initial database is then completed to another database satisfying Σ. If the new database also satisfies σ, then one concludes that the implication holds. For some classes, such as embedded multivalued dependencies, the chase does not necessarily terminate. In those cases only a semi-decision procedure is obtained.

Axiomatizations.

Tables 2, 3, and 4 present the axiomatizations considered in this article. In Table 2, the axiomatization I := {I1, ..., I5} is sound and complete for independence atoms alone [33, 40]. The rules F1, F2, F3 [...]

Theorem 2.

The axiomatization A ∪ B ∪ C is sound for the unrestricted and finite implication problems of FD+IND+IA.

Table 2: Axiomatization A for FDs and IAs. We define I := {I1, ..., I5} and A* := A \ {I[...], F[...]}.

(trivial independence, I1)  ⟹  ∅ ⊥ X
(symmetry, I2)  X ⊥ Y  ⟹  Y ⊥ X
(decomposition, I3)  X ⊥ YZ  ⟹  X ⊥ Y
(exchange, I4)  X ⊥ Y, XY ⊥ Z  ⟹  X ⊥ YZ
(weak composition, I5)  X ⊥ Y, Z ⊥ Z  ⟹  X ⊥ YZ
(reflexivity, F1)  ⟹  XY → Y
(transitivity, F2)  X → Y, Y → Z  ⟹  X → Z
(augmentation, F3)  X → Y  ⟹  XZ → YZ
(constancy, FI1)  X ⊥ Y, X → Y  ⟹  ∅ → Y
(composition, FI2)  X ⊥ YZ, Z → V  ⟹  X ⊥ YZV

4. Independence Atoms and Inclusion Dependencies

In this section we establish a set of inference rules that is proven sound and complete for the unrestricted and finite implication problems of independence atoms and inclusion dependencies. This axiomatization consists of rules C describing interaction between the two classes (see Table 4) and two sets I := {I1, ..., I5} and B of complete rules for both classes in isolation (see Tables 2 and 3, resp.). Furthermore, as a consequence of the completeness proof we obtain that the finite and unrestricted implication problems coincide for IND+IA. In addition, our completeness proof enables us to construct Armstrong databases for this class of constraints, and to simplify the implication problem for a subclass of IND+IA.

Table 3: Axiomatization B for INDs

(reflexivity, U1)  ⟹  R[X] ⊆ R[X]
(transitivity, U2)  R[X] ⊆ R′[Y], R′[Y] ⊆ R″[Z]  ⟹  R[X] ⊆ R″[Z]
(projection and permutation, U3)  R[A_1 ... A_n] ⊆ R′[B_1 ... B_n]  ⟹  R[A_{i_1} ... A_{i_m}] ⊆ R′[B_{i_1} ... B_{i_m}]  (∗)
(∗) i_j are pairwise distinct and from {1, ..., n}

We start with the following simplifying lemma which reduces one finite IND+IA-implication problem to another that is not associated with any constancy atoms, i.e., IAs of the form X ⊥ X. Note that we write Σ ⊣⊢ Σ′ if Σ ⊢ Σ′ and Σ′ ⊢ Σ.

Lemma 3.

Let Σ be a set of IAs and INDs over schema R_1, ..., R_n, and let C := ⋃_{i=1}^{n} {A ∈ R_i | Σ ⊢ R_i : A ⊥ A}. Let Σ and σ be the restrictions of Σ and σ to the attributes not in C, and let Σ be obtained from Σ by

(1) replacing R_i : X ⊥ Y ∈ Σ with R_i : X \ C ⊥ Y \ C,
(2) adding R_i : A_1 ... A_j ⊥ A_{j+1} ... A_m (R_i \ C), where A_1, ..., A_m is some enumeration of R_i ∩ C and j = 1, ..., m,
(3) adding R_j[B] ⊆ R_i[A] if Σ ⊢ R_i[A] ⊆ R_j[B] ∧ R_j : B ⊥ B.

Then Σ ⊣⊢ Σ ∪ Σ ∪ {R_i : R_i ∩ C ⊥ R_i ∩ C | i = 1, ..., n} and

(i) σ is an IA: Σ ⊨_fin σ ⇒ Σ ⊨_fin σ,
(ii) σ is an IND: Σ ⊨_fin σ ⇒ Σ ⊨_fin σ.

Table 4: Axiomatization C for IAs and INDs

(concatenation, UI1)  R[X] ⊆ R′[Z], R[Y] ⊆ R′[W], R′ : Z ⊥ W  ⟹  R[XY] ⊆ R′[ZW]
(transfer, UI2)  R[XY] ⊆ R′[ZW], R′[ZW] ⊆ R[XY], R′ : Z ⊥ W  ⟹  R : X ⊥ Y
(symmetry, UI3)  R[X] ⊆ R′[Y], R′ : Y ⊥ Y  ⟹  R′[Y] ⊆ R[X]
(constancy, UI4)  R[X] ⊆ R′[Y], R′ : Y ⊥ Y  ⟹  R : X ⊥ X
(equality, UI5)  R[A] ⊆ R′[C], R[B] ⊆ R′[C], R′ : C ⊥ C, σ  ⟹  σ(R : A ↦ B)

Proof. Clearly we have that Σ ⊣⊢ Σ ∪ Σ ∪ {R_i : R_i ∩ C ⊥ R_i ∩ C | i = 1, ..., n}. For claim (i) note that any finite database d = (r_1, ..., r_n) satisfying Σ ∪ {¬σ} can be extended to one satisfying Σ ∪ {¬σ} by replacing in each r_i ∈ d each tuple t with all tuples t′ such that t′(A) = 0 for A ∈ R_i ∩ C and t′(A) ∈ {0, t(A)} for A ∈ R_i \ C, where 0 is a value not appearing in d.

Next we show claim (ii). Assuming a finite database d′ = (r′_1[R_1], ..., r′_n[R_n]) satisfying Σ ∪ {¬σ} for σ of the form R_l[X] ⊆ R_{l′}[Y], we construct a finite database d = (r_1[R_1], ..., r_n[R_n]) satisfying Σ ∪ {¬σ}. Let t ∈ r′_l be such that t(X) ≠ t′(Y) for all t′ ∈ r′_{l′}. Let t be an extension of t(R_l ∩ C) to C such that, for A ∈ R_i ∩ C and B ∈ R_l ∩ C, t(A) = t(B) if Σ ⊢ R_i[A] ⊆ R_l[B], and otherwise t(A) is any value from r′_i(A). Note that we may assume without losing generality that t is well-defined, i.e., for no distinct B, B′ ∈ R_l ∩ C, Σ ⊢ R_l[B] ⊆ R_l[B′]. For this, define an equivalence relation ∼ on R_l ∩ C such that B ∼ B′ if Σ ⊢ R_l[B] ⊆ R_l[B′]. Using UI [...] ⊨_fin σ ⇒ Σ* ⊨_fin σ* and Σ* ⊢ σ* ⇒ Σ ⊢ σ, where Σ* ∪ {σ*} is the set of constraints obtained from Σ ∪ {σ} by replacing attributes in R_l ∩ C with their equivalence classes.

Now, define r_i := r′_i(R_i \ C) × {t(R_i ∩ C)}, for i = 1, ..., n. Since t ∈ r_l and r_{l′} ⊆ r′_{l′} by items (2,3) and the construction, we obtain that d ⊭ R_l[X] ⊆ R_{l′}[Y]. It is also easy to see by the construction that all IAs in Σ remain true in d. Assume then that R_i[X_1 ... X_m] ⊆ R_j[Y_1 ... Y_m] ∈ Σ, and let t ∈ r_i. Since Y_i ∈ C implies X_i ∈ C and t(X_i) = t(Y_i), we can assume that Y_1, ..., Y_m ∉ C. Hence, r_j(Y_1 ... Y_m) = r′_j(Y_1 ... Y_m). Again, r_i ⊆ r′_i by (2,3) and the construction, and d′ ⊨ R_i[X_1 ... X_m] ⊆ R_j[Y_1 ... Y_m]; hence we obtain that d ⊨ R_i[X_1 ... X_m] ⊆ R_j[Y_1 ... Y_m]. This concludes case (ii) and the proof.

The following lemma will also be helpful in the sequel.

Lemma 4. XY ⊥ UV can be deduced from XU ⊥ YV, X ⊥ U, and Y ⊥ V by rules I2, I3, I4.

Proof. The following deduction shows the claim:

(1) XU ⊥ YV (premise)
(2) YV ⊥ XU (I2 on 1)
(3) Y ⊥ V (premise)
(4) Y ⊥ XUV (I4 on 3 and 2)
(5) Y ⊥ UV (I3 on 4)
(6) UV ⊥ Y (I2 on 5)
(7) X ⊥ U (premise)
(8) X ⊥ YUV (I4 on 7 and 1)
(9) YUV ⊥ X (I2 on 8)
(10) UV ⊥ XY (I4 on 6 and 9)
(11) XY ⊥ UV (I2 on 10)

Theorem 5.

The axiomatization I ∪ B ∪ C is sound and complete for theunrestricted and ﬁnite implication problems of IA+IND.Proof. By Theorem 2 the axiomatization is sound. For completeness withrespect to both implication problems, it suﬃces to show that ﬁnite implica-tion entails derivability. For this, notice that unrestricted implication entailsﬁnite implication. Hence, assume that Σ | = FIN σ for a ﬁnite set Σ ∪ { σ } ofIAs and INDs over database schema R , . . . , R n . Let C := S ni =1 { A ∈ R i | Σ ⊢ R i : A ⊥ A } . By I − I σ is either a CA, a DIA, or an IND. Next we show that Σ ⊢ σ in these threecases. σ is a constancy atom. Assume that σ is of the form R l : A ⊥ A ,and assume to the contrary that Σ σ . First let I be the set of attributes B for which there is i = 1 , . . . , n such that Σ ⊢ R l [ A ] ⊆ R i [ B ]. Then let d = ( r , . . . , r n ) be the database where r i := R i ∩I { , } × R i \I { } . We showthat d | = Σ ∪ {¬ σ } which contradicts the assumption that Σ | = FIN σ . It iseasy to see that d satisﬁes ¬ σ . Furthermore, d satisﬁes any R i [ A . . . A m ] ⊆ R j [ B . . . B m ] from Σ because A i ∈ I ⇒ B i ∈ I by U R i : XZ ⊥ Y Z ∈ Σ where X and Y are disjoint. Bythe construction, d | = R i : X ⊥ Y , so it suﬃces to show that d | = R i : B ⊥ B for B ∈ Z . If d = R i : B ⊥ B , then by the construction Σ ⊢ R l [ A ] ⊆ R i [ B ].Moreover by U I

3, Σ ⊢ R i [ B ] ⊆ R l [ A ], and by U I

4, Σ ⊢ R l : A ⊥ A , contraryto the assumption. Hence d | = R i : B ⊥ B which concludes the proof of d | = Σ ∪ {¬ σ } and the case of σ being a constancy atom. σ is a disjoint independence atom. Assume that σ is a DIA of theform R l : A . . . A h ⊥ A h +1 , . . . , A h + k . By Lemma 3 we may assume that Σis a set of DIAs and INDs. Deﬁne ﬁrst a database d = ( r [ R ] , . . . , r n [ R n ])such that • r l = { s, s ′ } where s and s ′ map all attributes in R l to 0 except that s ( A i ) = i for i = 1 , . . . , h and s ′ ( A i ) = i for i = h + 1 , . . . , h + k ; • r i = { u } where u maps all attributes in R i to 0, for i = l .17he idea is to extend d to a database d = ( r , . . . , r n ) such that d | = Σ and* if t ∈ r i is such that t ( B ) = i , . . . , t ( B m ) = i m and 0 < i < . . . < i m ,then Σ ⊢ R l [ A i . . . A i m ] ⊆ R i [ B . . . B m ].We let d be the result of chasing d by Σ over the following two chase rules,i.e., d is obtained by applying rules (i-ii) to d repeatedly until this is nomore possible. (i) Assume that R [ X ] ⊆ R ′ [ Y ] ∈ Σ and t ∈ r [ R ] is such that for no t ′ ∈ r ′ [ R ′ ], t ( X ) = t ′ ( Y ). Then extend r ′ with t new that maps Y pointwise to t ( X ) and otherwise maps attributes in R ′ to 0. (ii) Assume that R : X ⊥ Y ∈ Σ and t, t ′ ∈ r [ R ] are such that for no t ′′ ∈ r [ R ], t ′′ ( X ) = t ( X ) and t ′′ ( Y ) = t ′ ( Y ). Then extend r with t new that agrees with t on X , with t ′ on Y , and maps every other attributein R to 0.Note that since the range of the assigned values is ﬁnite, the process ter-minates. Hence, d is a ﬁnite model of Σ, and therefore by the assumptionit satisﬁes σ . It is also straightforward to verify, using U , U U , U I , I , U I ∗ is satisﬁed.Since d satisﬁes σ , we ﬁnd a tuple from r l mapping A i to i for i = 1 , . . . , n .It thus suﬃces to show that, given any sequence ~d = ( d , . . . , d m ), where d i +1 is obtained from d i by applying (i) or (ii), and any S ⊆ { , . . . 
, h + k } , ifthere is t ∈ r l for d m = ( r , . . . , r n ) such that t ( A i ) = i for i ∈ S , thenΣ ⊢ R l : S ∩ A . . . A h ⊥ S ∩ A h +1 . . . A h + k . We show this claim by inductionon the number of applications of (ii) in ~d .Assume ﬁrst that no application of (ii) occurs. Then t cannot combineany values from both s and s ′ and hence S cannot intersect both { , . . . , h } and { h + 1 , . . . , h + k } . Consequently, R l : S ∩ A . . . A h ⊥ S ∩ A h +1 . . . A h + k is derivable by I I j applications of (ii). We prove the claim for S = { , . . . , h + k } (the gen-eral case S ⊆ { , . . . , h + k } is analogous) and ~d = ( d , . . . , d m ) in whichthe number of applications of (ii) is j + 1. Let d k be the database ob-tained by applying (ii) for the last time, say with regards to some R : X ⊥ Y and tuples t, t ′ , t new . Without loss of generality we may assume that the se-quence ( d k +1 , . . . , d m ) is obtained by a chain of applications of (i) copying1 , . . . , h + k from t new ( B ) , . . . , t new ( B h + k ) to t ( A ) , . . . , t ( A h + k ), for some B , . . . , B h + k ∈ XY and t from database d m . Otherwise, step k can be omit-18ed and the claim follows by induction assumption. Now, using repeatedly U , U ⊢ R i [ B . . . B h + k ] ⊆ R l [ A . . . A h + k ] . (1)Moreover, since ∗ is satisﬁed with regards to d m we haveΣ ⊢ R l [ A . . . A h + k ] ⊆ R i [ B . . . B h + k ] . (2)Let us then deﬁne another sequence of databases ~d ′ = ( d , . . . , d k − , d ′ k +1 , . . . , d ′ m )in which t new does not appear and ( d ′ k +1 , . . . , d ′ m ) is obtained by a chain ofapplications of (i) copying t ( B ) , . . . , t ( B h + k ) to t ( A ) , . . . , t ( A h + k ) for some B , . . . , B h + k ∈ XY and t from database d ′ m . Let B i . . . B i a B i a +1 . . . B i b B i b +1 . . . B i c B i c +1 . . . B i d relist B . . . B h + k so that • B i , . . . , , B i b ∈ X and B i b +1 , . . . , B i d ∈ Y • { i , . . . , i a , i b +1 , . . . , i c } = { , . . 
. , h } and { i a +1 , . . . , i b , i c +1 , . . . , i d } = { h + 1 , . . . , h + k } .Since Σ ⊢ R : B i . . . B i b ⊥ B i b +1 . . . B i d by I I

3, we obtain using (1),(2), U

3, and

U I ⊢ R l : A i . . . A i b ⊥ A i b +1 . . . A i d . (3)Furthermore, we observe that t ( A i , , . . . , A i b ) = t ( B i , . . . , B i b ) = t new ( B i , . . . , B i b ) = ( i , . . . , i b ) . Since ~d ′ contains only j applications of (ii), it follows by induction assump-tion that Σ ⊢ R l : A i . . . A i a ⊥ A i a +1 . . . A i b . It can be shown by an analogousargument that Σ ⊢ R l : A i b +1 . . . A i c ⊥ A i c +1 . . . A i d . By Lemma 4 these twoand (3) imply that Σ ⊢ R : A . . . A h ⊥ A h +1 . . . A h + k . This concludes theinduction proof and the case of σ being a disjoint IA. σ is an inclusion dependency. Assume that σ is an IND of theform R l [ X ] ⊆ R l ′ [ Y ]. By Lemma 3 we may assume that Σ is a set of IAs19nd disjoint INDs. We let d = ( r , . . . , r n ) where r , . . . , r l − , r l +1 , . . . , r n are single rows of 0’s, and r l = { s } for a tuple s : A i i , where R l = { A , . . . , A m } . It suﬃces then to chase d by Σ with rules (i,ii), and show thatthe resulting database d ′′ satisﬁes *. Since this is analogous to the previouscase, we omit the proof here.We obtain the following corollaries. For the ﬁrst corollary, note that inthe last two cases of the previous proof none of the rules I , U I , U I , U I

Corollary 6. The axiomatization {I , I , I , I } ∪ {UI , UI } ∪ B is sound and complete for the unrestricted and finite implication problems of DIA+IND.

Corollary 7. The finite and unrestricted implication problems of IND+IA coincide.

4.2. Armstrong Databases

Furthermore, it is straightforward to construct an Armstrong database based on all counterexample constructions. Given a class C of dependencies, an Armstrong database for a set Σ of C-dependencies is a database that satisfies all C-dependencies implied by Σ and does not satisfy any C-dependency not implied by Σ [20]. Armstrong databases are perfect sample databases as they reduce the implication problem for checking whether an arbitrary C-dependency ϕ is implied by a fixed set Σ of C-dependencies to checking whether ϕ is satisfied in a C-Armstrong database for Σ. This concept is useful for sample-based schema designs of databases and helps with acquisition of data dependencies that are meaningful for a given application domain [42, 46]. We say that some database is an Armstrong database with respect to finite implication if we replace "implied" above by "finitely implied". A uni-relational Armstrong database is called an Armstrong relation.

Theorem 8. Let Σ be a finite set of IA+IND. Then Σ has a finite Armstrong database.

Proof. Without loss of generality we may consider only the uni-relational case. We need to construct a finite relation r which satisfies Σ and falsifies any σ not implied by Σ. Let r_σ be a finite relation that satisfies Σ and falsifies σ. We define an Armstrong relation r as the set of tuples t constructed as follows. First, from each r_σ select some t_σ. Then t maps A to (t_{σ_1}(A), …, t_{σ_n}(A)) where σ_1, …, σ_n is some enumeration of all IAs and INDs not implied by Σ. It is straightforward to verify that r satisfies Σ and falsifies each σ_i.

This simple method can be extended to other dependency classes as well. However, in later sections we demonstrate how to combine Armstrong and counterexample constructions, which has the effect of producing smaller Armstrong relations.

Another consequence of Theorem 5 is that the implication problem for UIND+CA by IND+IA can be determined by considering only interaction between UINDs and CAs. For a set of dependencies Σ, define Σ_CA := {A ⊥ A | R : AX ⊥ AY ∈ Σ} and Σ_UIND := {R[A_i] ⊆ R′[B_i] | R[A_1 … A_n] ⊆ R′[B_1 … B_n] ∈ Σ, i = 1, …, n}. The following theorem now formulates this idea.

Theorem 9. Let Σ be a set of INDs and IAs, and let σ be a UIND or a CA. The following are equivalent:
(1) Σ ⊨ σ,
(2) Σ_UIND ∪ Σ_CA ⊨ σ,
(3) σ is derivable from Σ_UIND ∪ Σ_CA by U1, U2, UI3, UI4.

Proof. It is clear that (3) ⇒ (2) ⇒ (1). We show that (1) ⇒ (3). By Theorem 5, there is a deduction (σ_1, …, σ_m) from Σ by I ∪ B ∪ C such that σ_m = σ. It is a straightforward induction to show that for all i = 1, …, m:
• If σ_i is R : A ⊥ A, then σ_i satisfies (3).
• If σ_i is R[A_1 … A_n] ⊆ R′[B_1 … B_n], then σ_j := R[A_j] ⊆ R′[B_j] satisfies (3), for j = 1, …, n.
It is worth noting that every application of UI5, where σ(R : A B) is a UIND or CA, can be simulated by U2, UI3, UI4. All the other cases are straightforward and left to the reader.
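As an illustration of Theorem 9, the following minimal Python sketch computes Σ_UIND and Σ_CA and then closes them under reflexivity, transitivity, symmetry and constancy. The function names, the encoding of attributes as (relation, attribute) pairs, and the reading of the four rules in item (3) as exactly these four are assumptions made for this example, not notation from the paper.

```python
def unary_projection(inds, ias):
    """Return (Sigma_UIND, Sigma_CA) for the given INDs and IAs.

    inds: iterable of (lhs, rhs), equal-length tuples of qualified
          attributes such as ("R", "A"), read as lhs ⊆ rhs.
    ias:  iterable of (rel, X, Y) with X, Y sets of attribute names.
    """
    uinds = {(a, b) for lhs, rhs in inds for a, b in zip(lhs, rhs)}
    # An attribute on both sides of an IA yields a constancy atom.
    cas = {(rel, a) for rel, xs, ys in ias for a in set(xs) & set(ys)}
    return uinds, cas


def close_unary(attributes, uinds, cas):
    """Fixpoint of reflexivity, transitivity, symmetry and constancy."""
    uinds = set(uinds) | {(a, a) for a in attributes}  # reflexivity
    cas = set(cas)
    while True:
        derived_uinds, derived_cas = set(), set()
        for a, b in uinds:
            if b in cas:
                derived_uinds.add((b, a))  # symmetry: a ⊆ b, b constant
                derived_cas.add(a)         # constancy: a ⊆ b, b constant
            for c, d in uinds:
                if b == c:
                    derived_uinds.add((a, d))  # transitivity
        if derived_uinds <= uinds and derived_cas <= cas:
            return uinds, cas
        uinds |= derived_uinds
        cas |= derived_cas


# Hypothetical input: R[A] ⊆ S[B] and the IA S : BX ⊥ BY.
inds = [((("R", "A"),), (("S", "B"),))]
ias = [("S", {"B", "X"}, {"B", "Y"})]
attrs = {("R", "A"), ("S", "B"), ("S", "X"), ("S", "Y")}
closed_uinds, closed_cas = close_unary(attrs, *unary_projection(inds, ias))
```

Under these assumptions, deciding whether a UIND or CA σ follows from Σ amounts to a membership test in the closed sets, which takes polynomial time in the number of attributes.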

5. Independence Atoms and Functional Dependencies

In this section we consider the interaction between functional dependencies and independence atoms. Already keys and IAs combined form a somewhat intricate class: their finite and unrestricted implication problems differ and the former lacks a finite axiomatization [32, 33]. In Section 5.1 we will extend these results to the classes FD+IA and 2FD+UIA. However, the interaction between unary FDs and IAs is less involved. In Section 5.2 we will show that for UFD+IA unrestricted and finite implication coincide and the axiomatization A∗, depicted in Table 2, forms a sound and complete axiomatization.

For notational clarity we restrict attention to the uni-relational case from now on. That is, we consider only those cases where databases consist of a single relation.

The following theorem enables us to separate the finite and unrestricted implication problems for FD+IA as well as for FD+UIA.

Theorem 10 ([31]). The unrestricted and finite implication problems for keys and UIAs differ.

This theorem was proved by showing that Σ ⊨_fin σ and Σ ⊭ σ, for Σ := {A ⊥ B, C ⊥ D, BC → AD, AD → BC} and σ := AB → CD. Next we show how this counterexample generalizes to a non-axiomatizability result for the finite implication problem of FD+IA. For n ≥

2, deﬁne R n := { A i , B i : i =1 , . . . , n } and Σ n := { A i ⊥ B i : i = 1 , . . . , n } ∪ { A S( i ) B i → R n : i = 1 , . . . , n } where S( n ) = 1 and S( i ) = i + 1, for i < n . We say that σ follows fromΣ by k -ary (ﬁnite) implication, written Σ | = k σ (Σ | = k ﬁn σ ), if Σ ′ | = σ forsome Σ ′ ⊆ Σ of size at most k . We say that an inference rule of the form σ , . . . , σ k ⇒ σ is k -ary. An axiomatization is called k -ary if it consists of atmost k -ary rules. In [31] it was shown that, for n ≥ (1) Σ n | = ﬁn A B → R n ; (2) Σ n | = n − X ⊥ Y iﬀ X, Y are disjoint and such that XY = A i B i , X = ∅ ,or Y = ∅ ; (3) given Σ ′ ⊆ Σ n of size 2 n − X ⊆ R such that A S( i ) B i X forall i = 1 , . . . , n , one ﬁnds a relation r satisfying Σ ′ and tuples t, t ′ ∈ r such that t ( A ) = t ′ ( A ) iﬀ A ∈ X .It follows from (3) that (4) Σ n | = n − X → Y iﬀ Y ⊆ X or A S( i ) B i ⊆ X for some i = 1 , . . . , n .Note that all FDs and IAs described in (2) and (4) follow from Σ n by unaryﬁnite implication. Consequently, the set of FDs and IAs described in (2)22nd (4) is closed under (2 n − n − n ≥

2, and sinceall IAs in Σ n are unary, we obtain the following theorem. Theorem 11.

The finite implication problem for FD+IA (2FD+UIA) is not finitely axiomatizable.
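The finite half of the witness behind Theorem 10 can be sanity-checked by brute force. The sketch below enumerates every non-empty relation over the attributes A, B, C, D with values in {0, 1} and looks for one that satisfies Σ = {A ⊥ B, C ⊥ D, BC → AD, AD → BC} but violates σ = AB → CD. The helper names are made up for this example. The check covers one small domain only, so it is no substitute for the proof in [31], and it says nothing about the infinite relation needed for Σ ⊭ σ.

```python
from itertools import product

ATTRS = "ABCD"
IDX = {a: i for i, a in enumerate(ATTRS)}


def proj(t, attrs):
    """Project tuple t onto the listed attributes."""
    return tuple(t[IDX[a]] for a in attrs)


def sat_fd(r, lhs, rhs):
    """r satisfies lhs -> rhs: equal lhs-values force equal rhs-values."""
    seen = {}
    for t in r:
        if seen.setdefault(proj(t, lhs), proj(t, rhs)) != proj(t, rhs):
            return False
    return True


def sat_ia(r, xs, ys):
    """r satisfies xs ⊥ ys (xs, ys disjoint): r[xs ys] = r[xs] x r[ys]."""
    pairs = {(proj(t, xs), proj(t, ys)) for t in r}
    left = {p[0] for p in pairs}
    right = {p[1] for p in pairs}
    return len(pairs) == len(left) * len(right)


def satisfies_sigma(r):
    return (sat_fd(r, "BC", "AD") and sat_fd(r, "AD", "BC")
            and sat_ia(r, "A", "B") and sat_ia(r, "C", "D"))


def find_finite_counterexample():
    """Return a relation over {0,1} satisfying Σ but not σ, or None."""
    tuples = list(product((0, 1), repeat=4))
    for mask in range(1, 1 << len(tuples)):
        r = [t for i, t in enumerate(tuples) if (mask >> i) & 1]
        if satisfies_sigma(r) and not sat_fd(r, "AB", "CD"):
            return r
    return None
```

The search is consistent with the counting argument for the finite case: the two FDs give |r| = |r(BC)| = |r(AD)|, the two IAs give |r(AB)| · |r(CD)| = |r(A)| · |r(B)| · |r(C)| · |r(D)| ≥ |r|², and hence |r(AB)| = |r|.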

To the best of our knowledge, decidability is open for both FD+IA and FD+UIA with respect to their finite and unrestricted implication problems. It is worth noting here that the unrestricted (finite) implication problem for FD+UIA is as hard as that for FD+IA. For this, we demonstrate a simple reduction from the latter to the former. Let Σ ∪ {σ} be a set of FDs and IAs, and let Σ′ denote the set of FDs and IAs where each IA of the form X ⊥ Y is replaced with dependencies from {A ⊥ B, X → A, A → X, Y → B, B → Y} where A and B are fresh attributes. If σ is an FD, then Σ (finitely) implies σ iff Σ′ (finitely) implies σ. Also, if σ is of the form X ⊥ Y, then we have Σ ⊨ σ iff Σ″ ⊨ σ′, where Σ″ := Σ′ ∪ {X → A, A → X, Y → B, B → Y}, σ′ := A ⊥ B, and A and B are fresh attributes.

Next we turn to the class UFD+IA. We show that A∗ (see Table 2) forms a sound and complete axiomatization for UFD+IA in both the finite and unrestricted cases. Hence, compared to UIAs and FDs, the interaction between IAs and UFDs is relatively tame. Combined, however, these two may entail new restrictions to column sizes. For instance, in the finite case A → B_1, A → B_2, and B_1 ⊥ B_2 imply |r(B_1)| · |r(B_2)| ≤ |r(A)|, because r(B_1 B_2) = r(B_1) × r(B_2) and A determines B_1 B_2.

We also show that the class UFD+IA admits Armstrong relations. We start with the following auxiliary lemma which shows existence of finite Armstrong relations in the absence of CAs.

Lemma 12.

Let Σ be a ﬁnite set of UFDs and IAs that is closed under A ∗ and such that it contains no CAs. Then Σ has a ﬁnite Armstrong relation.Proof. For an attribute A deﬁne A + as the set of attributes B such that A → B ∈ Σ. Enumerating the underlying relation schema of Σ as R = { A , . . . , A n } we ﬁrst let r to consist of two constant tuples t and t that23espectively map all attributes to 0 and 1, and tuples t i that map attributesin A + i to 0 and those in R \ A i to i . By transitivity we observe that thecondition (i) is satisﬁed by r . Next we extend r to a relation satisfyingboth conditions. First we chase r by all IAs in Σ with the following rulewhere b is a new symbol. • Assume that X ⊥ Y ∈ Σ and t, t ′ ∈ r are such that for no t ′′ ∈ r , t ′′ ( X ) = t ( X ) and t ′′ ( Y ) = t ′ ( Y ). Then extend r with t new such that t new ( X ) = t ( X ), t new ( Y ) = t ′ ( Y ), and t new ( R \ XY ) = b .Note that X and Y are disjoint for all X ⊥ Y ∈ Σ, and thus the rule is welldeﬁned. Assuming X and Y share an attribute A generates a deductionof ∅ → A using decomposition, reﬂexivity, and constancy. This, in turn,contradicts our assumption.Since each t new assigns attributes only to n + 1 diﬀerent values, the chaseprocedure terminates and generates a unique r satisfying all IAs in Σ. Let r be obtained from r by replacing each t ∈ r with the tuple s t which maps A to the tuple t ( A + ). We claim that r is a ﬁnite Armstrong relation. Firstnote that r is of the size of r and hence ﬁnite. In the following we considerthe cases of UFDs and IAs.Let us show ﬁrst that r satisﬁes an arbitrary UFD A → B from Σ. Let s t , s t ′ be tuples in r that agree on A . Then t and t ′ agree on A + , and thuson B + , from which it follows that s t and s t ′ agree on B .Suppose then A i → B is not in Σ. Then t and t i map A i to 0 but B to0 and i , respectively. By deﬁnition, s t and s t i agree on A i and disagree on B , and thus r does not satisfy A i → B . 
Furthermore, note that any ∅ → A is falsiﬁed by s t and s t .Let us then show that r satisﬁes an arbitrary X ⊥ Y in Σ. Let s t , s t ′ ∈ r ,and let t ′′ ∈ r be such that it agrees with t on X and with t ′ on Y . We claimthat s t ′′ agrees with s t on X and with s t ′ on Y . Let A ∈ X ; we show that s t ( A ) = s t ′′ ( A ). By symmetry and composition IA components are closerunder UFDs which means that we have A + ⊆ X . Hence, t agrees with t ′′ on A + , and therefore s t agrees with s t ′′ on A . Consequently, s t ( X ) = s t ′′ ( X ),and analogously we obtain that s t ′ ( Y ) = s t ′′ ( Y ). This shows that r satisﬁes X ⊥ Y .To show that r satisﬁes no X ⊥ Y / ∈ Σ, it suﬃces to show that X ⊥ Y ∈ Σ if for some t ∈ r we have t ( X ) = 0 and t ( Y ) = 1. Proving this is astraightforward induction on the chase construction. For t and t it suﬃces24o employ I

1. For the induction step it suﬃces to use Lemma 4 and rules I , I t new , obtained by an application of the chase ruleto X ⊥ Y , maps attributes in R \ XY to 0 or 1. Theorem 13.

The axiomatization A ∗ is sound and complete for the unre-stricted and ﬁnite implication problems of UFD+IA. Furthermore, any ﬁniteset of UFDs and IAs has a ﬁnite Armstrong relation.Proof. By Theorem 2 the axiomatization is sound. To show completenessand existence of Armstrong relations, it suﬃces to show that there is a ﬁniteArmstrong relation for any ﬁnite set Σ of UFDs and IAs that is closed under A ∗ . Let R ′ = R \ C for C := { A ∈ R | ∅ → A ∈ Σ } . First we note that Σ ↾ R ′ is derivable from Σ. For an IA X ⊥ Y ∈ Σ, we may derive X \ C ⊥ Y \ C bysymmetry and decomposition. For a UFD A → B ∈ Σ we consider the onlynon-trivial case: B intersects with R ′ but A does not. In this case ∅ → A ∈ Σand thus we obtain by transitivity that ∅ → B ∈ Σ, which contradicts ourassumption.Let Σ ′ be the closure of Σ ↾ R ′ under A ∗ . Having concluded that Σ ′ isderivable from Σ, we observe that it cannot contain any CAs for otherwisewe would derive from Σ some ∅ → A where A ∈ R ′ , and thus A ∈ C which isa contradiction. Hence, we may apply the previous lemma to obtain a ﬁniterelation r satisfying exactly those UFDs and IAs that belong to Σ ′ . Deﬁne r := C { } × r . We claim r that is an Armstrong relation for Σ.Assume ﬁrst X ⊥ Y ∈ Σ, and consider X \ C ⊥ Y \ C ∈ Σ ′ . Since r satisﬁes X \ C ⊥ Y \ C , it follows that r satisﬁes X ⊥ Y . Assume then that U → B ∈ Σ. If U ∈ C or U is the empty set, then B ∈ C by transitivityfrom which it follows that r satisﬁes U → B trivially. The same happens if B ∈ C . If U is non-empty and U, B C , then U → B is satisﬁed by r andthus by r , too.Suppose then X ⊥ Y / ∈ Σ. By symmetry and composition we note that X \ C ⊥ Y \ C / ∈ Σ ′ . Hence, r does not satisfy X \ C ⊥ Y \ C and thus r does not satisfy X ⊥ Y . Suppose then U → B / ∈ Σ. If U is the empty set,then B is not in C , and thus U → B does not hold in r nor in r . If U is anattribute, then again by reﬂexivity and transitivity B cannot be in C . 
If U is in C , then U → B is false in r because U is a constant and B is not. If U is not in C , then U → B is false in r and consequently in r as well. Thisconcludes the proof.As the same axiomatization characterizes both ﬁnite and unrestricted25mplication, we obtain the following corollary. Corollary 14.

The finite and unrestricted implication problems coincide for UFD+IA.
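The chase rule for IAs used in the proof of Lemma 12 can be sketched in Python as follows. The function name `chase_ias`, the encoding of tuples as Python tuples aligned with a schema, and the default filler value "b" are assumptions for this illustration. The sketch covers only the chase step itself, not the subsequent replacement of each value by the tuple t(A⁺).

```python
def chase_ias(schema, relation, ias, filler="b"):
    """Close a relation under the IA chase rule.

    schema:   tuple of attribute names.
    relation: set of value tuples aligned with schema.
    ias:      list of (X, Y) pairs of disjoint attribute sets.
    filler:   value written to attributes outside XY in new tuples.
    """
    pos = {a: i for i, a in enumerate(schema)}
    r = set(relation)
    changed = True
    while changed:
        changed = False
        for xs, ys in ias:
            xs, ys = sorted(xs), sorted(ys)
            present = {(tuple(t[pos[a]] for a in xs),
                        tuple(t[pos[a]] for a in ys)) for t in r}
            snapshot = list(r)
            for t in snapshot:
                for u in snapshot:
                    key = (tuple(t[pos[a]] for a in xs),
                           tuple(u[pos[a]] for a in ys))
                    if key in present:
                        continue
                    # No tuple agrees with t on X and with u on Y: add one.
                    new = [filler] * len(schema)
                    for a in xs:
                        new[pos[a]] = t[pos[a]]
                    for a in ys:
                        new[pos[a]] = u[pos[a]]
                    r.add(tuple(new))
                    present.add(key)
                    changed = True
    return r


chased = chase_ias(("A", "B", "C"), {(0, 0, 0), (1, 1, 1)}, [({"A"}, {"B"})])
```

Since new tuples only reuse existing values and the filler, the set of possible tuples is finite and the loop terminates, which mirrors the termination argument in the proof.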

6. Independence atoms, unary functional dependencies and unary inclusion dependencies

In this section we consider the combined class of FDs, INDs, and IAs. In the previous section we noticed that the finite implication problem for binary FDs and unary IAs is not finitely axiomatizable. On the other hand, both the finite and unrestricted implication problems for unary FDs and binary INDs are undecidable [48]. Hence, in this section we will restrict our interest to unary FDs and unary INDs, a class for which finite and unrestricted implication problems already deviate [16]. It turns out that the combination of unary functional dependencies, unary inclusion dependencies, and arbitrary independence atoms can be axiomatized with respect to finite and unrestricted implication. However, in the finite case the axiomatization is infinite as one needs to add so-called cycle rules for UFDs and UINDs.

Let us first consider unrestricted implication for the class UFD+UIND+IA. Based on Section 5.2 and [16] we show that A∗ ∪ {U1, U2, UI3, UI4} forms a sound and complete axiomatization. An axiomatization for the unrestricted implication problem of UINDs and embedded implicational dependencies was shown in [16]. In the uni-relational case, an embedded implicational dependency (EID) is a first-order sentence ∀x⃗ ∃y⃗ (φ(x⃗) → ψ(x⃗, y⃗)) over a relational vocabulary {R} where
• φ(x⃗) is a non-empty finite conjunction of relational atoms and the variables occurring in this conjunction are exactly all the variables listed in x⃗;
• ψ(x⃗, y⃗) is a non-empty finite conjunction of relational atoms and equality atoms and the variables occurring in this conjunction are exactly all the variables listed in y⃗ and some of the variables listed in x⃗;
• each variable in x⃗y⃗ associates with a single relation position, and for each occurrence of an equality atom x = y, x and y associate with the same relation position.
Note that EIDs include all FDs and IAs but exclude all non-trivial INDs. The following presentation of Theorem 17 is a reformulation from [16].

Definition 15 ([16]). For any set ∆ of EIDs and set Σ of UINDs, we define the set Y, called the singlevalued span of ∆ and Σ, to be the minimum set of attributes Y that satisfies the two conditions:
(1) if ∆ ∪ {Y ⊥ Y} ⊨ A ⊥ A, then add A to Y,
(2) if attribute B is in Y and A ⊆ B is in Σ, then add A to Y.

Definition 16 ([16]). For any set ∆ of EIDs, any set Σ of UINDs, and Y the singlevalued span of ∆ and Σ, we define the sets ∆″ and Σ″, called the unrestricted extensions of ∆ and Σ, by ∆″ = ∆ ∪ {Y ⊥ Y} and Σ″ = Σ ∪ {A ⊆ B : B ⊆ A in Σ, B in Y}.

Theorem 17 ([16]). Let ∆ be a set of EIDs, Σ a set of UINDs, Y the singlevalued span, and ∆″, Σ″ the unrestricted extensions of ∆, Σ. For any EID δ and any UIND σ, we have
• ∆ ∪ Σ ⊨ σ ⇔ Σ″ ⊨ σ,
• ∆ ∪ Σ ⊨ δ ⇔ ∆″ ⊨ δ.
By Theorems 2, 13, 17, and since U , U Theorem 18.

The axiomatization A∗ ∪ {U1, U2, UI3, UI4} is sound and complete for the unrestricted implication problem of UFD+UIND+IA.

6.2. Finite Implication

For the finite implication problem of UFD+UIND+IA we obtain a complete axiomatization by extending the axioms in Theorem 18 with the so-called cycle rules [16] (see Table 5) and by removing UI3, UI4.

Deﬁnition 19 ([16]) . For each set Σ of UINDs and UFDs over R , let G (Σ) be the multigraph that consists of nodes R , red directed edges ( A, B ) , for → A A ⊇ A . . . A n − → A n A n ⊇ A A ← A A ⊆ A . . . A n − ← A n A n ⊆ A (cycle rule for n , C n ) Table 5: Cycle rules for ﬁnite implication A → B ∈ Σ , and black directed edges ( A, B ) , for B ⊆ A ∈ Σ . If G (Σ) hasred (black) directed edges from A to B and vice versa, then these edges arereplaced with an undirected edge between A and B . Given a multigraph G (Σ), we then topologically sort its strongly con-nected components which form a directed acyclic graph [37]. That is, eachcomponent is assigned a unique scc-number , greater than the scc-numbers ofall its descendants. For an attribute A , denote by scc( A ) the scc-number ofthe component node A belongs to. Note that scc( A ) ≥ scc( B ) if ( A, B ) is anedge in G (Σ). Denote also by scc i the set of attributes A with scc( A ) = i ,and let scc ≤ i := S j ≤ i scc j and deﬁne scc ≥ i , scc i analogously. Thefollowing lemma is a simple consequence of the deﬁnition. Lemma 20 ([16]) . Let Σ be a set of UFDs and UINDs that is closed under {F , F , U , U } ∪ {C k : k ∈ N } . Then every node in G (Σ) has a red and a black self-loop. The red (black)subgraph of G (Σ) is transitively closed. The subgraphs induced by the stronglyconnected components of G (Σ) are undirected. In each strongly connectedcomponent, the red (black) subset of undirected edges forms a collection ofnode-disjoint cliques. Note that the red and black partitions of nodes couldbe diﬀerent. Using this lemma we prove the following result which essentially showshow to construct a counterexample model for the ﬁnite implication problemof UFD+UIND+IA. The construction is somewhat intricate, mainly becauseof the requirements for balancing conditions (i) and (iii) against one another. Lemma 20 is a reformulation of Lemma 4.2. 
in [16] where the same claim is provedfor a set of FDs and UINDs that is closed under {F , F , F , U , U } ∪ {C k : k ∈ N } .We may omit F F emma 21. Let

Σ = Σ

UFD ∪ Σ UIND ∪ Σ IA be a set of UFDs, UINDs, andIAs that is closed under A ∗ ∪ {U , U } ∪ {C k : k ∈ N } and such that itcontains no CAs. Let , . . . , n be some scc-numbering of G (Σ UFD ∪ Σ UIND ) ,and let M , . . . , M n be a sequence of positive integers. Then there existsa ﬁnite relation r and a sequence of positive integers N , . . . , N n such that N i ≥ N i − + M i and (i) r | = A → B iﬀ A → B ∈ Σ , (ii) r | = X ⊥ Y iﬀ X ⊥ Y ∈ Σ , and (iii) | r ( A ) | = N i for A ∈ scc i where i ≥ .Proof. By Theorem 13 there is a ﬁnite relation r satisfying (i-ii). We showhow to extend r to a ﬁnite relation satisfying (i-iii). We construct induc-tively, for each i = 0 , . . . , n , a relation r i such that it satisﬁes (i-iii) with thealleviation that (iii) holds over A ∈ scc ≤ i .Without loss of generality Σ contains some CAs in which case scc consistsof all derivable constants and we set r = r and N = 1.For the induction step, assuming r i − satisﬁes the induction claim weshow how to deﬁne a relation r i satisfying the induction claim. First observethat we may rename the values of r i − such that for all attributes A , the set r i − ( A ) is the initial segment { , . . . , | r i − ( A ) |} of N . Note that satisfactionof typed dependencies is invariant under such renaming. Furthermore, thisrenaming can be done in such a way that all columns in the same maximalred clique are identical. In what follows we now construct r i .Let A + be the set of all B such that A → B ∈ Σ. First we deﬁne anauxiliary relation r whose tuples are obtained from those of r i − by replacingeach value | r i − ( A ) | of A ∈ scc i with any value from {| r i − ( A ) | , . . . , N i } , where N i is the maximun of max {| r i − ( A ) | : A ∈ scc i } and N i − + M i .Furthermore, we place the restriction that the same value must be pickedfor each attribute from the same maximal red clique. 
That is, we replace allmaximal attribute values at the i th scc level with some new value consistentlywith other attributes in the same maximal red clique, and the new maximalvalue is at least N i − + M i . We then deﬁne r i := { s t | t ∈ r } where s t mapseach attribute A to the tuple t ( A + ).Let us consider the items in the following.29 i) Let A → B ∈ Σ. Let s t ( A ) = s t ′ ( A ) for s t and s t ′ from r i . By deﬁnition t ( A + ) = t ′ ( A + ) which entails by B + ⊆ A + that t ( B + ) = t ′ ( B + ), andhence s t ( B ) = s t ′ ( B ).Suppose A → B / ∈ Σ. By induction assumption we ﬁnd t, t ′ from r i − which agree on A but disagree on B . Since r i − satisﬁes all UFDs of Σwe note that they agree on A + but disagree on B + . This means s t and s t ′ witness that A → B is false in r i . (ii) Suppose X ⊥ Y ∈ Σ. Without loss of generality X and Y are disjoint.Consider tuples s t , s t ′ in r i . By induction assumption we ﬁnd t ′′ from r i − which agrees with t on X and with t ′ on Y when restricted to onlythose attributes A ∈ XY ∩ scc i with values at most | r i − ( A ) | . Copyingthe remaining attribute values of X and Y from t and t ′ we obtain amodiﬁcation of t ′′ which agrees with t on X and with t ′ on Y . Since X and Y are closed under the UFDs of Σ, s t ′′ which agrees with s t on X and with s t ′ on Y .Suppose X ⊥ Y / ∈ Σ. By induction assumption X ⊥ Y is not satisﬁed in r i − , that is, we ﬁnd t and t ′ from r i − such that no t ′′ from r i − agreesboth on X with t and on Y with t ′ . Considering r i , ﬁnding a tuple s t ′′ which agrees on X with s t and on Y with s t ′ leads to a contradictionwith the previous statement. Thus X ⊥ Y remains not satisﬁed in r i . (iii) Since r i − is closed under the UFDs of Σ, it follows by induction as-sumption that | r i ( A ) | = | r i − ( A ) | = N i for A ∈ scc
The axiomatization A ∗ ∪{U , U }∪{C n : n ∈ N } is sound andcomplete for the ﬁnite implication problem of UFD+UIND+IA. Furthermore,any ﬁnite set of UFDs, UINDs, and IAs has a ﬁnite Armstrong relation with espect to ﬁnite implication.Proof. By Theorem 2 and by soundness of the cycle rule the axiomatizationis sound. To show completeness and existence of Armstrong relations, itsuﬃces to show that there is a ﬁnite Armstrong relation for any ﬁnite setΣ of UFDs, UINDs, and IAs that is closed under the axiomatization. Let r be a ﬁnite relation obtained by the previous lemma. By condition (iii) r satisﬁes also all UINDs of Σ and is thus a model of Σ. Without loss ofgenerality we have r ( A ) = { , . . . , N i } for each attribute A at level i of thescc numbering of G (Σ UIND ∪ Σ UFD ). We may assume N i ≥ N i − + M i where M i is the number of maximal black cliques at levels at most i . Consider thegraph G (Σ UIND ), i.e., the subgraph of G (Σ UIND ∪ Σ UFD ) obtained by removingall the red edges. We may deﬁne an scc-numbering scc ′ for G (Σ UIND ) suchthat scc ′ ( A ) ≤ scc ′ ( B ) implies scc( A ) ≤ scc( B ).We now modify r as follows. If there is a black edge from an attribute A to another attribute B , then on A replace N i − + scc ′ ( B ) with N + scc ′ ( B )where i = scc( A ). By transitivity of UINDs and since M i was chosen largeenough this is operation is well deﬁned. We claim that the relation r obtainedby these modiﬁcations is an Armstrong relation for Σ. Again, satisfaction oftyped dependencies is invariant under renaming of attribute values, and thus r is Armstrong with respect to all UFD and UIND consequences. Assumethen C ⊆ D / ∈ Σ. If scc( D ) < scc( C ), then this UIND is false because thecardinality of r ( C ) is strictly greater than that of r ( D ). Otherwise, there isno black egde from D to C but by reﬂexivity C has a black self-loop. Thus N + scc ′ ( C ) appears in r ( C ) but not in r ( D ). Suppose then C ⊆ D ∈ Σ,and let a ∈ r ( C ). 
If a is N + scc ′ ( B ) for some attribute B , then there is ablack edge from C to B and by transitivity from D to B which implies that N + scc ′ ( B ) appears in r ( D ) as well. Otherwise, a is not greater than N i forscc( C ) = i . Now either D and C are in the same maximal black clique orscc( D ) > i . In the ﬁrst case a must appear in r ( D ), and in the second case r ( D ) contains all positive integers from 1 to N i . We conclude that r is alsoArmstrong with respect to UIND consequences.
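The multigraph G(Σ) of Definition 19 and an scc-numbering of it can be sketched as follows. The sketch uses Tarjan's algorithm, which emits strongly connected components in reverse topological order, so numbering components in emission order gives every component a number greater than those of its descendants. The function name, the pair encoding of UFDs and UINDs, the merging of red and black edges into one successor set, and the choice to start numbering at 1 are assumptions of this example.

```python
def scc_numbering(nodes, ufds, uinds):
    """Assign scc-numbers to the attributes of G(Σ).

    ufds:  pairs (A, B) standing for the UFD A -> B (red edge A -> B).
    uinds: pairs (A, B) standing for the UIND A ⊆ B (black edge B -> A).
    """
    succ = {v: set() for v in nodes}
    for a, b in ufds:
        succ[a].add(b)
    for a, b in uinds:
        succ[b].add(a)

    index, low, number = {}, {}, {}
    stack, on_stack = [], set()
    counters = {"index": 0, "scc": 0}

    def visit(v):
        index[v] = low[v] = counters["index"]
        counters["index"] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component; all its descendants
            # outside the component were numbered earlier.
            counters["scc"] += 1
            while True:
                w = stack.pop()
                on_stack.discard(w)
                number[w] = counters["scc"]
                if w == v:
                    break

    for v in nodes:
        if v not in index:
            visit(v)
    return number


scc = scc_numbering(["A", "B", "C", "D"],
                    ufds=[("A", "B"), ("B", "A"), ("B", "C")],
                    uinds=[("D", "C")])
```

The recursion is adequate for small schemas; an iterative variant would be needed for very deep graphs.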

7. Polynomial-Time Conditions for Non-Interaction

Naturally, the implication problems for a combined class are more difficult than the corresponding implication problems for the individual classes. While finite and unrestricted implication problems for FDs coincide and are PTIME-complete, and finite and unrestricted implication problems for INDs coincide and are PSPACE-complete, finite and unrestricted implication problems for FDs and INDs deviate and are undecidable. However, the literature has brought forward tractable conditions that are sufficient for the non-interaction of these classes [44, 45]. That is, whenever these conditions are met, the implication of FDs and of INDs by a given set of FDs and INDs can be determined by the restriction of the given set to FDs alone and to INDs alone, respectively. Hence, non-interaction is a desirable property. In this section we examine the frontiers for tractable reasoning about the classes FD+IA and IND+IA, respectively, in both the finite and unrestricted cases. The idea is to establish sufficient criteria for the non-interaction between IAs and FDs, and also between IAs and INDs. There is a trade-off between the simplicity and generality of such criteria. While simple criteria may be easier to apply, more general criteria allow us to establish non-interaction in more cases. Our focus here is on generality, and the criteria are driven by the corresponding inference rules. We define non-interaction between two classes as follows.

Definition 23.

Let Σ_1 and Σ_2 be two sets of dependencies from classes C_1 and C_2, respectively. We say that Σ_1, Σ_2 have no interaction with respect to unrestricted (finite) implication if
• for σ from C_1, σ is (finitely) implied by Σ_1 iff σ is (finitely) implied by Σ_1 ∪ Σ_2;
• for σ from C_2, σ is (finitely) implied by Σ_2 iff σ is (finitely) implied by Σ_1 ∪ Σ_2.

Let us now define two syntactic criteria for describing non-interaction. We say that an IA X ⊥ Y splits an FD U → V (or an IND Z ⊆ W) if both (X \ Y) ∩ U and (Y \ X) ∩ U (or X ∩ W and Y ∩ W) are non-empty. The IA X ⊥ Y intersects U → V (Z ⊆ W) if XY ∩ U (XY ∩ W) is non-empty. Notice that both these concepts give rise to possible interaction between two different classes. We show that lacking splits implies non-interaction for IND+IA, and for FD+IA in the unrestricted case. Non-interaction for FD+IA in the finite case is guaranteed by the stronger condition of lacking intersections.

The proof for IND+IA is straightforward by using its complete axiomatization.

Theorem 24. Let Σ_IND and Σ_IA be respectively sets of INDs and IAs. If no IA in Σ_IA splits any IND in Σ_IND, then Σ_IND and Σ_IA have no interaction with respect to unrestricted (finite) implication.

Proof. Assume that σ is implied by Σ_IND ∪ Σ_IA (recall that here finite and unrestricted implication coincide). By Theorem 5, σ can be deduced from Σ_IND ∪ Σ_IA by I ∪ B ∪ C. Given the condition, no rules from C can be applied in the deduction. Since only rules in B (in I) produce fresh INDs (IAs), the claim follows.

The non-interaction results for FD+IA require more work. For unrestricted implication the idea is to first apply the polynomial-time algorithm below, which transforms an assumption set Σ to an equivalent set Σ*. The set Σ* is such that it has no interaction between FDs and IAs provided that none of its IAs splits any of its FDs. For finite implication, non-interaction holds for Σ itself under the stronger condition that no IA intersects any FD.
These claims will be proven using a graphical version of the chase procedure.

Let us first consider the computation of Σ*. For a set of FDs Σ we denote by Cl(Σ, X) the closure set of all attributes A for which Σ ⊨ X → A. This set can be computed in linear time by the Beeri-Bernstein algorithm [6]. The non-interaction condition for unrestricted implication is now formulated using Σ*_IA = {X_1 ⊥ Y_1, ..., X_n ⊥ Y_n} and Σ*_FD = Σ_FD ∪ {∅ → Z}, where Z, X_i, Y_i are computed using the following algorithm that takes an FD set Σ_FD and an IA set Σ_IA = {U_1 ⊥ V_1, ..., U_n ⊥ V_n} as an input.

Algorithm 1: Algorithm for computing Z, X_i, Y_i
Require: Σ_FD and Σ_IA = {U_i ⊥ V_i | i = 1, ..., n}
Ensure: Z and Σ*_IA = {X_i ⊥ Y_i | i = 1, ..., n}
1: Initialize: V ← ∅, X_i ← U_i, Y_i ← V_i
2: repeat
3:   Z ← V
4:   for i = 1, ..., n do
5:     X_i ← Cl(Σ_FD, X_i V)
6:     Y_i ← Cl(Σ_FD, Y_i V)
7:     V ← V ∪ (X_i ∩ Y_i)
8: until Z = V

From the construction we obtain that Σ*_FD ∪ Σ*_IA is equivalent to Σ_FD ∪ Σ_IA and that
(i) for Z_1 ⊥ Z_2 ∈ Σ*_IA and i = 1, 2, Σ*_FD ⊨ Z_i → X implies X ⊆ Z_i;
(ii) Σ*_FD ∪ Σ*_IA ⊨ ∅ → A iff A ∈ Z.

Recall that the closure set Cl(Σ_FD, X) can be computed in linear time by the Beeri-Bernstein algorithm. Now, at step 5 (or step 6) the computation of the closure set is resumed whenever V introduces attributes that are new to X_i (Y_i). Since the number of closures considered is 2|Σ_IA|, we obtain a quadratic time bound for the computation of Z, X_i, Y_i.

Let us then present the chase construction for FD+IA. The idea is to chase via graphs constructed from vertices and undirected edges, labeled by sets of attributes (see also [29, 30]). Compared to the traditional tableau chase, vertices now represent tuples and labeled edges represent equalities between tuple values. For unrestricted implication the obtained graphical chase procedure is sound and complete; for finite implication it is sound but incomplete.

Definition 25.

Let Σ_FD ∪ Σ_IA ∪ {σ} be a set of FDs and IAs over a relation schema R, and let Σ = Σ_FD ∪ Σ_IA. Let G be a graph that consists of two initial vertices and a single edge between them that is labeled by {A ∈ R : Σ_FD ∪ {X ⊥ X : XY ⊥ XZ ∈ Σ_IA} ⊨ Y → A}, where Y is U if σ is U → V and otherwise Y is ∅. For an attribute set X, vertices v and v′ are X-connected if v = v′ or for all A ∈ X there is a sequence of edges (v, v_1), ..., (v_n, v′), each labeled by a set including A. We then denote by G_{Σ,σ} any (possibly infinite) undirected edge-labeled graph that is obtained by chasing G with the following rule as long as possible.
• Assume that X ⊥ Y ∈ Σ and v, v′ are two vertices such that no vertex v″ is X-connected to v and Y-connected to v′. Then apply (1) once, and thereafter apply (2) as long as possible.
  (1) Add a fresh vertex v_new and fresh edges (v, v_new) and (v′, v_new) labeled respectively by X and Y.
  (2) For all X′ → Y′ ∈ Σ and v, v′ ∈ G that are X′-connected but not Y′-connected, add a fresh edge (v, v′) labeled by Y′.

It is easy to describe non-trivial interaction between FDs and IAs using the above graph construction. Let us illustrate this with an example. Note that in the following example both FDs are split by IAs from the same assumption set. As a result, the interaction between FDs and IAs is already so intricate that not all consequences are derivable using the rules in Table 2.

[Figure 1: Start for G_{Σ,σ}-construction]

Example 26.

Let Σ be the set {B ⊥ CD, D ⊥ AE, BC ⊥ ADE, AB → X, CDE → X} and define σ as A → X. Then the basis for G_{Σ,σ} consists of the two initial vertices, connected by an A-labeled edge. The graph depicted in Fig. 1 is obtained by applying each IA in Σ once. Notice that one of the new vertices is then AB-connected to the first initial vertex and CDE-connected to the second. By the FDs in Σ, the next step would be to add X-labeled edges between this vertex and each of the two initial vertices.

Since the two initial vertices are X-connected in our example G_{Σ,σ}, by the following lemma we obtain that Σ ⊨ A → X. Interestingly, A → X is not derivable from Σ by the rules depicted in Table 2.

Lemma 27.

Let Σ ∪ {σ} be a set of FDs and IAs. Then the following holds:
(1) if σ is U → V, then Σ ⊨ σ iff the two initial vertices are V-connected in G_{Σ,σ};
(2) if σ is U ⊥ V, then Σ ⊨ σ iff there exists a vertex in G_{Σ,σ} that is U-connected to the first initial vertex and V-connected to the second.

Proof. Assume first that the right-hand side condition holds; we show that Σ ⊨ σ. Let r be a relation satisfying Σ, and let t, t′ ∈ r be two tuples that agree on U in case (1) and are arbitrary in case (2). It is then an easy induction to show that the assignment of t and t′ to the two initial vertices can be extended to a mapping f from vertices to tuples of r such that f(v)(X) = f(v′)(X) whenever v and v′ are X-connected in G_{Σ,σ}. It follows that r ⊨ σ, which shows that Σ ⊨ σ.

Assume then that the right-hand side condition fails. Then for each vertex v in G_{Σ,σ}, construct a tuple t that maps each attribute A to the set that consists of all vertices v′ in G_{Σ,σ} that are A-connected to v. It is straightforward to show how this construction gives rise to a (possibly infinite) counter-example for Σ ⊨ σ. This completes the proof of the lemma.

We are now ready to state the non-interaction theorem for FD+IA.

Theorem 28.

Let Σ_FD and Σ_IA be respectively sets of FDs and IAs, and let Σ*_FD and Σ*_IA be obtained from Σ_FD and Σ_IA by Algorithm 1. Then the following holds:
• if no IA in Σ*_IA splits any FD in Σ*_FD, then Σ*_FD and Σ*_IA have no interaction with respect to unrestricted implication;
• if no IA in Σ_IA intersects any FD in Σ_FD, then Σ_FD and Σ_IA have no interaction with respect to finite implication.

Proof. We consider first unrestricted implication and then finite implication.

Unrestricted case.

Assume that Σ ⊨ σ where Σ = Σ*_FD ∪ Σ*_IA, and assume that no IA in Σ*_IA splits any FD in Σ*_FD; we show that Σ*_IA ⊨ σ or Σ*_FD ⊨ σ holds. Let G_{Σ,σ} be as in Definition 25; we show that the construction of G_{Σ,σ} does not use any application of (2). Assume to the contrary that some introduction of a new vertex v_new and new edges (v, v_new), (v′, v_new) by (1) renders for the first time some vertices w and w′ X′-connected but not Y′-connected, for some X′ → Y′ ∈ Σ. Assume that the new edges (v, v_new) and (v′, v_new) are respectively labeled by X and Y, in which case we have X ⊥ Y ∈ Σ, and assume first that w′ is v_new. Since X ⊥ Y does not split X′ → Y′, it must be the case that X′ is included in either X or Y. We may assume by symmetry that this holds for X, in which case X′Y′ ⊆ X by (i). Then it follows that w and v are X′-connected but not Y′-connected, since the same holds for w and v_new by the assumptions. Now w ≠ v_new, and hence this contradicts Definition 25 if v_new is the first added vertex, and otherwise the assumption that (2) has not been applied previously. Hence, w′ and v_new are different vertices, and so by symmetry are w and v_new. However, since X and Y share only constant attributes, w and w′ must have been X′-connected already before the introduction of v_new. Hence by Definition 25 they have also been Y′-connected, which contradicts the assumption. This completes the proof that no application of (2) occurs in the construction of G_{Σ,σ}.

Assume that σ is an IA. By the construction of Σ*_IA, the initial edge has the same label in the initial graph G of G_{Σ,σ} and of G_{Σ*_IA,σ}. Since the construction of G_{Σ,σ} contains no application of (2), G_{Σ,σ} equals G_{Σ*_IA,σ}. Hence by Lemma 27, Σ*_IA ⊨ σ. Assume then that σ is an FD. Since the construction of G_{Σ,σ} contains no application of (2), the connectivity of the two initial vertices is determined by the label of the initial edge, and we obtain that Σ*_FD ∪ {X ⊥ X : XY ⊥ XZ ∈ Σ*_IA} ⊨ σ by Definition 25 and Lemma 27. Therefore, and since Z satisfies (ii) and ∅ → Z ∈ Σ*_FD, it follows that Σ*_FD ⊨ σ.

Finite case.

Assume that no IA in Σ_IA intersects any FD in Σ_FD, and assume first that Σ_FD ⊭_fin U → V; we show that Σ_FD ∪ Σ_IA ⊭_fin U → V. Let U⁺ be the set of attributes A such that Σ_FD ⊨_fin U → A, and let I be the set of attributes that appear in Σ_IA. Then we let r be a relation where r(U⁺) consists of a single constant tuple, r(I \ U⁺) consists of all tuples over I \ U⁺ that take their values from a fixed two-element set, and r(A) takes values 1, ..., |U⁺ \ I| for A ∈ R \ (U⁺ ∪ I), where R is the underlying relation schema. Clearly r satisfies Σ_IA and violates U → V. If X → Y ∈ Σ_FD, then by the assumption X ∩ I = ∅, and hence it follows by the construction that r satisfies X → Y.

Assume then that Σ_IA ⊭_fin U ⊥ V; we show that Σ_FD ∪ Σ_IA ⊭_fin U ⊥ V. Let I be as in the previous paragraph. If UV ⊆ I, let r be a finite relation over I satisfying Σ_IA and violating U ⊥ V; otherwise define r as the relation of all tuples over I that take their values from a fixed two-element set. Then let r′ be the extension of r where, for each A ∈ R \ I, r′(A) takes values 1, ..., |r|. Given the non-interaction assumption, we notice that r′ is a witness of Σ_FD ∪ Σ_IA ⊭_fin U ⊥ V. This completes the proof.

Note that the finite relation constructions in the proof are possible already from the assumptions that Σ_FD ⊭ U → V or Σ_IA ⊭ U ⊥ V; for the latter recall that finite and unrestricted implication coincide for IAs. Hence, the proof entails that the finite and unrestricted implication problems coincide for FD+IA provided that the non-intersection assumption holds.

To illustrate the necessity for a stronger condition in the finite case, recall from Section 5.1 that AB → CD is finitely implied by {A ⊥ B, C ⊥ D, BC → AD, AD → BC}, and notice that AB → CD is not finitely implied by {BC → AD, AD → BC}. However, Algorithm 1 does not produce any fresh assumptions, and neither A ⊥ B nor C ⊥ D splits any FD assumption. Therefore, the absence of splits is not sufficient for non-interaction in the finite case.
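Algorithm 1 and the two syntactic criteria of this section can be sketched in a few lines. The sketch below is illustrative only and rests on several assumptions: attributes are single characters, an FD is a pair of strings (left-hand side, right-hand side), an IA is a pair of strings, and the closure is computed by a naive fixpoint iteration rather than by the linear-time Beeri-Bernstein algorithm. All function names are hypothetical.

```python
def closure(fds, attrs):
    """Cl(fds, attrs) by naive fixpoint iteration (not the linear-time variant)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and not set(rhs) <= closed:
                closed |= set(rhs)
                changed = True
    return closed


def algorithm1(fds, ias):
    """Return (Z, [(X_i, Y_i)]) following the repeat-until loop of Algorithm 1."""
    v = set()
    xs = [set(u) for u, _ in ias]
    ys = [set(w) for _, w in ias]
    while True:
        z = set(v)                       # step 3: Z <- V
        for i in range(len(ias)):
            xs[i] = closure(fds, xs[i] | v)
            ys[i] = closure(fds, ys[i] | v)
            v |= xs[i] & ys[i]           # shared attributes become constants
        if z == v:                       # step 8: until Z = V
            return z, list(zip(xs, ys))


def splits(ia, fd):
    """IA X ⊥ Y splits FD U -> V if (X \\ Y) ∩ U and (Y \\ X) ∩ U are non-empty."""
    x, y = map(set, ia)
    u = set(fd[0])
    return bool((x - y) & u) and bool((y - x) & u)


def intersects(ia, fd):
    """IA X ⊥ Y intersects FD U -> V if XY ∩ U is non-empty."""
    x, y = map(set, ia)
    return bool((x | y) & set(fd[0]))
```

On the example that closes this section, `algorithm1([("BC", "AD"), ("AD", "BC")], [("A", "B"), ("C", "D")])` leaves the IAs unchanged and returns an empty Z, and `splits` is False for every IA/FD pair, while `intersects(("A", "B"), ("BC", "AD"))` is True. This matches the observation that the split criterion is met although the stronger intersection criterion is not.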

8. Complexity Results

In this section we examine the computational complexity of the various implication problems we have studied. We first show that both implication problems for UFD+UIND+IA can be solved in low-degree polynomial time, even though the problems differ from one another. Then we focus on the class IND+IA, for which the two implication problems coincide. Recall that the implication problem for the combination of FDs and INDs is undecidable. However, the same cannot be true for the combination of INDs and IAs, as already witnessed by our finite axiomatization. In this section we go further by showing that adding IAs to the class of INDs involves no trade-off in terms of losing desirable computational properties. Indeed, Theorem 33 shows that, as for INDs [13], the implication problem for IND+IA is PSPACE-complete. We start by analyzing the complexity of UFD+UIND+IA implication.

Theorem 29.

Let Σ_UFD, Σ_UIND, Σ_IA be respectively sets of UFDs, UINDs, and IAs over a relation schema R. The unrestricted and finite implication problems for σ by Σ_UFD ∪ Σ_UIND ∪ Σ_IA can be decided in time:
• O(|Σ_IA| · |Σ_UFD| + |Σ_UIND|) if σ is a UFD or UIND;
• O(|Σ_IA| · (|Σ_UFD| + |R|) + |Σ_UIND|) if σ is an IA.

Proof. Algorithm 2 extends an algorithm for UFD+UIND-implication in [13]. It generates a graph G and sets Z, X_i, Y_i, for i = 1, ..., |Σ_IA|, and takes Σ_UFD, Σ_UIND and Σ_IA = {U_1 ⊥ V_1, ..., U_n ⊥ V_n} as an input. Note that steps 2-4 are to be omitted in the unrestricted case.

Algorithm 2: Algorithm for computing Z, X_i, Y_i
Require: Σ_UFD, Σ_UIND, and Σ_IA = {U_i ⊥ V_i | i = 1, ..., n}
Ensure: A digraph G and sets Z and X_i, Y_i for i = 1, ..., n
1: Initialize: Z, X_i, Y_i are empty and G is a digraph that consists of vertices R, red edges (A, B) for A → B ∈ Σ_UFD, and black edges (A, B) for B ⊆ A ∈ Σ_UIND
2: compute all strongly connected components of G (only in the finite case)
3: for each red (black) edge (A, B) with A, B in the same component (only in the finite case) do
4:   add red (black) edge (B, A) to G
5: for ∅ → A ∈ Σ_UFD do
6:   add A to Z
7: for i = 1, ..., n do
8:   add A to X_i (Y_i) if there is a red path from U_i (V_i) to A
9:   add each attribute in X_i ∩ Y_i to Z
10: add to Z all attributes that are reachable from Z in G, ignoring edge colors
11: for each red (black) edge (A, B) with A, B in Z do
12:   add red (black) edge (B, A) to G

Let Σ := Σ_UFD ∪ Σ_UIND ∪ Σ_IA. Let Σ* be the set of dependencies that consists of all trivial UFDs, UINDs, and IAs over R, and:
(i) A → B iff B ∈ Z or A is connected to B by a red path;
(ii) A ⊆ B iff there is a black path from B to A;
(iii) X ⊥ Y iff Σ*_IA ⊢_I X ⊥ Y;
where Σ*_IA := {X_i ⊥ Y_i | i = 1, ..., n} ∪ {Z ⊥ Z}.

Let us first consider the case of finite implication. In what follows, we show that Σ ⊨_fin σ iff σ ∈ Σ*. By Theorem 22, Σ ⊨_fin σ iff σ can be deduced from Σ by the rules A* ∪ {U , U } ∪ {C_k : k ∈ ℕ}. Therefore, the claim follows if Σ* is the deductive closure of Σ under these rules. It is straightforward to check that each dependency in Σ* can be deduced by the rules. Note that step 3 of the graph construction can be simulated with the cycle rules and the transitivity rules for FDs and INDs. Next we show that Σ* is deductively closed. The only non-trivial cases are the cycle rules and FI1, FI2. For FI1, assume X ⊥ YA, A → B ∈ Σ*; we show that X ⊥ YAB ∈ Σ*. The cases where A or B is empty are trivial. Assume that both are single attributes. If B ∈ Z, then the claim follows by the definition of Σ*_IA. Otherwise, B ∉ Z and there exists a red path from A to B. Since Z is closed under red arrows, this path stays outside Z and has hence existed already at step 7. Therefore, by the construction, B is in X_i (Y_i) whenever A is. It is now an easy induction on the length of a deduction to show that whenever Σ*_IA ⊢_I V_1 ⊥ V_2, then Σ*_IA ⊢_I V′_1 ⊥ V′_2, where V′_i = V_i B if A ∈ V_i, and otherwise V′_i = V_i. From this it follows that Σ*_IA ⊢_I X ⊥ YAB, and therefore X ⊥ YAB ∈ Σ*. For FI2 […] ⊨ σ iff σ ∈ Σ*, where Σ* is now defined over the graph G obtained from steps 1, 5-12 of the algorithm. Proving this is analogous to the finite case (with the exception that rules UI , UI […] , , , , 11 each take time O(|Σ_UFD| + |Σ_UIND|), step 5 takes O(|Σ_UFD|), and step 7 takes O(|Σ_IA| · |Σ_UFD|). Since reachability can be tested in linear time, we obtain the time bound for a UFD or a UIND σ. Assume that σ is an IA. It is easy to see that σ is (finitely) implied by Σ*_IA iff σ↾R′ is (finitely) implied by Σ*_IA↾R′, where R′ := R \ Z. On the other hand, by Theorem 2 in [40] the right-hand side implication problem for disjoint independence atoms can be decided in time O(|Σ*_IA↾R′| · |R′|). Since Σ*_IA↾R′ is of the size of Σ_IA, the time bound for an IA σ follows.

It follows that all unary INDs and constancy atoms implied by a set of INDs and IAs can be recognized in linear time.

Theorem 30.

The unrestricted and finite implication problems for the class CA+UIND by IND+IA are linear-time decidable.

Proof.

Let Σ be a set of INDs and IAs, and let σ be a UIND and τ a CA. By Theorem 9, Σ ⊨ ρ ⇔ Σ_UIND ∪ Σ_CA ⊨ ρ, for ρ ∈ {σ, τ}. Let Y be the singlevalued span of Σ_UIND ∪ Σ_CA described in Definition 15, and let Σ_1 := Σ_UIND ∪ {A ⊆ B : B ⊆ A ∈ Σ_UIND, A ∈ Y} and Σ_2 := Σ_CA ∪ {∅ → A : A ∈ Y}. By Theorem 17,
• Σ_UIND ∪ Σ_CA ⊨ σ ⇔ Σ_1 ⊨ σ,
• Σ_UIND ∪ Σ_CA ⊨ τ ⇔ Σ_2 ⊨ τ.
The singlevalued span Y and the deductive closure of Σ_1 can be computed in linear time by reducing to graph reachability. For the latter, note that U , U […] ⊨ ∅ → A iff A ∈ Y. We conclude that in both cases implication can be tested in linear time.

Next we turn to the class IND+IA and use graphs again to characterize the associated implication problem. Recall by Corollary 7 that the unrestricted and finite implication problems coincide for IND+IA.

Definition 31.

Let Σ ∪ {σ} be a set of INDs and IAs, and assume that σ is of the form R[X] ⊆ S[Y] (or R[X_1 ⊥ X_2] where X = X_1X_2). Then we let H_{Σ,σ} be a graph that has nodes {τ_1, ..., τ_k}, for k ≤ |X|, where τ_i is an IND of the form R[A_i] ⊆ R′[B_i] and the concatenation A_1 ... A_k is a permutation (without repetition) of X. Two nodes v, v′ are connected by a directed edge v → v′ if one of the following holds:
• There exist τ_0, τ_1, τ_2 such that v \ {τ_0} = v′ \ {τ_1, τ_2}, where τ_0 is a permutation of R[U_1U_2] ⊆ R′[V_1V_2], τ_1 = R[U_1] ⊆ R′[V_1], and τ_2 = R[U_2] ⊆ R′[V_2].
• There exist τ_0, τ_1, τ_2 such that v \ {τ_1, τ_2} = v′ \ {τ_0}, where τ_1 = R[U_1] ⊆ R′[V_1], τ_2 = R[U_2] ⊆ R′[V_2], τ_0 = R[U_1U_2] ⊆ R′[V_1V_2], and for some W_1 ⊇ V_1 and W_2 ⊇ V_2, R′[W_1 ⊥ W_2] ∈ Σ.
• There exist τ, τ′ such that v \ {τ} = v′ \ {τ′}, where τ = R[U] ⊆ R′[V], τ′ = R[U] ⊆ R″[W], and R′[V] ⊆ R″[W] is a projection and permutation of some IND in Σ.

If σ is R[X_1 ⊥ X_2], then we define v_start := {R[X_1] ⊆ R[X_1], R[X_2] ⊆ R[X_2]} and v_end := {R[X] ⊆ R[X]}. If σ is R[X] ⊆ S[Y], then v_start := {R[X] ⊆ R[X]} and v_end := {R[X] ⊆ S[Y]}.

Lemma 32.

Let Σ ∪ {σ} be a set of INDs and DIAs. Then Σ ⊨ σ iff H_{Σ,σ} contains a directed path from v_start to v_end.

Proof. Assuming Σ ⊨ σ, the required path is found by backtracking a successful chase of d by Σ, where the chase rules and d are defined as in cases 2) and 3) in the proof of Theorem 5.

For the other direction, let d be a database satisfying Σ. Assume first that σ is an IA of the form R[X_1 ⊥ X_2], and let t, t′ ∈ r[R]. Now X_1 and X_2 are disjoint, so we can define a mapping s that agrees with t on X_1 and with t′ on X_2. It suffices to show, given a directed path in H_{Σ,σ} from v_start to {τ_1, ..., τ_k}, that for each τ_i of the form R[U_i] ⊆ R′[V_i] there is t_i ∈ r′[R′] such that s(U_i) = t_i(V_i). Since this is a straightforward induction, we leave the proof to the reader. The case where σ is an IND is analogous.

PSPACE-completeness of IND+IA-implication is now shown by reducing to graph reachability in H_{Σ,σ}.

Theorem 33.

The unrestricted (finite) implication problem for IND+IA is complete for PSPACE.

Proof.

The lower bound follows from the fact that the implication problem for INDs alone is PSPACE-complete [13]. For the upper bound, let Σ ∪ {σ} be a set of INDs and IAs where σ is an IA (or an IND). First transform the input as described in Lemma 3; by Theorem 30 this can be done in polynomial time. Then non-deterministically check whether there is a directed path from v_start to v_end in the graph H of Definition 31 built for the transformed input. Since this requires only a polynomial amount of space, we conclude by Savitch's theorem that the implication problem is in PSPACE.

Note that there are only polynomially many nodes in H_{Σ,σ}, given that σ is of fixed arity. Hence, by Lemma 3 and Theorem 30 we obtain the following corollary.

Corollary 34.

The unrestricted (finite) implication problem for σ by Σ, where Σ ∪ {σ} is a set of INDs and IAs, is fixed-parameter tractable in the arity of σ.

Actually, the choice of the parameter in the above corollary is not optimal. By Theorem 30, tractability is preserved if only the number of non-constant attributes in an IA σ is fixed. Moreover, assume that σ is an IND of the form A_1 ... A_{h+k} ⊆ B_1 ... B_{h+k} where B_i is constant for i ≤ h. Then the implication problem for σ is fixed-parameter tractable in k since by UI […] A_{h+1} ... A_{h+k} ⊆ B_{h+1} ... B_{h+k} and A_i ⊆ B_i, for i ≤ h, are all implied.
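The graph-based decision procedure behind Theorem 29 can be illustrated for the unrestricted case, in which steps 2-4 of Algorithm 2 are omitted. The following is a sketch under stated assumptions rather than the authors' implementation: a UFD A → B is a pair (A, B), a constancy FD ∅ → A is encoded as (None, A), a UIND A ⊆ B is a pair (A, B), and an IA is a pair of attribute sets. It also assumes that the closures X_i and Y_i are seeded with the input sets U_i and V_i. All function names are hypothetical.

```python
from collections import deque


def reachable(edges, sources):
    """Vertices reachable from `sources` along `edges` (BFS, paths of length >= 0)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, queue = set(sources), deque(sources)
    while queue:
        a = queue.popleft()
        for b in adj.get(a, []):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def algorithm2_unrestricted(ufds, uinds, ias):
    """Steps 1 and 5-12 of Algorithm 2; returns (Z, [(X_i, Y_i)], red, black)."""
    red = {(a, b) for a, b in ufds if a is not None}
    black = {(b, a) for a, b in uinds}        # black edge (B, A) encodes A ⊆ B
    z = {b for a, b in ufds if a is None}     # constants given by ∅ -> A
    closed = []
    for u, v in ias:
        x, y = reachable(red, u), reachable(red, v)
        closed.append((x, y))
        z |= x & y                            # shared attributes are constants
    z = reachable(red | black, z)             # ignore edge colors
    red |= {(b, a) for a, b in red if a in z and b in z}
    black |= {(b, a) for a, b in black if a in z and b in z}
    return z, closed, red, black


def implies_ufd(state, a, b):
    """Item (i): A -> B iff B ∈ Z or there is a red path from A to B."""
    z, _, red, _ = state
    return b in z or b in reachable(red, [a])


def implies_uind(state, a, b):
    """Item (ii): A ⊆ B iff there is a black path from B to A."""
    _, _, _, black = state
    return a in reachable(black, [b])
```

For example, with the UFD A → B, the UIND C ⊆ B and the IA A ⊥ B, the sketch places B and C in Z, so that A → C and B ⊆ C are reported as implied, whereas C ⊆ A is not.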

9. Conclusion and Outlook

In view of the infeasibility of EMVDs and of FDs and INDs combined, the class of FDs, MVDs and unary INDs is important as it is low-degree polynomial time decidable in the finite and unrestricted cases. As independence atoms form an important tractable embedded subclass of EMVDs, we have delineated axiomatisability and tractability frontiers for subclasses of FDs, INDs, and IAs. The most interesting class is that of IAs, unary FDs and unary INDs, for which finite and unrestricted implication differ but each is axiomatisable and decidable in low-degree polynomial time. The class is robust with respect to these properties, as the implication problems for unary functional and binary inclusion dependencies are undecidable in both the finite and unrestricted case, and binary functional dependencies with unary independence atoms are not finitely axiomatisable in the finite case. The results form the basis for new applications of these data dependencies in many data processing tasks.

Even though research in this space has been rich and deep, there are many problems that warrant future research. Theoretically, decidability remains open for independence atoms and functional dependencies, as well as for unary independence atoms and functional dependencies, both in the finite and unrestricted case. This line of research should also be investigated in the probabilistic setting of conditional independencies, fundamental to multivariate statistics and machine learning. Practically, implementations and experimental evaluations of the algorithms can complement the findings in the research. It would be interesting to investigate applications. For example, in consistent query answering the aim is to return all those answers to a query that are present in all repairs [8, 41]. Here, a repair is a database obtained by applying a minimal set of operations that resolve all violations of the given database with respect to the given constraints. Assuming that the constraint is the IA σ = Heart : p_name ⊥ t_id, operations are tuple insertions (appropriate for tuple-generating dependencies), and the query is

    SELECT p_name, t_id FROM Heart;

a rewriting of this query that would return the consistent query answers over the given database is

    SELECT H.p_name, H'.t_id
    FROM Heart H, Heart H';

It would be interesting to include IAs when investigating typical problems of consistent query answering [8]. In database design [25, 39] IAs are useful for finding lossless decompositions of a database schema, which is exemplified in [17]. Considering the infeasibility of the implication problem for EMVDs, and the limited knowledge on the interaction of IA+FD, it would be interesting to investigate the possibilities for automating schema design using subclasses of IA+FD+IND. Query folding has greatly benefited from considering FD+IND [28]. As the implication problems for FD+IND are undecidable, there is no algorithm that can produce a complete list of query rewritings for this class. However, our results are a starting point to develop (complete) query folding algorithms for subclasses of IA+FD+IND. Similarly, well-known methods for deciding query containment in the presence of FD+IND [36] could be extended to subclasses of IA+FD+IND. Query answering has also been shown to be effective on data integration systems in which global schemata are expressed by key and foreign key constraints [11]. It would be interesting to see how these techniques can be extended to other subclasses of IA+FD+IND. Another area of impact for our results is ontology-based data access, in which keys and inclusion dependencies play an important role. It would be interesting to study to which degree independence atoms can be added without increasing too much the complexity of associated decision problems [24]. It would also be interesting to develop algorithms that discover all those IAs that hold on a given database [1]. For subclasses of IA+FD+IND, our algorithms can remove data dependencies implied by others.

References

[1] Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. Profiling relational data: a survey. VLDB J., 24(4):557–581, 2015.
[2] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
[3] Samson Abramsky. Contextual semantics: From quantum mechanics to logic, databases, constraints, and complexity. Bulletin of the EATCS, 113, 2014.
[4] Samson Abramsky, Georg Gottlob, and Phokion G. Kolaitis. Robust constraint satisfaction and local hidden variables in quantum mechanics. In IJCAI, 2013.
[5] William W. Armstrong. Dependency Structures of Data Base Relationships. In Proc. of IFIP World Computer Congress, pages 580–583, 1974.
[6] Catriel Beeri and Philip A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30–59, 1979.
[7] Catriel Beeri, Ronald Fagin, and John H. Howard. A complete axiomatization for functional and multivalued dependencies in database relations. In SIGMOD, pages 47–61, 1977.
[8] Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Morgan & Claypool Publishers, 2011.
[9] Joachim Biskup and Piero A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec., 3(1):14–27, 2004.
[10] Jean-Marc Cadiou. On semantic issues in the relational model of data. In MFCS, pages 23–38, 1976.
[11] Andrea Calì, Diego Calvanese, and Maurizio Lenzerini. Data integration under integrity constraints. In Seminal Contributions to Information Systems Engineering, pages 335–352. 2013.
[12] Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou. Inclusion dependencies and their interaction with functional dependencies. In PODS, pages 171–176, 1982.
[13] Marco A. Casanova, Ronald Fagin, and Christos H. Papadimitriou. Inclusion dependencies and their interaction with functional dependencies. J. Comput. Syst. Sci., 28(1):29–59, 1984.
[14] Ashok K. Chandra and Moshe Y. Vardi. The implication problem for functional and inclusion dependencies is undecidable. SIAM Journal on Computing, 14(3):671–677, 1985.
[15] E. F. Codd. Relational completeness of data base sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972.
[16] Stavros S. Cosmadakis, Paris C. Kanellakis, and Moshe Y. Vardi. Polynomial-time implication problems for unary inclusion dependencies. J. ACM, 37(1):15–46, 1990.
[17] Claude Delobel. Normalization and hierarchical dependencies in the relational data model. ACM Trans. Database Syst., 3(3):201–222, 1978.
[18] Alin Deutsch, Lucian Popa, and Val Tannen. Physical data independence, constraints, and optimization with universal plans. In VLDB, pages 459–470, 1999.
[19] Ronald Fagin. Multivalued dependencies and a new normal form for relational databases. ACM Transactions on Database Systems, 2:262–278, September 1977.
[20] Ronald Fagin. Horn clauses and database dependencies. J. ACM, 29(4):952–985, 1982.
[21] Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005.
[22] Zvi Galil. An almost linear-time algorithm for computing a dependency basis in a relational database. J. ACM, 29(1):96–102, 1982.
[23] Dan Geiger, Azaria Paz, and Judea Pearl. Axioms and algorithms for inferences involving probabilistic independence. Information and Computation, 91(1):128–141, 1991.
[24] Georg Gottlob, Michael Morak, and Andreas Pieris. Recent advances in datalog+/-. In Reasoning Web, pages 193–217, 2015.
[25] Georg Gottlob, Reinhard Pichler, and Fang Wei. Tractable database design and datalog abduction through bounded treewidth. Inf. Syst., 35(3):278–298, 2010.
[26] Erich Grädel and Jouko A. Väänänen. Dependence and independence. Studia Logica, 101(2):399–410, 2013.
[27] Sergio Greco, Cristian Molinaro, and Francesca Spezzano. Incomplete Data and Data Dependencies in Relational Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012.
[28] Jarek Gryz. Query rewriting using views in the presence of functional and inclusion dependencies. Inf. Syst., 24(7):597–612, 1999.
[29] Miika Hannula and Juha Kontinen. A finite axiomatization of conditional independence and inclusion dependencies. In FoIKS, pages 211–229, 2014.
[30] Miika Hannula and Juha Kontinen. A finite axiomatization of conditional independence and inclusion dependencies. Inf. Comput., 249:121–137, 2016.
[31] Miika Hannula, Juha Kontinen, and Sebastian Link. On independence atoms and keys. In CIKM, pages 1229–1238, 2014.
[32] Miika Hannula, Juha Kontinen, and Sebastian Link. On independence atoms and keys. In Jianzhong Li, Xiaoyang Sean Wang, Minos N. Garofalakis, Ian Soboroff, Torsten Suel, and Min Wang, editors, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 1229–1238. ACM, 2014.
[33] Miika Hannula, Juha Kontinen, and Sebastian Link. On the finite and general implication problems of independence atoms and keys. J. Comput. Syst. Sci., 82(5):856–877, 2016.
[34] Christian Herrmann. On the undecidability of implications between embedded multivalued database dependencies. Information and Computation, 122(2):221–235, 1995.
[35] Christian Herrmann. Corrigendum to "On the undecidability of implications between embedded multivalued database dependencies". Inf. Comput., 204(12):1847–1851, 2006.
[36] David S. Johnson and Anthony C. Klug. Testing containment of conjunctive queries under functional and inclusion dependencies. J. Comput. Syst. Sci., 28(1):167–189, 1984.
[37] A. B. Kahn. Topological sorting of large networks. Commun. ACM, 5(11):558–562, November 1962.
[38] Paris C. Kanellakis. Elements of relational database theory. In Handbook of Theoretical Computer Science, pages 1073–1156. 1990.
[39] Henning Köhler and Sebastian Link. SQL schema design: Foundations, normal forms, and normalization. In SIGMOD, 2016.
[40] Juha Kontinen, Sebastian Link, and Jouko A. Väänänen. Independence in database relations. In WoLLIC, pages 179–193, 2013.
[41] Paraschos Koutris and Jef Wijsen. The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In PODS, pages 17–29, 2015.
[42] Warren-Dean Langeveldt and Sebastian Link. Empirical evidence for the usefulness of armstrong relations in the acquisition of meaningful functional dependencies. Inf. Syst., 35(3):352–374, 2010.
[43] Dirk Leinders and Jan Van den Bussche. On the complexity of division and set joins in the relational algebra. In PODS, pages 76–83, 2005.
[44] Mark Levene and George Loizou. How to prevent interaction of functional and inclusion dependencies. Inf. Process. Lett., 71(3-4):115–125, 1999.
[45] Mark Levene and George Loizou. Guaranteeing no interaction between functional dependencies and tree-like inclusion dependencies. Theor. Comput. Sci., 254(1-2):683–690, 2001.
[46] Heikki Mannila and Kari-Jouko Räihä. Design by example: An application of armstrong relations. J. Comput. Syst. Sci., 33(2):126–141, 1986.
[47] Mozhgan Memari and Sebastian Link. Index design for enforcing partial referential integrity efficiently. In Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015, pages 217–228, 2015.
[48] John C. Mitchell. The implication problem for functional and inclusion dependencies.

Information and Control , 56(3):154–173, 1983.[49] John C. Mitchell. Inference rules for functional and inclusion dependen-cies. In

PODS , pages 58–69, 1983.[50] Dan Olteanu and Jakub Z´avodn´y. Size bounds for factorised represen-tations of query results.

ACM Trans. Database Syst. , 40(1):2, 2015.[51] Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert,Jan-Peer Rudolph, Martin Sch¨onberg, Jakob Zwiener, and Felix Nau-mann. Functional dependency discovery: An experimental evaluation ofseven algorithms.

PVLDB , 8(10):1082–1093, 2015.[52] Jan Paredaens. The interaction of integrity constraints in an informationsystem.

J. Comput. Syst. Sci. , 20(3):310–329, 1980.[53] S. V. Petrov. Finite axiomatization of languages for representations ofsystem properties: Axiomatization of dependencies.

Inf. Sci. , 47(3):339–372, April 1989.[54] Viswanath Poosala and Yannis E. Ioannidis. Selectivity estimation with-out the attribute value independence assumption. In Matthias Jarke,Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, PericlesLoucopoulos, and Manfred A. Jeusfeld, editors,

VLDB’97, Proceedingsof 23rd International Conference on Very Large Data Bases, August25-29, 1997, Athens, Greece , pages 486–495. Morgan Kaufmann, 1997.[55] Douglas Stott Parker Jr. and Kamran Parsaye-Ghomi. Inferences involv-ing embedded multivalued dependencies and transitive dependencies. In

SIGMOD , pages 52–57, 1980. 4856] B Thalheim.

Dependencies in relational databases . Teubner, 1991.[57] Ziheng Wei and Sebastian Link. A fourth normal form for uncertaindata. In Paolo Giorgini and Barbara Weber, editors,

Advanced Informa-tion Systems Engineering - 31st International Conference, CAiSE 2019,Rome, Italy, June 3-7, 2019, Proceedings , volume 11483 of


Submitted on 7 Jan 2021 Updated
