Conditional probability logic, lifted Bayesian networks and almost sure quantifier elimination



VERA KOPONEN

Abstract.

We introduce a formal logical language, called conditional probability logic (CPL), which extends first-order logic and which can express probabilities, conditional probabilities and which can compare conditional probabilities. Intuitively speaking, although formal details are different, CPL can express the same kind of statements as some languages which have been considered in the artificial intelligence community. We also consider a way of making precise the notion of lifted Bayesian network, where this notion is a type of (lifted) probabilistic graphical model used in machine learning, data mining and artificial intelligence. A lifted Bayesian network (in the sense defined here) determines, in a natural way, a probability distribution on the set of all structures (in the sense of first-order logic) with a common finite domain $D$. Our main result (Theorem 3.14) is that for every “noncritical” CPL-formula $\varphi(\bar{x})$ there is a quantifier-free formula $\varphi^*(\bar{x})$ which is “almost surely” equivalent to $\varphi(\bar{x})$ as the cardinality of $D$ tends towards infinity. This is relevant for the problem of making probabilistic inferences on large domains $D$, because (a) the problem of evaluating, by “brute force”, the probability of $\varphi(\bar{x})$ being true for some sequence $\bar{d}$ of elements from $D$ has, in general, (highly) exponential time complexity in the cardinality of $D$, and (b) the corresponding probability for the quantifier-free $\varphi^*(\bar{x})$ depends only on the lifted Bayesian network and not on $D$. Some conclusions regarding the computational complexity of finding $\varphi^*$ are given in Remark 3.17. The main result has two corollaries, one of which is a convergence law (and zero-one law) for noncritical CPL-formulas.

1. Introduction

We consider an extension of first-order logic which we call conditional probability logic (Definition 3.1), abbreviated CPL, with which it is possible to express statements about probabilities and conditional probabilities, and to compare conditional probabilities, which makes it possible to express statements about the (conditional) independence (or dependence) of events or random variables. Remarks 3.4, 3.6 and Example 3.5 below illustrate this. The semantics of CPL deals only with finite structures and assumes that all elements in a structure are equally likely, so (conditional) probabilities correspond to proportions. Quite similar formal languages, which aim at expressing the same sort of statements, have been studied within the field of artificial intelligence by Halpern [11, Section 2] and Bacchus et al. [2, Definition 4.1]. CPL is more expressive than the probability logic $L_{\omega P}$ considered by Keisler and Lotfallah in [16] (which cannot express conditional probabilities) and our first theorem (Theorem 3.14) is a generalization of their main result [16, Theorem 4.9], both in the sense that the language considered here is more expressive and that we consider a wider range of probability distributions.

Date: 3 April 2020.

A graphical model for a probability distribution and a set of random variables is a “graphical” way of describing the conditional dependencies and independencies between the random variables. In such a probabilistic model the random variables are also viewed as the vertices of a directed or undirected graph where edges indicate conditional dependencies and independencies [3, 23]. The notion of a Bayesian network is one of the most well-known graphical models. A Bayesian network $G$ for a probability space $(S, \mu)$ and random binary variables $X_1, \ldots, X_n$ is determined by the following data:
(1) A (not necessarily connected) directed acyclic graph (DAG), also denoted $G$, with vertex set $V = \{X_1, \ldots, X_n\}$ such that if there is an arrow (directed arc) from $X_i$ to $X_j$ then $i < j$.
(2) To each vertex $X_i \in V$ conditional probabilities are associated in such a way that the following holds:
(a) For each $j$ the set of parents of $X_j$, denoted $\mathrm{par}(X_j)$, is minimal (with respect to set inclusion) with the property that for every $i < j$, $X_i$ and $X_j$ are independent over $\mathrm{par}(X_j)$.
(b) The joint probability distribution on $X_1, \ldots, X_n$ is determined by the conditional probabilities associated with the vertices of $G$.
If $G$ is a Bayesian network as defined above, then it follows (from e.g. [23, Definition 1.2.1 and Theorems 1.2.6, 1.2.7]) that
(i) For every $X_j \in V$, $X_j$ and the set of all predecessors of $X_j$ are independent over $\mathrm{par}(X_j)$.
(ii) For every $X_j \in V$, $X_j$ and the set of all nondescendants of $X_j$ (except $X_j$ itself) are independent over $\mathrm{par}(X_j)$.
Moreover: if condition (i) or condition (ii) holds, then $\{X_1, \ldots, X_n\}$ can be ordered so that conditions (a) and (b) above hold without changing the arrows of the DAG.

Graphical models are used in machine learning, data mining and artificial intelligence in (probability based) learning and inference making. To illustrate this by a very simple example, suppose that we have a finite set $A$ of some kind of objects and properties $P$, $Q$ and $R$ which objects in $A$ may, or may not, have. We can view $A$ as a “training set”. The training set can be formalized as a $\sigma$-structure with domain $A$ where $\sigma = \{P, Q, R\}$ and $P$, $Q$ and $R$ are also viewed as unary relation symbols.
Let $\mu$ be a probability distribution on $A$ and let binary random variables $X, Y, Z : A \to \{0, 1\}$ be defined by $X(a) = 1$ if $a$ has the property $P$ and $X(a) = 0$ otherwise (for every $a \in A$); $Y(a) = 1$ if $a$ has the property $Q$ and $Y(a) = 0$ otherwise; and analogously for $Z$ and $R$. Suppose that, after some “learning”, we have found a Bayesian network $G$ for $(A, \mu)$ and $X, Y, Z$ such that its DAG is as illustrated and the (conditional) probabilities $\mu(X = 1)$, $\mu(Y = 1 \mid X = 1)$, $\mu(Y = 1 \mid X = 0)$, $\mu(Z = 1 \mid X = 1)$ and $\mu(Z = 1 \mid X = 0)$ are specified.

[Figure: the DAG of $G$, with vertices $X$, $Y$ and $Z$.]

(In real applications, it is unlikely that a relatively simple probabilistic model, which is desirable for computational efficiency, fits the training data completely and usually this is not even the goal because one wants to avoid so-called “overfitting”; so one can view the Bayesian network as a reasonable approximation of the training data.) An application of the Bayesian network $G$ is to make predictions about probabilities on some other finite domain $B$. Let us now make the following assumptions, partly based on $G$ but where the independency assumptions between different objects are imposed. Every $b \in B$ has the probability $\mu(X = 1)$ of having property $P$, independently of what the case is for other $b' \in B$. For every $b \in B$, if $b$ has the property $P$ then the probability that $b$ also has the property $Q$ ($R$) is $\mu(Y = 1 \mid X = 1)$ ($\mu(Z = 1 \mid X = 1)$), independently of what the case is for other elements in $B$, and if $b$ does not have the property $P$ then the probability that $b$ has $Q$ ($R$) is $\mu(Y = 1 \mid X = 0)$ ($\mu(Z = 1 \mid X = 0)$), independently of what the case is for other elements. Based on this we can define a probability distribution on the set $W_B$ of all $\sigma$-structures with domain $B$, where each member of $W_B$ represents a “possible scenario” or “possible world”. For every formula $\varphi(x_1, \ldots, x_k)$ of conditional probability logic and any choice of $b_1, \ldots, b_k \in B$ we can now ask what the probability is that $\varphi(x_1, \ldots, x_k)$ is satisfied by $b_1, \ldots, b_k$.

When using a Bayesian network $G$ for prediction as in the example we have “lifted” it from its original context (the set $A$) and used it on a new domain of objects. Also when moving from the fixed domain $A$ to an arbitrary domain $B$ we have, in a sense, “lifted” our reasoning from propositional logic to first-order logic, or some extension of it. Perhaps this is the reason why the term “lifted graphical model” is used by some authors when a graphical model is used to describe or predict (conditional) probabilities of events on an arbitrary or unknown domain; see [18] for a survey of lifted graphical models. In the subfield of machine learning, data mining and artificial intelligence called statistical relational learning (or sometimes probabilistic logic learning) the “lifted” perspective is central as one here considers general domains of objects and properties and relations that may, or may not, hold for, or between, the objects. (See for example [6, 9].) There is no consensus regarding what, exactly, a lifted Bayesian network (let alone lifted graphical model) is or how it determines a probability distribution on a set of “possible worlds”. Different approaches have been considered. A key question is how the probability that a random variable takes a particular value is influenced by its parents in the DAG of the Bayesian network. The above example uses the simplest form of aggregation/combination rules. Another approach is to use aggregation/combination functions. (Some explanation of these notions is found in e.g. [6, p. 31, 54], [18, p. 18], [13].) From a practical point of view it probably makes sense to have the freedom to adapt one’s lifted graphical model to the application at hand, so uniformity may not be a primary concern for practitioners.
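As a rough illustration of the prediction example earlier in this section (properties $P$, $Q$, $R$ on a new domain $B$), the following Python sketch samples one “possible world” and computes the probability of a given world under the stated independence assumptions. The function names, the dictionary encoding of a structure and the numerical probabilities in the usage note are assumptions made for illustration only; they do not come from the article.

```python
import random

def sample_world(domain, p, q_given_p, r_given_p, rng):
    """Sample one {P, Q, R}-structure on `domain`.

    p          -- stands for mu(X = 1)
    q_given_p  -- dict: True -> mu(Y = 1 | X = 1), False -> mu(Y = 1 | X = 0)
    r_given_p  -- dict: True -> mu(Z = 1 | X = 1), False -> mu(Z = 1 | X = 0)
    rng        -- a random.Random instance
    """
    world = {"P": set(), "Q": set(), "R": set()}
    for b in domain:
        # Each element is treated independently of all other elements.
        has_p = rng.random() < p
        if has_p:
            world["P"].add(b)
        if rng.random() < q_given_p[has_p]:
            world["Q"].add(b)
        if rng.random() < r_given_p[has_p]:
            world["R"].add(b)
    return world

def world_probability(world, domain, p, q_given_p, r_given_p):
    """Probability of `world` under the same independence assumptions."""
    prob = 1.0
    for b in domain:
        has_p = b in world["P"]
        prob *= p if has_p else 1 - p
        q = q_given_p[has_p]
        prob *= q if b in world["Q"] else 1 - q
        r = r_given_p[has_p]
        prob *= r if b in world["R"] else 1 - r
    return prob
```

For instance, with the hypothetical values `p = 0.3`, `q_given_p = {True: 0.9, False: 0.2}` and `r_given_p = {True: 0.5, False: 0.1}`, calling `sample_world(range(5), p, q_given_p, r_given_p, random.Random(0))` draws one structure with domain $\{0, \ldots, 4\}$. By construction, the values of `world_probability` over all structures on a fixed domain sum to 1.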
But to prove mathematical theorems about lifted graphical models, and the probability distributions that they induce, we need (of course) to make precise what we mean, which is done in Section 3.

In this article we use aggregation rules expressed by formulas of conditional probability logic (CPL). The idea is that for any relation symbol $R$, of arity $k$ say, there are an integer $\nu_R$, numbers $\alpha_{R,i} \in [0, 1]$, and CPL-formulas $\chi_{R,i}(x_1, \ldots, x_k)$ for $i = 1, \ldots, \nu_R$ such that if $\chi_{R,i}(x_1, \ldots, x_k)$ holds, then the probability that $R(x_1, \ldots, x_k)$ holds is $\alpha_{R,i}$. This formalism is strong enough to express, for example, aggregation rules of the following kind for arbitrary $m$, any CPL-formula $\psi(x_1, \ldots, x_k)$ and any $\alpha_i \in [0, 1]$, $i = 0, \ldots, m$: For all $i = 0, \ldots, m$, if the proportion of $k$-tuples that satisfy $\psi(x_1, \ldots, x_k)$ is in the interval $[i/m, (i + 1)/m]$, then the probability that $R(x_1, \ldots, x_k)$ holds is $\alpha_i$.

Once we have made precise (as in Definition 3.8) what we mean by a lifted Bayesian network $G$ for a finite relational signature $\sigma$ (i.e. a finite set of relation symbols, possibly of different arities) and also made precise (as in Definition 3.11) how $G$ determines a probability distribution $P_D$ on the set of all $\sigma$-structures with domain $D$ (for some finite set $D$), then we can ask questions like this: Given a CPL-formula $\varphi(x_1, \ldots, x_k)$ and $d_1, \ldots, d_k \in D$, what is the probability that $\varphi(x_1, \ldots, x_k)$ is satisfied by the sequence $d_1, \ldots, d_k$? Or more formally, what is $P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi(d_1, \ldots, d_k)\}\big)$? It is computationally very expensive to answer the question by analyzing all members of $W_D$, since, in general, the cardinality of $W_D$ is in the order of $2^{|D|^r}$ where $r$ is the maximal arity of relation symbols in $\sigma$ and $|D|$ is the cardinality of $D$.
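The interval-based aggregation rule and the size of the set of possible worlds described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, predicates on tuples stand in for CPL-formulas, and the intervals are read as half-open, $[i/m, (i+1)/m)$, so that each proportion falls in exactly one interval (the text writes closed intervals, which overlap at their endpoints).

```python
from fractions import Fraction
from itertools import product
from math import floor

def proportion(domain, k, psi):
    """Proportion of k-tuples over `domain` that satisfy the predicate `psi`."""
    tuples = list(product(domain, repeat=k))
    return Fraction(sum(1 for t in tuples if psi(t)), len(tuples))

def aggregated_probability(prop, alphas):
    """Return alpha_i for the interval that contains `prop`.

    `alphas` is the list [alpha_0, ..., alpha_m]. Assumption: the intervals
    are half-open, [i/m, (i+1)/m), so alpha_m is used only when prop == 1.
    """
    m = len(alphas) - 1
    return alphas[floor(prop * m)]

def number_of_structures(n, arities):
    """Number of structures on an n-element domain for relation symbols
    with the given arities: each tuple is independently in or out."""
    return 2 ** sum(n ** a for a in arities)
```

With one unary and one binary relation symbol, `number_of_structures(n, [1, 2])` equals $2^{n + n^2}$, which shows why enumerating all members of $W_D$ quickly becomes infeasible as the domain grows.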
However, our first theorem (Theorem 3.14) says that if $\varphi$ is “noncritical” in the sense that its conditional probability quantifiers (if any) avoid “talking about” certain finitely many critical numbers, then there is a quantifier-free formula $\varphi^*(x_1, \ldots, x_k)$ such that, with probability approaching 1 as $|D| \to \infty$, $\varphi$ and $\varphi^*$ are equivalent. If we are given such $\varphi^*$ then we can easily compute the probability $\alpha^* = P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi^*(d_1, \ldots, d_k)\}\big)$ by using only the lifted Bayesian network $G$, so in particular this computation is independent of the cardinality of $D$. Moreover, $\alpha^*$ only depends on the quantifier-free formula $\varphi^*$ and not on the choice of elements $d_1, \ldots, d_k$. We also get that, as $|D| \to \infty$, $P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi(d_1, \ldots, d_k)\}\big) \to \alpha^*$.

But of course, given a noncritical $\varphi$, we have to first find a quantifier-free $\varphi^*$ which is “almost surely” equivalent to $\varphi$. The proof of Theorem 3.14 produces an algorithm for doing this. At one step in the algorithm one may need to transform a quantifier-free formula into an equivalent disjunctive normal form and this computational task is, in general, NP-hard. But if one assumes that all quantifier-free subformulas of $\varphi$ are disjunctive normal forms, then the algorithm that produces $\varphi^*$ works in quadratic time in the length of $\varphi$ if we assume that an arithmetic operation, a comparison of two numbers and a comparison of two literals is completed in one time step (more details in Remark 3.17).

The proof of Theorem 3.14 gives some by-products such as a “logical limit/convergence law” (Theorem 3.15) and a result (Theorem 3.16) saying that for every lifted Bayesian network as in Definition 3.8 there is an “almost surely equivalent” lifted Bayesian network in which all aggregation formulas (as in Definition 3.8) are quantifier-free. The original zero-one law for first-order logic, independently of Glebskii et al.
[10] and Fagin [8], becomes a special case of Theorem 3.15 when we restrict attention to first-order sentences and the DAG of the lifted Bayesian network has no edges and all the probabilities associated to the vertices are $1/2$.

A couple of earlier results exist which have similarity to the results of this article. Jaeger [13] has considered another sort of lifted Bayesian network which he calls relational Bayesian network. Instead of using aggregation/combination rules (as we do in this article) relational Bayesian networks use aggregation/combination functions. Theorem 3.9 in [13] is an analogue of Theorem 3.15 below for first-order formulas in the setting of relational Bayesian networks which use only “exponentially convergent” combination functions. Theorem 4.7 in [14] has a similar flavour as Theorem 3.16 below, but [14] considers “admissible” relational Bayesian networks and a probability measure defined by such on the set of structures with a common infinite countable domain.

The results of this article are mainly motivated by concepts and methods in machine learning, data mining and artificial intelligence, but if the results are seen from the perspective of finite model theory and random discrete structures, then they join a long tradition of results concerning logical limit laws and almost sure elimination of quantifiers. For a very small and eclectic selection of work in this field, ranging from the first to some of the last, see for example [8, 10, 12, 15, 19, 21, 22, 24, 25].

The organization of this article is as follows. Section 2 introduces the basic conventions used in this article as well as some basic definitions. Section 3 defines the main notions of the article and states the main results. Section 4 gives the proofs of these results. The last section is a brief discussion about further research in the topics of formal logic, probabilistic graphical models, almost sure elimination of quantifiers and convergence laws.

2. Preliminaries

Basic knowledge of first-order logic and first-order structures is expected and there are many sources in which the reader can find this background, for example [20]. In this section we clarify and define some basic notation and terminology concerning logic and graph theory. Formulas of a formal logic will usually be denoted by $\varphi$, $\psi$, $\theta$ or $\chi$, possibly with sub- or superscripts. Logical variables will be denoted $x, y, z, u, v, w$ possibly with sub- or superscripts. Finite sequences/tuples of variables are similarly denoted $\bar{x}, \bar{y}, \bar{z}$, etc. If a formula is denoted by $\varphi(\bar{x})$ then it is, as usual, assumed that all free variables of $\varphi$ occur in the sequence $\bar{x}$ (but we do not insist that every variable in $\bar{x}$ occurs in the formula denoted by $\varphi(\bar{x})$); moreover in this context we will assume that all variables in $\bar{x}$ are different although this is occasionally restated. In general, finite sequences/tuples of elements are denoted by $\bar{a}, \bar{b}, \bar{c}$, etc. For a sequence $\bar{a}$, $\mathrm{rng}(\bar{a})$ denotes the set of elements occurring in $\bar{a}$. For a sequence $\bar{a}$, $|\bar{a}|$ denotes its length. For a set $A$, $|A|$ denotes its cardinality. In particular, if $\varphi$ is a formula of some formal logic (so $\varphi$ is a sequence of symbols), then $|\varphi|$ denotes its length. Sometimes we abuse notation by writing ‘$\bar{a} \in A$’ when we actually mean that $\mathrm{rng}(\bar{a}) \subseteq A$.

By a signature (or vocabulary) we mean a set of relation symbols, function symbols and constant symbols. A signature $\sigma$ is called finite and relational if it is finite as a set and all symbols in it are relation symbols. We use the terminology ‘$\sigma$-structure’, or just structure if we omit mentioning the signature, in the sense of first-order logic. Structures in this sense will be denoted by calligraphic letters $\mathcal{A}, \mathcal{B}, \mathcal{C}$, etc. The domain (or universe) of a structure $\mathcal{A}$ will often be denoted by the corresponding non-calligraphic letter $A$. A structure is called finite if its domain is finite.
If $\sigma' \subset \sigma$ are signatures and $\mathcal{A}$ is a $\sigma$-structure, then $\mathcal{A} \upharpoonright \sigma'$ denotes the reduct of $\mathcal{A}$ to the signature $\sigma'$. We let $[n]$ denote the set $\{1, \ldots, n\}$. We use the terminology atomic ($\sigma$-)formula in the sense of first-order logic with equality, so in particular, the expression ‘$x = y$’ is an atomic $\sigma$-formula for every signature $\sigma$, including the empty signature $\sigma = \emptyset$. It will also be convenient to have a special symbol $\top$ which is viewed as an atomic $\sigma$-formula for every signature $\sigma$; the formula $\top$ is interpreted as being true in every structure.

Definition 2.1.

Let $\sigma$ be a finite relational signature and $\bar{x}$ a sequence of different variables.
(i) If $\varphi(\bar{x})$ is an atomic $\sigma$-formula, then $\varphi(\bar{x})$ and $\neg\varphi(\bar{x})$ are called $\sigma$-literals.
(ii) A consistent set of $\sigma$-literals is called an atomic $\sigma$-type. When denoting an atomic $\sigma$-type by $p(\bar{x})$ it is assumed (as for formulas) that if a variable occurs in a formula in $p(\bar{x})$, then it belongs to the sequence $\bar{x}$.
(iii) If $p(\bar{x})$ is an atomic $\sigma$-type, then the identity fragment of $p(\bar{x})$ is the set of formulas of the form $x_i = x_j$ or $x_i \neq x_j$ that belong to $p(\bar{x})$.
(iv) If $p(\bar{x})$ denotes an atomic $\sigma$-type and for every $\sigma$-literal $\varphi(\bar{x})$, either $\varphi(\bar{x}) \in p(\bar{x})$ or $\neg\varphi(\bar{x}) \in p(\bar{x})$, then $p(\bar{x})$ is called a complete atomic $\sigma$-type (with respect to $\sigma$). An atomic $\sigma$-type which is not complete is sometimes called partial.
(v) Let $p(\bar{x}, \bar{y})$ be an atomic $\sigma$-type. The $\bar{y}$-dimension of $p(\bar{x}, \bar{y})$, denoted $\dim_{\bar{y}}(p(\bar{x}, \bar{y}))$, is the maximal $d \in \mathbb{N}$ such that there are a $\sigma$-structure $\mathcal{A}$ and $\bar{a}, \bar{b} \in A$ such that $\mathcal{A} \models p(\bar{a}, \bar{b})$ and $\big|\mathrm{rng}(\bar{b}) \setminus \mathrm{rng}(\bar{a})\big| \geq d$.
(vi) Let $\sigma' \subseteq \sigma$ and let $p$ be an atomic $\sigma$-type. Then $p \upharpoonright \sigma' = \{\varphi \in p : \varphi \text{ is a } \sigma'\text{-formula}\}$ and $p \upharpoonright \bar{x} = \{\varphi \in p : \text{all free variables of } \varphi \text{ occur in } \bar{x}\}$.

Remark 2.2.

Note that if $p(\bar{x})$ is a complete atomic $\sigma$-type where $\bar{x} = (x_1, \ldots, x_m)$, then this implies that for all $1 \leq i, j \leq m$, either $x_i = x_j$ or $x_i \neq x_j$ belongs to $p(\bar{x})$. Also observe that if $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and $\dim_{\bar{y}}(p(\bar{x}, \bar{y})) = d$, then for every $\sigma$-structure $\mathcal{A}$ and for all $\bar{a}, \bar{b}$ such that $\mathcal{A} \models p(\bar{a}, \bar{b})$, we have $\big|\mathrm{rng}(\bar{b}) \setminus \mathrm{rng}(\bar{a})\big| = d$.

Notation 2.3.

Let $\sigma$ be a signature, $\bar{x}$ a sequence of different variables, $\mathcal{A}$ a $\sigma$-structure with domain $A$ and $\bar{a} \in A^{|\bar{x}|}$.
(i) If $p(\bar{x})$ is an atomic $\sigma$-type, then the notation ‘$\mathcal{A} \models p(\bar{a})$’ means that $\mathcal{A} \models \varphi(\bar{a})$ for every formula $\varphi(\bar{x}) \in p(\bar{x})$, or in other words that $\bar{a}$ satisfies every formula in $p(\bar{x})$ with respect to the structure $\mathcal{A}$, or (to use model theoretic language) that $\bar{a}$ realizes $p(\bar{x})$ with respect to the structure $\mathcal{A}$.
(ii) If $\bar{y}$ is a sequence of different variables (such that no variable occurs in both $\bar{x}$ and $\bar{y}$) and $q(\bar{x}, \bar{y})$ is an atomic $\sigma$-type, then $q(\bar{a}, \mathcal{A}) = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models q(\bar{a}, \bar{b})\}$.

By a directed graph we mean a pair $(V, E)$ where $V$ is a (vertex) set and $E \subseteq V \times V$. A directed acyclic graph, abbreviated DAG, is a directed graph $(V, E)$ such that $(v, v) \notin E$ for all $v \in V$ and such that there do not exist distinct $v_0, \ldots, v_k \in V$ for any $k \geq 1$ such that $(v_i, v_{i+1}) \in E$ for all $i = 0, \ldots, k - 1$ and $(v_k, v_0) \in E$. A directed path in a directed graph $(V, E)$ is a sequence of distinct vertices $v_0, \ldots, v_k \in V$ such that $(v_i, v_{i+1}) \in E$ for all $i = 0, \ldots, k - 1$; the length of this path is the number of edges in it, in other words, the length is $k$.

Definition 2.4. (About directed acyclic graphs)

Suppose that $G$ is a DAG with nonempty and finite vertex set $V$. Let $a \in V$.
(i) A vertex $b \in V$ is a parent of $a$ if $(b, a)$ is a directed edge of $G$. We let $\mathrm{par}(a)$ denote the set of parents of $a$.
(ii) We define the maximal path rank of $a$, or just mp-rank of $a$, denoted $\mathrm{mpr}(a)$, to be the length of the longest directed path having $a$ as its first vertex (i.e. the length of the longest path $a_0, a_1, \ldots, a_k$ where $a = a_0$ and $(a_i, a_{i+1})$ is a directed edge for each $i = 0, \ldots, k - 1$).
(iii) The maximal path rank of $G$, or just mp-rank of $G$, denoted $\mathrm{mpr}(G)$, is defined as $\mathrm{mpr}(G) = \max\{\mathrm{mpr}(a) : a \in V\}$.
Observe that if $G$ is a DAG with vertex set $V$ and $\mathrm{mpr}(G) = r$ and $G'$ is the induced subgraph of $G$ with vertex set $V' = \{a \in V : \mathrm{mpr}(a) < r\}$, then, for every $a \in V'$, the mp-rank of $a$ is the same no matter if we compute it with respect to $G'$ or with respect to $G$; it follows that $\mathrm{mpr}(G') = r - 1$.

We call a random variable binary if it can only take the value 0 or 1. The following is a direct consequence of [1, Corollary A.1.14] which in turn follows from the Chernoff bound [4]:

Lemma 2.5.

Let $Z$ be the sum of $n$ independent binary random variables, each one with probability $p$ of having the value 1. For every $\varepsilon > 0$ there is $c_\varepsilon > 0$, depending only on $\varepsilon$, such that the probability that $|Z - pn| > \varepsilon pn$ is less than $2e^{-c_\varepsilon pn}$.

3. Conditional probability logic and lifted Bayesian networks

In this section we define the main concepts of this article and state the main results.

Definition 3.1. (Conditional probability logic)

Suppose that $\sigma$ is a signature. Then the set of conditional probability formulas over $\sigma$, denoted $CPL(\sigma)$, is defined inductively as follows:
(1) Every atomic $\sigma$-formula belongs to $CPL(\sigma)$ (where ‘atomic’ has the same meaning as in first-order logic with equality).
(2) If $\varphi, \psi \in CPL(\sigma)$ then $(\neg\varphi), (\varphi \wedge \psi), (\varphi \vee \psi), (\varphi \to \psi), (\varphi \leftrightarrow \psi), (\exists x \varphi) \in CPL(\sigma)$ where $x$ is a variable. (As usual, in practice we do not necessarily write out all parentheses.) We consider $\forall x \varphi$ to be an abbreviation of $\neg\exists x \neg\varphi$.
(3) If $r \geq 0$ is a real number, $\varphi, \psi, \theta, \tau \in CPL(\sigma)$ and $\bar{y}$ is a sequence of distinct variables, then
$$\big(r + \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}}\big) \in CPL(\sigma) \quad \text{and} \quad \big(\|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} + r\big) \in CPL(\sigma).$$
In both these new formulas all variables of $\varphi, \psi, \theta$ and $\tau$ that appear in the sequence $\bar{y}$ become bound. So this construction can be seen as a sort of quantification, which may become more clear by the provided semantics below.
A formula $\varphi \in CPL(\sigma)$ is called quantifier-free if it contains no quantifier, that is, if it is constructed from atomic formulas by using only connectives $\neg, \wedge, \vee, \to, \leftrightarrow$.

Definition 3.2. (Semantics)

(1) The interpretations of $\neg, \wedge, \vee, \to, \leftrightarrow$ and $\exists$ are as in first-order logic.
(2) Suppose that $\mathcal{A}$ is a finite $\sigma$-structure and let $\varphi(\bar{x}, \bar{y}), \psi(\bar{x}, \bar{y}), \theta(\bar{x}, \bar{y}), \tau(\bar{x}, \bar{y}) \in CPL(\sigma)$. Let $\bar{a} \in A^{|\bar{x}|}$.
(a) We define $\varphi(\bar{a}, \mathcal{A}) = \big\{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \varphi(\bar{a}, \bar{b})\big\}$.
(b) The expression
$$\mathcal{A} \models \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$$
means that $\psi(\bar{a}, \mathcal{A}) \neq \emptyset$, $\tau(\bar{a}, \mathcal{A}) \neq \emptyset$ and
$$r + \frac{\big|\varphi(\bar{a}, \mathcal{A}) \cap \psi(\bar{a}, \mathcal{A})\big|}{\big|\psi(\bar{a}, \mathcal{A})\big|} \geq \frac{\big|\theta(\bar{a}, \mathcal{A}) \cap \tau(\bar{a}, \mathcal{A})\big|}{\big|\tau(\bar{a}, \mathcal{A})\big|}$$
and in this case we say that $\big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$ is true (or holds) in $\mathcal{A}$. If $\psi(\bar{a}, \mathcal{A}) = \emptyset$ or $\tau(\bar{a}, \mathcal{A}) = \emptyset$ or
$$r + \frac{\big|\varphi(\bar{a}, \mathcal{A}) \cap \psi(\bar{a}, \mathcal{A})\big|}{\big|\psi(\bar{a}, \mathcal{A})\big|} < \frac{\big|\theta(\bar{a}, \mathcal{A}) \cap \tau(\bar{a}, \mathcal{A})\big|}{\big|\tau(\bar{a}, \mathcal{A})\big|}$$
then we write
$$\mathcal{A} \not\models \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$$
and say that $\big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$ is false in $\mathcal{A}$.
(c) The meaning of $\mathcal{A} \models \big(\|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} + r\big)$ is defined similarly.

Remark 3.3. (A warning)

Observe that with the given semantics,
$$\mathcal{A} \not\models \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$$
does not necessarily imply
$$\mathcal{A} \models \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \leq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}}\big)$$
because the first formula may fail to be true for $\bar{a}$ because $\psi(\bar{a}, \mathcal{A}) = \emptyset$ or $\tau(\bar{a}, \mathcal{A}) = \emptyset$, in which case the corresponding fraction is undefined and then also the other formula is false for $\bar{a}$.

Remark 3.4. (Expressing conditional probabilities, or just probabilities)

Let $\bar{x} = (x_1, \ldots, x_k)$ and $\bar{y} = (y_1, \ldots, y_l)$. If $\tau(\bar{x}, \bar{y})$ denotes the formula $y_1 = y_1$ and $\theta(\bar{x}, \bar{y})$ denotes the formula $y_1 \neq y_1$, then
$$\big(\|\varphi(\bar{x}, \bar{y}) \mid \psi(\bar{x}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{x}, \bar{y}) \mid \tau(\bar{x}, \bar{y})\|_{\bar{y}} + r\big) \tag{3.1}$$
expresses that the proportion of tuples $\bar{y}$ that satisfy $\varphi(\bar{x}, \bar{y})$ among those $\bar{y}$ that satisfy $\psi(\bar{x}, \bar{y})$ is at least $r$. Thus the formula expresses a conditional probability if we assume that all $l$-tuples have the same probability. Under the stated assumptions, let us abbreviate (3.1) by
$$\big(\|\varphi(\bar{x}, \bar{y}) \mid \psi(\bar{x}, \bar{y})\|_{\bar{y}} \geq r\big). \tag{3.2}$$
If we assume, in addition, that $\psi(\bar{x}, \bar{y})$ is the formula $y_1 = y_1$, then each of (3.1) and (3.2) expresses that the proportion of $l$-tuples $\bar{y}$ that satisfy $\varphi(\bar{x}, \bar{y})$ is at least $r$.
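The semantics of Definition 3.2 and the abbreviation (3.2) can be checked on small finite domains with a direct enumeration. The sketch below is only illustrative: the function names are hypothetical, formulas are represented by Python predicates on tuples $\bar{b}$ (with the parameters $\bar{a}$ already fixed inside the predicate), and exact rational arithmetic is used for the proportions.

```python
from fractions import Fraction
from itertools import product

def cond_proportion(domain, length, phi, psi):
    """|phi ∩ psi| / |psi| over all tuples of the given length.

    Returns None when no tuple satisfies psi (the fraction is undefined).
    """
    psi_set = [b for b in product(domain, repeat=length) if psi(b)]
    if not psi_set:
        return None
    return Fraction(sum(1 for b in psi_set if phi(b)), len(psi_set))

def holds(domain, length, phi, psi, theta, tau, r, r_on_left=True):
    """Truth value of (r + ||phi|psi|| >= ||theta|tau||) if r_on_left,
    otherwise of (||phi|psi|| >= ||theta|tau|| + r)."""
    left = cond_proportion(domain, length, phi, psi)
    right = cond_proportion(domain, length, theta, tau)
    if left is None or right is None:
        # An empty conditioning set makes the formula false.
        return False
    return (r + left >= right) if r_on_left else (left >= right + r)

def at_least(domain, length, phi, psi, r):
    """Abbreviation (3.2): tau is 'y1 = y1' and theta is 'y1 != y1'."""
    tau = lambda b: b[0] == b[0]
    theta = lambda b: b[0] != b[0]
    return holds(domain, length, phi, psi, theta, tau, r, r_on_left=False)
```

For example, on the domain `range(4)` with `phi = lambda b: b[0] % 2 == 0` and `psi = lambda b: b[0] > 0`, the conditional proportion is $1/3$, so `at_least` is true for $r = 1/3$ and false for $r = 1/2$. If `psi` is never satisfied, `at_least` is false even for $r = 0$, which matches the warning in Remark 3.3.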

Example 3.5.

Suppose that $M$ is a unary relation symbol and $F$ a binary relation symbol. Consider the statement “For at least half of all persons $x$, if at least one third of the friends of $x$ are mathematicians, then $x$ is a mathematician”. If $M(x)$ expresses that “$x$ is a mathematician” and $F(x, y)$ expresses that “$x$ and $y$ are friends”, then this statement can be formulated in CPL, using the abbreviation (3.2), as
$$\Big(\big\|\big(\|M(y) \mid F(x, y)\|_{y} \geq 1/3\big) \to M(x) \;\big|\; x = x\big\|_{x} \geq 1/2\Big).$$

Remark 3.6. (Expressing independence)

Suppose that $\mathcal{A}$ is a finite $\sigma$-structure, $\theta(\bar{x}, \bar{y})$ is the formula $y_1 = y_1$ and $\bar{a} \in A^{|\bar{x}|}$. If $r = 0$ and
$$\mathcal{A} \models \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}}\big) \wedge \big(\|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} + r\big),$$
then the event $X = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \varphi(\bar{a}, \bar{b})\}$ is independent from the event $Y = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \psi(\bar{a}, \bar{b})\}$ if all $|\bar{y}|$-tuples have the same probability.

If $\mathcal{A}$ represents a database from the real world, then it is unlikely that events of interest are (conditionally) independent according to the precise mathematical definition. Instead one may look for “approximate (conditional) independencies”. If $r$ is changed to be a small positive number and if
$$\begin{aligned}
\mathcal{A} \models {} & \big(r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}}\big) \\
{} \wedge {} & \big(\|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} + r\big) \\
{} \wedge {} & \big(r + \|\psi(\bar{a}, \bar{y}) \mid \varphi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}}\big) \\
{} \wedge {} & \big(\|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi(\bar{a}, \bar{y}) \mid \varphi(\bar{a}, \bar{y})\|_{\bar{y}} + r\big)
\end{aligned}$$
then the dependency between $X$ and $Y$ is weak, or one could say that they are “approximately independent up to an error of $r$”. The reason for the more complicated formula is to make “$r$-approximate independence” symmetric.

Definition 3.7.

The quantifier rank, $\mathrm{qr}(\varphi)$, of formulas $\varphi \in CPL(\sigma)$ is defined inductively as follows:
(1) For atomic $\varphi$, $\mathrm{qr}(\varphi) = 0$.
(2) $\mathrm{qr}(\neg\varphi) = \mathrm{qr}(\varphi)$, $\mathrm{qr}(\varphi \star \psi) = \max\{\mathrm{qr}(\varphi), \mathrm{qr}(\psi)\}$ if $\star$ is one of $\wedge$, $\vee$, $\to$ or $\leftrightarrow$.
(3) $\mathrm{qr}(\exists x \varphi) = \mathrm{qr}(\varphi) + 1$.
(4) $\mathrm{qr}\big(\big(r + \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}}\big)\big) = \mathrm{qr}\big(\big(\|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} + r\big)\big) = \max\{\mathrm{qr}(\varphi), \mathrm{qr}(\psi), \mathrm{qr}(\theta), \mathrm{qr}(\tau)\} + |\bar{y}|$.

Definition 3.8. (Lifted Bayesian network)

Let $\sigma$ be a finite relational signature. In this article we define a lifted Bayesian network for $\sigma$ to consist of the following components:
(a) An acyclic directed graph (DAG) $G$ with vertex set $\sigma$.
(b) For each $R \in \sigma$, a number $\nu_R \in \mathbb{N}^+$, formulas $\chi_{R,i}(\bar{x}) \in CPL(\mathrm{par}(R))$, for $i = 1, \ldots, \nu_R$, where $|\bar{x}|$ equals the arity of $R$, such that $\forall \bar{x} \big(\bigvee_{i=1}^{\nu_R} \chi_{R,i}(\bar{x})\big)$ is valid (i.e. true in all $\mathrm{par}(R)$-structures) and if $i \neq j$ then $\exists \bar{x} \big(\chi_{R,i}(\bar{x}) \wedge \chi_{R,j}(\bar{x})\big)$ is unsatisfiable. Each $\chi_{R,i}$ will be called an aggregation formula (of $G$).
(c) For each $R \in \sigma$ and each $1 \leq i \leq \nu_R$, a number denoted $\mu(R \mid \chi_{R,i})$ (or $\mu(R(\bar{x}) \mid \chi_{R,i}(\bar{x}))$) in the interval $[0, 1]$.
We will use the same symbol (for example $G$) to denote a lifted Bayesian network and its underlying DAG. The intuitive meaning of $\mu(R \mid \chi_{R,i})$ in part (c) is that if $\bar{a}$ is a sequence of elements from a structure and $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$, then the probability that $\bar{a}$ satisfies $R(\bar{x})$ is $\mu(R \mid \chi_{R,i})$.

Remark 3.9. (Subnetworks)

Let $\mathfrak{G}$ denote a lifted Bayesian network for $\sigma$. Suppose that $\sigma' \subset \sigma$ is such that if $R \in \sigma'$ then $\mathrm{par}(R) \subseteq \sigma'$. Then it is easy to see that $\sigma'$ determines a lifted Bayesian network $\mathfrak{G}'$ for $\sigma'$ such that
• the vertex set of the underlying DAG of $\mathfrak{G}'$ is $\sigma'$,
• for every $R \in \sigma'$, the number $\nu_R$ and the formulas $\chi_{R,i}$, $i = 1, \ldots, \nu_R$, are the same as those for $\mathfrak{G}$,
• for every $R \in \sigma'$ and every $1 \leq i \leq \nu_R$, the numbers $\mu(R \mid \chi_{R,i})$ are the same as those for $\mathfrak{G}$.
We call the so defined lifted Bayesian network $\mathfrak{G}'$ for $\sigma'$ the subnetwork (of $\mathfrak{G}$) induced by $\sigma'$.

Definition 3.10. (The case of an empty signature)
(i) As a technical convenience we will also consider a lifted Bayesian network, denoted $\mathfrak{G}^{\emptyset}$, for the empty signature $\emptyset$. According to Definition 3.8 the vertex set of the underlying DAG of $\mathfrak{G}^{\emptyset}$ is $\emptyset$, the empty set. It follows that no formulas or numbers as in parts (b) and (c) of Definition 3.8 need to be specified for $\mathfrak{G}^{\emptyset}$.
(ii) For every $n \in \mathbb{N}^+$, let $\mathbf{W}^{\emptyset}_n$ denote the set of all $\emptyset$-structures with domain $[n]$ and note that every $\mathbf{W}^{\emptyset}_n$ has only one member, which is just the set $[n]$.
(iii) For every $n \in \mathbb{N}^+$, let $\mathbb{P}^{\emptyset}_n$ be the unique probability distribution on $\mathbf{W}^{\emptyset}_n$.

Definition 3.11. (The probability distribution in the general case)

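Let $\sigma$ be a finite nonempty relational signature and let $\mathfrak{G}$ denote a lifted Bayesian network for $\sigma$. Suppose that the underlying DAG of $\mathfrak{G}$ has mp-rank $\rho$. For each $0 \leq r \leq \rho$ let $\mathfrak{G}_r$ be the subnetwork (in the sense of Remark 3.9) induced by $\sigma_r = \{R \in \sigma : \mathrm{mp}(R) \leq r\}$ and note that $\mathfrak{G}_\rho = \mathfrak{G}$. Also let $\mathfrak{G}_{-1} = \mathfrak{G}^{\emptyset}$ and let $\mathbb{P}^{-1}_n$ be the unique probability distribution on $\mathbf{W}^{-1}_n = \mathbf{W}^{\emptyset}_n$. By induction on $r$ we define, for every $r = 0, 1, \ldots, \rho$, a probability distribution $\mathbb{P}^r_n$ on the set $\mathbf{W}^r_n$ of all $\sigma_r$-structures with domain $[n]$ as follows: For every $\mathcal{A} \in \mathbf{W}^r_n$,

$$\mathbb{P}^r_n(\mathcal{A}) = \mathbb{P}^{r-1}_n(\mathcal{A} \upharpoonright \sigma_{r-1}) \prod_{R \in \sigma_r \setminus \sigma_{r-1}} \; \prod_{i=1}^{\nu_R} \; \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A} \upharpoonright \sigma_{r-1})} \lambda(\mathcal{A}, R, i, \bar{a})$$

where

$$\lambda(\mathcal{A}, R, i, \bar{a}) = \begin{cases} \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge R(\bar{a}), \\ 1 - \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge \neg R(\bar{a}). \end{cases}$$

Finally, let $\mathbf{W}_n = \mathbf{W}^{\rho}_n$ and $\mathbb{P}_n = \mathbb{P}^{\rho}_n$, so $\mathbb{P}_n$ is a probability distribution on the set of all $\sigma$-structures with domain $[n]$.

Remark 3.12. ((Ir)reflexive and/or symmetric relations)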

Let $A$ be a set and let $R \subseteq A^k$ be a $k$-ary relation on $A$. We call $R$ reflexive if for all $a \in A$ the $k$-tuple containing $a$ in each coordinate belongs to $R$. We call $R$ irreflexive if for every $(a_1, \ldots, a_k) \in R$ we have $a_i \neq a_j$ if $i \neq j$. We call $R$ symmetric if for every $(a_1, \ldots, a_k) \in R$, every permutation of $(a_1, \ldots, a_k)$ also belongs to $R$.

Consider Definition 3.11 and let $R \in \sigma$. We can make sure that $\mathbb{P}_n(\mathcal{A}) > 0$ only if the interpretation of $R$ in $\mathcal{A}$ is reflexive (respectively irreflexive) by choosing the formulas $\chi_{R,i}$ and associated (conditional) probabilities in an appropriate way. To achieve that $\mathbb{P}_n(\mathcal{A}) > 0$ only if the interpretation of $R$ in $\mathcal{A}$ is symmetric we can proceed as follows: In the definition of $\lambda(\mathcal{A}, R, i, \bar{a})$ (in Definition 3.11) we interpret $R(\bar{a})$ as meaning that $R$ is satisfied by every permutation of $\bar{a}$ and we interpret $\neg R(\bar{a})$ as meaning that $R$ is not satisfied by any permutation of $\bar{a}$. We also need to assume that for every $k$-tuple $\bar{a}$, either every permutation of $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$ or no permutation of $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$. Then the proofs of Theorems 3.14–3.16 still work with very small modifications.

Definition 3.13.

Let $\sigma$, $\mathbf{W}_n$ and $\mathbb{P}_n$ be as in Definition 3.11.
(i) If $\varphi(\bar{x}) \in CPL(\sigma)$ and $\bar{a} \in [n]^{|\bar{x}|}$, then we define $\mathbb{P}_n(\varphi(\bar{a})) = \mathbb{P}_n\big( \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models \varphi(\bar{a})\} \big)$.
(ii) If $\varphi \in CPL(\sigma)$ has no free variables (i.e. is a sentence), then we define $\mathbb{P}_n(\varphi) = \mathbb{P}_n\big( \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models \varphi\} \big)$.

Now we can state the main results. They use the notion of noncritical formula, which depends on the lifted Bayesian network under consideration. Since this notion is quite technical and relies on some technical results (concerning the convergence of the probability that an atomic type is realized) which will be proved later, we give the precise definition later in Definition 4.30; in that context it will be more evident why the definition of noncritical formula looks the way it does. For now I only say this: For every $m \in \mathbb{N}^+$ there are finitely many numbers (depending only on $\mathfrak{G}$) which are called $m$-critical (according to Definition 4.29). Roughly speaking, a formula $\varphi(\bar{x}) \in CPL(\sigma)$ is noncritical (details in Definition 4.30) if for every subformula (of $\varphi(\bar{x})$) of the form

$$\big( r + \|\chi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} \big) \quad \text{or} \quad \big( \|\chi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} + r \big)$$

the number $r$ is not the difference of two $m$-critical numbers where $m = |\bar{x}| + \mathrm{qr}(\varphi)$. It follows that every first-order formula is noncritical.

Theorem 3.14. (Almost sure elimination of quantifiers for noncritical formulas)

Let $\sigma$ be a finite relational signature, let $\mathfrak{G}$ be a lifted Bayesian network and, for each $n \in \mathbb{N}^+$, let $\mathbb{P}_n$ be the probability distribution induced by $\mathfrak{G}$ (according to Definition 3.11) on the set $\mathbf{W}_n$ of all $\sigma$-structures with domain $[n]$. Suppose that every aggregation formula $\chi_{R,i}$ of $\mathfrak{G}$ is noncritical. If $\varphi(\bar{x}) \in CPL(\sigma)$ is noncritical, then there are a quantifier-free formula $\varphi^*(\bar{x}) \in CPL(\sigma)$ and $c > 0$, which depend only on $\varphi(\bar{x})$ and $\mathfrak{G}$, such that for all sufficiently large $n$

$$\mathbb{P}_n\big( \forall \bar{x} \, ( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) ) \big) \geq 1 - e^{-cn}.$$

Theorem 3.15. (Convergence for noncritical formulas)

Let σ , G , W n and P n beas in Theorem 3.14. For every noncritical ϕ (¯ x ) ∈ CP L ( σ ) there are c > and ≤ d ≤ ,depending only on ϕ (¯ x ) and G , such that for every m ∈ N + and every ¯ a ∈ [ m ] | ¯ x | , (cid:12)(cid:12) P n ( ϕ (¯ a )) − d (cid:12)(cid:12) ≤ − e − cn for all suﬃciently large n ≥ m .Moreover, if ϕ has no free variable (i.e. is a sentence), then P n ( ϕ ) converges to either 0or 1. Theorem 3.16. (An asymptotically equivalent “quantiﬁer-free” network)

Let $\sigma$, $\mathfrak{G}$, $\mathbf{W}_n$ and $\mathbb{P}_n$ be as in Theorem 3.14. Then for every aggregation formula $\chi_{R,i}(\bar{x})$ of $\mathfrak{G}$ there is a quantifier-free formula $\chi^*_{R,i}(\bar{x})$ containing only relation symbols that occur in $\chi_{R,i}$ such that if $\mathfrak{G}^*$ is the lifted Bayesian network
• with the same underlying DAG as $\mathfrak{G}$,
• where, for every $R \in \sigma$ and every $1 \leq i \leq \nu_R$, the aggregation formula $\chi_{R,i}$ is replaced by $\chi^*_{R,i}$, and
• where $\mu^*(R \mid \chi^*_{R,i}) = \mu(R \mid \chi_{R,i})$ for every $R \in \sigma$ and every $1 \leq i \leq \nu_R$,
then for every noncritical $\varphi(\bar{y}) \in CPL(\sigma)$ there is $d > 0$, depending only on $\varphi(\bar{y})$ and $\mathfrak{G}$, such that for every $m \in \mathbb{N}^+$ and every $\bar{a} \in [m]^{|\bar{y}|}$,

$$\big| \mathbb{P}_n(\varphi(\bar{a})) - \mathbb{P}^*_n(\varphi(\bar{a})) \big| \leq e^{-dn} \quad \text{for all sufficiently large } n \geq m,$$

where $\mathbb{P}^*_n$ is the probability distribution on $\mathbf{W}_n$ according to Definition 3.11 if $\mathfrak{G}$ is replaced by $\mathfrak{G}^*$ and $\mathbb{P}_n$ is replaced by $\mathbb{P}^*_n$.

Remark 3.17. (Computational complexity)

The proof of Theorem 3.14 indicatesan algorithm for ﬁnding the quantiﬁer-free ϕ ∗ from ϕ . Suppose that we ﬁx the liftedBayesian network (so σ is also ﬁxed) and try to understand how eﬃcient the algorithmis with respect to the length of ϕ . The crucial step is Deﬁnition 4.35 and Lemma 4.37which together show how to eliminate a quantiﬁer of the form constructed in part (3)of Deﬁnition 3.1 in a satisﬁable formula. However, at this step in the proof we assumethat the formulas inside the latest quantiﬁcation are written as disjunctions of completeatomic types. The problem of transforming an arbitrary quantiﬁer-free formula intoan equivalent disjunctive normal form is NP-hard so the algorithm is not eﬃcient ingeneral (given the current state of aﬀairs in computational complexity theory). But ifwe assume that every quantiﬁer-free subformula of ϕ is a disjunctive normal form, thenthe number “steps” that the indicated algorithm needs to ﬁnd ϕ ∗ is O ( | ϕ | ) if | ϕ | denotesthe length of ϕ and “step” means an arithmetic operation , a comparison of two numbersor a comparison of two literals. This essentially follows from Remark 4.36 because thenumber of times that a quantiﬁer needs to be eliminated is bounded by | ϕ | . Remark 3.18. (Necessity of noncriticality)

It follows from Remark 3.4 that for every sentence $\psi$ of the language $L_{\omega P}$ considered in [16] there is a sentence of CPL which has exactly the same finite models as $\psi$. Therefore it follows from [16, Proposition 3.1] that the assumption that $\varphi$ is noncritical in Theorems 3.14 and 3.15 is necessary, even if we assume that $\sigma$ contains one binary relation symbol and no other symbols. One may ask if it is also necessary in the above theorems that all aggregation formulas $\chi_{R,i}$ are noncritical. I do not currently know but I assume that the answer is yes.

4. Proof of Theorems 3.14, 3.15 and 3.16

Let $\sigma$ be a finite relational signature and $\mathfrak{G}$ a lifted Bayesian network for $\sigma$. The proof proceeds by induction on the mp-rank of the underlying DAG of $\mathfrak{G}$. The base case will not be when the mp-rank of $\mathfrak{G}$ is 0. Instead the base case will be the “empty” lifted Bayesian network for the empty signature $\emptyset$, as described in Definition 3.10. In the case of an empty signature (and consequently empty lifted Bayesian network) Theorems 3.14–3.16 are a direct consequence of Lemma 4.13 below.

The rest of the proof concerns the induction step. The induction step is proved by Proposition 4.41 and Corollary 4.42, which rely (only) on Assumption 4.1 below, which states the general assumptions related to the lifted Bayesian network, and Assumption 4.10 below, which states the induction hypothesis. Theorems 3.14–3.16 follow from the arguments in this section, in particular Proposition 4.41 and Corollary 4.42, because
• $k \in \mathbb{N}^+$ can be chosen arbitrarily large in Lemma 4.13 and in Assumption 4.10,
• $\varepsilon' > 0$ can be chosen arbitrarily small in Lemma 4.13 and in Assumption 4.10, and
• we can choose $\delta'(n) = e^{-dn}$ for any $d > 0$ in Lemma 4.13, and because of the lower bound in Lemma 4.28.

Footnote (to “arithmetic operation” in Remark 3.17): More precisely, adding, multiplying or dividing two numbers.

For the rest of this section we assume the following:

Assumption 4.1. (Relationship to a lifted Bayesian network)
• $\sigma$ is a finite relational signature and $\sigma'$ is a proper subset of $\sigma$.
• For each $R \in \sigma \setminus \sigma'$, of arity $m$ say, there are a number $\nu_R \in \mathbb{N}$, a sequence of variables $\bar{x} = (x_1, \ldots, x_m)$ and formulas $\chi_{R,i}(\bar{x}) \in CPL(\sigma')$, for $i = 1, \ldots, \nu_R$, such that $\forall \bar{x} \big( \bigvee_{i=1}^{\nu_R} \chi_{R,i}(\bar{x}) \big)$ is valid (i.e. true in all $\sigma'$-structures) and if $i \neq j$ then $\exists \bar{x} \big( \chi_{R,i}(\bar{x}) \wedge \chi_{R,j}(\bar{x}) \big)$ is unsatisfiable.
• For every $R \in \sigma \setminus \sigma'$ and every $1 \leq i \leq \nu_R$, $\mu(R \mid \chi_{R,i})$ denotes a real number in the interval $[0, 1]$. (Sometimes we write $\mu(R(\bar{x}) \mid \chi_{R,i}(\bar{x}))$ where $\bar{x}$ is a sequence of variables the length of which equals the arity of $R$.)
• For every $\sigma$-structure $\mathcal{A}$, every $R \in \sigma \setminus \sigma'$, every $1 \leq i \leq \nu_R$ and every $\bar{a} \in A^r$ where $r$ is the arity of $R$, let
$$\lambda(\mathcal{A}, R, i, \bar{a}) = \begin{cases} \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge R(\bar{a}), \\ 1 - \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge \neg R(\bar{a}). \end{cases}$$
• For every $n \in \mathbb{N}^+$, $\mathbf{W}'_n$ is the set of all $\sigma'$-structures with domain $[n] = \{1, \ldots, n\}$ and $\mathbb{P}'_n$ is a probability distribution on $\mathbf{W}'_n$.
• For every $n \in \mathbb{N}^+$, $\mathbf{W}_n$ is the set of all $\sigma$-structures with domain $[n]$.

Recall that, according to Definition 3.2, if $\psi(\bar{x}) \in CPL(\sigma')$ and $\mathcal{A} \in \mathbf{W}_n$ then $\psi(\mathcal{A} \upharpoonright \sigma') = \{\bar{b} : \mathcal{A} \upharpoonright \sigma' \models \psi(\bar{b})\}$.

Definition 4.2.

For every $n \in \mathbb{N}$ and every $\mathcal{A} \in \mathbf{W}_n$ we define

$$\mathbb{P}_n(\mathcal{A}) = \mathbb{P}'_n(\mathcal{A} \upharpoonright \sigma') \prod_{R \in \sigma \setminus \sigma'} \; \prod_{i=1}^{\nu_R} \; \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A} \upharpoonright \sigma')} \lambda(\mathcal{A}, R, i, \bar{a}).$$

Then $\mathbb{P}_n$ is a probability distribution on $\mathbf{W}_n$ which we may call the $\mathbb{P}'_n$-conditional probability distribution on $\mathbf{W}_n$.

Notation 4.3.

The notation in this section will follow the following pattern: $\sigma'$-structures, in particular members of $\mathbf{W}'_n$, will be denoted $\mathcal{A}'$, $\mathcal{B}'$, etcetera; subsets of $\mathbf{W}'_n$ will be denoted $\mathbf{X}'$ (or $\mathbf{X}'_n$), $\mathbf{Y}'$ (or $\mathbf{Y}'_n$), etcetera; $\sigma$-structures and subsets of $\mathbf{W}_n$ will be denoted similarly but without the (symbol for) “prime”.

In the proofs that follow we will consider “restrictions” of $\mathbb{P}_n$ to some subsets of $\mathbf{W}_n$ according to the next definition.

Definition 4.4.
(i) If $\mathbf{Y}' \subseteq \mathbf{W}'_n$ then we define $\mathbf{W}^{\mathbf{Y}'} = \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \upharpoonright \sigma' \in \mathbf{Y}'\}$ and
$$\mathbb{P}^{\mathbf{Y}'}(\mathcal{A}) = \frac{\mathbb{P}'_n(\mathcal{A} \upharpoonright \sigma')}{\mathbb{P}'_n(\mathbf{Y}')} \prod_{R \in \sigma \setminus \sigma'} \; \prod_{i=1}^{\nu_R} \; \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A})} \lambda(\mathcal{A}, R, i, \bar{a}).$$
(ii) If $\mathcal{A}' \in \mathbf{W}'_n$, then we let $\mathbf{W}^{\mathcal{A}'} = \mathbf{W}^{\{\mathcal{A}'\}}$ and, for every $\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}$,
$$\mathbb{P}^{\mathcal{A}'}(\mathcal{A}) = \mathbb{P}^{\{\mathcal{A}'\}}(\mathcal{A}) = \prod_{R \in \sigma \setminus \sigma'} \; \prod_{i=1}^{\nu_R} \; \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A})} \lambda(\mathcal{A}, R, i, \bar{a}).$$

Then $\mathbb{P}^{\mathbf{Y}'}$ and $\mathbb{P}^{\mathcal{A}'}$ are probability distributions on $\mathbf{W}^{\mathbf{Y}'}$ and $\mathbf{W}^{\mathcal{A}'}$, respectively; if this is not clear see Remark 4.7 below. Note also that if $\mathbf{Y}' \subseteq \mathbf{W}'_n$, $\mathcal{A}' \in \mathbf{Y}'$ and $\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}$, then

$$\mathbb{P}^{\mathbf{Y}'}(\mathcal{A}) = \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}')} \, \mathbb{P}^{\mathcal{A}'}(\mathcal{A}), \tag{4.1}$$

and in particular, taking $\mathbf{Y}' = \mathbf{W}'_n$, we have, for every $\mathcal{A} \in \mathbf{W}_n$,

$$\mathbb{P}_n(\mathcal{A}) = \mathbb{P}'_n(\mathcal{A} \upharpoonright \sigma') \, \mathbb{P}^{\mathcal{A} \upharpoonright \sigma'}(\mathcal{A}). \tag{4.2}$$

We now state a few basic lemmas which will be useful.
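As an informal sanity check on Definitions 4.2 and 4.4, the following Python sketch enumerates a very small instance by brute force. Everything concrete in it is an assumption made for illustration and does not come from the text: $\sigma' = \{P\}$ and $\sigma = \{P, Q\}$ with $P$, $Q$ unary, the distribution $\mathbb{P}'_n$ puts each element into $P$ independently with probability 0.3, the aggregation formulas are $\chi_{Q,1}(x) = P(x)$ and $\chi_{Q,2}(x) = \neg P(x)$, and the function names are hypothetical. The sketch only checks that $\mathbb{P}_n$ and $\mathbb{P}^{\mathbf{Y}'}$ have total mass 1 on $\mathbf{W}_n$ and $\mathbf{W}^{\mathbf{Y}'}$, respectively.

```python
from itertools import product
from math import isclose

N = 3                               # domain [n] = {1, ..., n}
DOMAIN = range(1, N + 1)
MU_P = 0.3                          # hypothetical: P'_n puts each element in P with probability 0.3
MU_Q = {True: 0.9, False: 0.2}      # hypothetical mu(Q | chi_{Q,1}) and mu(Q | chi_{Q,2})


def subsets_of_domain():
    """All subsets of [n]; each one interprets a unary relation symbol."""
    for bits in product([False, True], repeat=N):
        yield frozenset(a for a, bit in zip(DOMAIN, bits) if bit)


def p_prime(p_set):
    """P'_n(A') for the sigma'-structure A' whose interpretation of P is p_set."""
    k = len(p_set)
    return MU_P ** k * (1 - MU_P) ** (N - k)


def p_given_reduct(p_set, q_set):
    """P^{A'}(A): the product of lambda(A, Q, i, a) from Definition 4.4(ii)."""
    result = 1.0
    for a in DOMAIN:
        mu = MU_Q[a in p_set]       # selects the unique i with A' |= chi_{Q,i}(a)
        result *= mu if a in q_set else 1 - mu
    return result


def p_n(p_set, q_set):
    """P_n(A) as in Definition 4.2, written in the factored form (4.2)."""
    return p_prime(p_set) * p_given_reduct(p_set, q_set)


def p_restricted(p_set, q_set, y_prime):
    """P^{Y'}(A) from Definition 4.4(i), for A whose reduct lies in Y'."""
    mass_of_y_prime = sum(p_prime(s) for s in y_prime)
    return p_prime(p_set) / mass_of_y_prime * p_given_reduct(p_set, q_set)


# A sigma-structure on [n] is represented by the pair (interpretation of P, interpretation of Q).
structures = [(p, q) for p in subsets_of_domain() for q in subsets_of_domain()]
total_mass = sum(p_n(p, q) for p, q in structures)

y_prime = [s for s in subsets_of_domain() if len(s) >= 2]   # an arbitrary choice of Y'
restricted_mass = sum(p_restricted(p, q, y_prime) for p, q in structures if p in y_prime)

print(isclose(total_mass, 1.0), isclose(restricted_mass, 1.0))
```

Exhaustive enumeration is only feasible for tiny $n$, since the number of structures grows exponentially in $n$; this is the brute-force cost that the main result is meant to avoid.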

Lemma 4.5.

For every $n$, if $\mathbf{Y}' \subseteq \mathbf{W}'_n$ then $\mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'}) = \mathbb{P}'_n(\mathbf{Y}')$.

Proof.

By using (4.2) in the first line below we get

$$\begin{aligned}
\mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'}) &= \sum_{\mathcal{A}' \in \mathbf{Y}'} \sum_{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}} \mathbb{P}_n(\mathcal{A}) = \sum_{\mathcal{A}' \in \mathbf{Y}'} \sum_{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}} \mathbb{P}'_n(\mathcal{A}') \, \mathbb{P}^{\mathcal{A}'}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in \mathbf{Y}'} \mathbb{P}'_n(\mathcal{A}') \sum_{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}} \mathbb{P}^{\mathcal{A}'}(\mathcal{A}) = \sum_{\mathcal{A}' \in \mathbf{Y}'} \mathbb{P}'_n(\mathcal{A}') = \mathbb{P}'_n(\mathbf{Y}'). \qquad \square
\end{aligned}$$

Lemma 4.6.

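For every $n$,
(i) if $\mathbf{X} \subseteq \mathbf{W}_n$ and $\mathcal{A}' \in \mathbf{W}'_n$, then $\mathbb{P}_n(\mathbf{X} \mid \mathbf{W}^{\mathcal{A}'}) = \mathbb{P}^{\mathcal{A}'}(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'})$, and
(ii) if $\mathbf{X} \subseteq \mathbf{W}_n$ and $\mathbf{Y}' \subseteq \mathbf{W}'_n$, then $\mathbb{P}_n(\mathbf{X} \mid \mathbf{W}^{\mathbf{Y}'}) = \mathbb{P}^{\mathbf{Y}'}(\mathbf{X} \cap \mathbf{W}^{\mathbf{Y}'})$.

Proof.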

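Let $\mathbf{X} \subseteq \mathbf{W}_n$.

(i) Let $\mathcal{A}' \in \mathbf{W}'_n$. Using Lemma 4.5 in the first line below and (4.2) in the second line below, we get

$$\begin{aligned}
\mathbb{P}_n(\mathbf{X} \mid \mathbf{W}^{\mathcal{A}'}) &= \frac{\mathbb{P}_n(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'})}{\mathbb{P}_n(\mathbf{W}^{\mathcal{A}'})} = \frac{\mathbb{P}_n(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'})}{\mathbb{P}'_n(\mathcal{A}')} \\
&= \frac{\mathbb{P}'_n(\mathcal{A}') \sum_{\mathcal{A} \in \mathbf{X} \cap \mathbf{W}^{\mathcal{A}'}} \mathbb{P}^{\mathcal{A}'}(\mathcal{A})}{\mathbb{P}'_n(\mathcal{A}')} = \mathbb{P}^{\mathcal{A}'}(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'}).
\end{aligned}$$

(ii) Let $\mathbf{Y}' \subseteq \mathbf{W}'_n$. Using that $\mathbf{X} \cap \mathbf{W}^{\mathbf{Y}'}$ is the disjoint union of all $\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'}$ such that $\mathcal{A}' \in \mathbf{Y}'$, Lemma 4.5, part (i) of this lemma and (4.1), we get

$$\begin{aligned}
\mathbb{P}_n(\mathbf{X} \mid \mathbf{W}^{\mathbf{Y}'}) &= \frac{\mathbb{P}_n(\mathbf{X} \cap \mathbf{W}^{\mathbf{Y}'})}{\mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'})} = \sum_{\mathcal{A}' \in \mathbf{Y}'} \frac{\mathbb{P}_n(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'})}{\mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'})} = \sum_{\mathcal{A}' \in \mathbf{Y}'} \frac{\mathbb{P}_n(\mathbf{W}^{\mathcal{A}'})}{\mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'})} \, \mathbb{P}_n(\mathbf{X} \mid \mathbf{W}^{\mathcal{A}'}) \\
&= \sum_{\mathcal{A}' \in \mathbf{Y}'} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}')} \, \mathbb{P}^{\mathcal{A}'}(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'}) = \sum_{\mathcal{A}' \in \mathbf{Y}'} \mathbb{P}^{\mathbf{Y}'}(\mathbf{X} \cap \mathbf{W}^{\mathcal{A}'}) = \mathbb{P}^{\mathbf{Y}'}(\mathbf{X} \cap \mathbf{W}^{\mathbf{Y}'}). \qquad \square
\end{aligned}$$

Remark 4.7. (About $\mathbb{P}^{\mathcal{A}'}$)
Fix any $n$ and any $\mathcal{A}' \in \mathbf{W}'_n$. For every $R \in \sigma \setminus \sigma'$, every $1 \leq i \leq \nu_R$ and every $\bar{a} \in \chi_{R,i}(\mathcal{A}')$, let $\Omega(R, i, \bar{a}) = \{0, 1\}$ and let $\mathbb{P}_{R,i,\bar{a}}$ be the probability distribution on $\Omega(R, i, \bar{a})$ with $\mathbb{P}_{R,i,\bar{a}}(1) = \mu(R \mid \chi_{R,i})$. Then let $\mathbb{P}^{\Omega}$ be the product measure on

$$\Omega = \prod_{\substack{R \in \sigma \setminus \sigma' \\ 1 \leq i \leq \nu_R \\ \bar{a} \in \chi_{R,i}(\mathcal{A}')}} \Omega(R, i, \bar{a}).$$

Consider the map which sends $\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}$ to the finite sequence

$$\bar{\kappa}_{\mathcal{A}} = \big( \kappa(R, i, \bar{a}) : R \in \sigma \setminus \sigma', \; 1 \leq i \leq \nu_R, \; \bar{a} \in \chi_{R,i}(\mathcal{A}') \big)$$

where $\kappa(R, i, \bar{a}) = 1$ if $\mathcal{A} \models R(\bar{a})$ and $\kappa(R, i, \bar{a}) = 0$ otherwise. This map is clearly a bijection from $\mathbf{W}^{\mathcal{A}'}$ to $\Omega$ and, for every $\mathcal{A} \in \mathbf{W}^{\mathcal{A}'}$, $\mathbb{P}^{\mathcal{A}'}(\mathcal{A}) = \mathbb{P}^{\Omega}(\bar{\kappa}_{\mathcal{A}})$.

For every $\alpha \in \{0, 1\}$, every $R \in \sigma \setminus \sigma'$ and every tuple $\bar{a}$ of elements of $[n]$ (having the same length as the arity of $R$), let $\mathbf{E}^{\alpha}_{R,\bar{a}} = \{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models R^{\alpha}(\bar{a})\}$. From the connection to the product measure it follows that
(a) for every $R \in \sigma \setminus \sigma'$, every $1 \leq i \leq \nu_R$ and every $\bar{a} \in \chi_{R,i}(\mathcal{A}')$, $\mathbb{P}^{\mathcal{A}'}(\mathbf{E}^{1}_{R,\bar{a}}) = \mu(R \mid \chi_{R,i})$, and
(b) if $\alpha_1, \ldots, \alpha_m \in \{0, 1\}$, $R_1, \ldots, R_m \in \sigma \setminus \sigma'$ and $\bar{a}_1, \ldots, \bar{a}_m$ are tuples where $|\bar{a}_i|$ is the arity of $R_i$ for each $i$, and for all $1 \leq i < j \leq m$, $R_i \neq R_j$ or $\bar{a}_i \neq \bar{a}_j$, then the events $\mathbf{E}^{\alpha_1}_{R_1,\bar{a}_1}, \ldots, \mathbf{E}^{\alpha_m}_{R_m,\bar{a}_m}$ are independent.

The next lemma is a direct consequence of (b) of Remark 4.7.

Lemma 4.8.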

Suppose that $p(x_1, \ldots, x_m)$ and $q(x_1, \ldots, x_m)$ are (possibly partial) atomic $(\sigma \setminus \sigma')$-types. Also assume that if $\varphi$ is an atomic $\sigma$-formula which does not have the form $x = x$ or the form $\top$, and $\varphi \in p$ or $\neg\varphi \in p$, then neither $\varphi$ nor $\neg\varphi$ belongs to $q$. Then, for every $n$, every $\mathcal{A}' \in \mathbf{W}'_n$ and all distinct $a_1, \ldots, a_m \in [n]$, the event $\{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models p(a_1, \ldots, a_m)\}$ is independent from the event $\{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models q(a_1, \ldots, a_m)\}$ in the probability space $(\mathbf{W}^{\mathcal{A}'}, \mathbb{P}^{\mathcal{A}'})$.

Definition 4.9. (Saturation and unsaturation)

Let $\bar{x}$ and $\bar{y}$ be sequences of distinct variables such that $\mathrm{rng}(\bar{x}) \cap \mathrm{rng}(\bar{y}) = \emptyset$ and let $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ be atomic $\sigma$-types such that $q \subseteq p$. Let also $0 \leq \alpha \leq 1$ and $d = \dim_{\bar{y}}(p)$.
(a) A finite $\sigma$-structure $\mathcal{A}$ is called $(p, q, \alpha)$-saturated if, whenever $\bar{a} \in A^{|\bar{x}|}$ and $\mathcal{A} \models q(\bar{a})$, then $\big| \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big| \geq \alpha |A|^d$.
(b) A finite $\sigma$-structure $\mathcal{A}$ is called $(p, q, \alpha)$-unsaturated if, whenever $\bar{a} \in A^{|\bar{x}|}$ and $\mathcal{A} \models q(\bar{a})$, then $\big| \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big| \leq \alpha |A|^d$.

If $p'(\bar{x}, \bar{y})$ and $q'(\bar{x})$ are atomic $\sigma'$-types and $q' \subseteq p'$, then the notions of $(p', q', \alpha)$-saturated and $(p', q', \alpha)$-unsaturated are defined in the same way, but considering finite $\sigma'$-structures instead.

Assumption 4.10. (Induction hypothesis)

Suppose that $k \in \mathbb{N}^+$, $\varepsilon' > 0$, $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\geq 0}$ and $\mathbf{Y}'_n \subseteq \mathbf{W}'_n$, for $n \in \mathbb{N}^+$, are such that the following hold:
(1) $\lim_{n \to \infty} \delta'(n) = 0$.
(2) $\mathbb{P}'_n(\mathbf{Y}'_n) \geq 1 - \delta'(n)$ for all sufficiently large $n$.
(3) For every complete atomic $\sigma'$-type $p'(\bar{x})$ with $|\bar{x}| \leq k$ there is a number which we denote $\mathbb{P}'(p'(\bar{x}))$, or just $\mathbb{P}'(p')$, such that for all sufficiently large $n$ and all tuples $\bar{a}$ of elements of $[n]$ which realize the identity fragment of $p'$,
$$\big| \mathbb{P}'_n\big( \{\mathcal{A}' \in \mathbf{W}'_n : \mathcal{A}' \models p'(\bar{a})\} \big) - \mathbb{P}'(p'(\bar{x})) \big| \leq \delta'(n).$$
(4) For every complete atomic $\sigma'$-type $p'(\bar{x}, \bar{y})$ with $|\bar{x}\bar{y}| \leq k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$, if $q'(\bar{x}) = p' \upharpoonright \bar{x}$ and $\mathbb{P}'(q') > 0$, then for all sufficiently large $n$, every $\mathcal{A}' \in \mathbf{Y}'_n$ is $(p', q', \alpha/(1 + \varepsilon'))$-saturated and $(p', q', \alpha(1 + \varepsilon'))$-unsaturated if $\alpha = \mathbb{P}'(p'(\bar{x}, \bar{y})) / \mathbb{P}'(q'(\bar{x}))$.
(5) For every $\chi_{R,i}(\bar{x})$ as in Assumption 4.1 there is a quantifier-free $\sigma'$-formula $\chi^*_{R,i}(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A}' \in \mathbf{Y}'_n$, $\mathcal{A}' \models \forall \bar{x} \big( \chi_{R,i}(\bar{x}) \leftrightarrow \chi^*_{R,i}(\bar{x}) \big)$.

Remark 4.11. (Some special cases)
(i) As a technical convenience we allow empty types (and this does not contradict our definition of an atomic type).
For example, in Definition 4.9, we allow the possibility that $\bar{x}$ is an empty sequence and consequently $q(\bar{x}) = \emptyset$ and $p(\bar{x}, \bar{y})$ is really just $p(\bar{y})$.
(ii) For an empty atomic $\sigma'$-type $p'$ we let $\mathbb{P}'(p') = 1$ and in this case we also interpret the set $\{\mathcal{A}' \in \mathbf{W}'_n : \mathcal{A}' \models p'(\bar{a})\}$ as being equal to $\mathbf{W}'_n$. Then part (3) of Assumption 4.10 makes sense also for an empty type $p'$.
(iii) If $p'(\bar{y})$ is a complete atomic $\sigma'$-type with $\mathbb{P}'(p') = 0$, then for all sufficiently large $n$ and all $\mathcal{A}' \in \mathbf{Y}'_n$, $p'$ is not realized in $\mathcal{A}'$ (i.e. $p'(\mathcal{A}') = \emptyset$). The reason is this: Let $\bar{x}$ denote an empty sequence and let $q'(\bar{x})$ be the empty atomic $\sigma'$-type, so $q' \subseteq p'$. For large enough $n$, every $\mathcal{A}' \in \mathbf{Y}'_n$ is $(p', q', \mathbb{P}'(p')(1 + \varepsilon'))$-unsaturated by part (4) of Assumption 4.10. If $\mathbb{P}'(p') = 0$ this implies that $p'$ has no realization in $\mathcal{A}'$.

Lemma 4.12.

Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. There is a number which we denote $\mathbb{P}(p(\bar{x}) \mid p'(\bar{x}))$, or just $\mathbb{P}(p \mid p')$, such that for all sufficiently large $n$, all tuples $\bar{a}$ of elements of $[n]$ and all $\mathcal{A}' \in \mathbf{Y}'_n$ such that $\mathcal{A}' \models p'(\bar{a})$,

$$\mathbb{P}^{\mathcal{A}'}\big( \{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) = \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})).$$

Moreover, the number $\mathbb{P}(p(\bar{x}) \mid p'(\bar{x}))$ is a product of numbers of the form $\mu(R \mid \chi_{R,i})$ or $1 - \mu(R \mid \chi_{R,i})$.

Proof.

Suppose that $\bar{a}$, $\bar{b}$ are tuples of elements of $[n]$ and $\mathcal{A}', \mathcal{B}' \in \mathbf{Y}'_n$ are such that $\mathcal{A}' \models p'(\bar{a})$ and $\mathcal{B}' \models p'(\bar{b})$. Let $R \in \sigma \setminus \sigma'$. By part (5) of Assumption 4.10, for each $1 \leq i \leq \nu_R$, there is a quantifier-free formula $\chi^*_{R,i}$ such that (if $n$ is large enough) $\chi_{R,i}$ is equivalent to $\chi^*_{R,i}$ in every structure in $\mathbf{Y}'_n$. It follows that if $\bar{c}$ and $\bar{d}$ are subsequences of $\bar{a}$ and $\bar{b}$, respectively, of length equal to the arity of $R$, then either $\mathcal{A}' \models \chi_{R,i}(\bar{c})$ and $\mathcal{B}' \models \chi_{R,i}(\bar{d})$, or $\mathcal{A}' \not\models \chi_{R,i}(\bar{c})$ and $\mathcal{B}' \not\models \chi_{R,i}(\bar{d})$. The conclusion of the lemma now follows from (a) and (b) of Remark 4.7. $\square$

Lemma 4.13. (The base case)

For every $k \in \mathbb{N}^+$ and every $\varepsilon' > 0$, if $\sigma' = \emptyset$, $\mathbb{P}'_n$ is the uniform probability distribution on $\mathbf{W}'_n$ for all $n$ and $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\geq 0}$ is any function such that $\lim_{n \to \infty} \delta'(n) = 0$, then there are $\mathbf{Y}'_n \subseteq \mathbf{W}'_n$, for $n \in \mathbb{N}^+$, such that (1)–(4) in Assumption 4.10 hold. Moreover, for every $\varepsilon'$-noncritical $\varphi(\bar{x}) \in CPL(\emptyset)$ with $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$ there is a quantifier-free formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A}' \in \mathbf{Y}'_n$, $\mathcal{A}' \models \forall \bar{x} \big( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \big)$.

Proof.

Suppose that $\sigma' = \emptyset$ and let $k \in \mathbb{N}^+$ and $\varepsilon' > 0$ be given. Then, for every $n$, $\mathbf{W}'_n$ contains a unique structure, which is just the set $[n]$ and which has probability 1. Let $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\geq 0}$ be any function such that $\lim_{n \to \infty} \delta'(n) = 0$. For every complete atomic $\sigma'$-type $p'(\bar{x})$ let $\mathbb{P}'(p'(\bar{x})) = 1$. Observe that, for every $n$, if $\bar{a}$ is a tuple of elements of $[n]$ and $\bar{a}$ realizes the identity fragment of $p'(\bar{x})$, then $\bar{a}$ realizes $p'(\bar{x})$ in the unique $\mathcal{A}'$ of $\mathbf{W}'_n$. Hence, for trivial reasons we have (3).

For every $n$ let $\mathbf{Y}'_n$ be the set of all $\mathcal{A}' \in \mathbf{W}'_n$ such that for every complete atomic $\sigma'$-type $p'(\bar{x}, \bar{y})$ with $|\bar{x}\bar{y}| \leq k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$, if $q'(\bar{x}) = p' \upharpoonright \bar{x}$, then $\mathcal{A}'$ is $(p', q', 1/(1 + \varepsilon'))$-saturated and $(p', q', (1 + \varepsilon'))$-unsaturated. Suppose that $p'(\bar{x}, \bar{y})$ is a complete atomic $\sigma'$-type with $|\bar{x}\bar{y}| \leq k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$. Let $q'(\bar{x}) = p' \upharpoonright \bar{x}$ and suppose that $\mathcal{A}' \models q'(\bar{a})$ where $\mathcal{A}' \in \mathbf{W}'_n$. Then $\mathcal{A}' \models p'(\bar{a}, \bar{b})$ for every tuple $\bar{b}$ of elements of $[n]$ consisting of different elements none of which occurs in $\bar{a}$. There are $n^{|\bar{y}|} - C n^{|\bar{y}| - 1}$ such $\bar{b}$ for some constant $C$. So if

$$n^{|\bar{y}|} - C n^{|\bar{y}| - 1} \geq \frac{n^{|\bar{y}|}}{1 + \varepsilon'}$$

then $\mathcal{A}'$ is $(p', q', 1/(1 + \varepsilon'))$-saturated. For trivial reasons, $\mathcal{A}'$ is also $(p', q', (1 + \varepsilon'))$-unsaturated. Hence, we have proved (4). The last claim of the lemma follows from Proposition 4.32, the proof of which works out in exactly the same way if $\sigma$ and $\mathbf{Y}_n$ (in that proof) are replaced by $\sigma'$ and $\mathbf{Y}'_n$, respectively, and we assume (4). In other words, the almost everywhere elimination of quantifiers follows from the saturation and unsaturation properties stated in (4). $\square$

Footnote (on the uniform distribution in Lemma 4.13): In fact the uniform probability distribution is the only probability distribution on $\mathbf{W}'_n$ since $\mathbf{W}'_n$ is a singleton if $\sigma' = \emptyset$ (which we assume in this lemma).
Footnote (on “$\varepsilon'$-noncritical” in Lemma 4.13): In the sense of Definition 4.30.
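The counting step in the proof of Lemma 4.13 can be explored numerically. The Python sketch below is an illustration under stated assumptions, not part of the paper's argument: the function names are hypothetical, the signature is empty, and the type $p'(\bar{x}, \bar{y})$ is assumed to say that the entries of $\bar{y}$ are pairwise distinct and distinct from all entries of $\bar{x}$. It counts the tuples $\bar{b}$ realizing such a type over a fixed $\bar{a}$, both by enumeration and as a falling factorial, and searches for the first $n$ at which the saturation bound of Definition 4.9(a) with $\alpha = 1/(1 + \varepsilon')$ and $d = |\bar{y}|$ is met.

```python
from itertools import permutations
from math import perm


def count_by_enumeration(n, a_bar, y_len):
    """Count tuples b in [n]^y_len with pairwise distinct entries, none occurring in a_bar."""
    used = set(a_bar)
    free = [e for e in range(1, n + 1) if e not in used]
    return sum(1 for _ in permutations(free, y_len))


def count_closed_form(n, a_bar, y_len):
    """The same count as a falling factorial (n - m)(n - m - 1)...(n - m - y_len + 1),
    where m is the number of distinct elements of a_bar."""
    return perm(n - len(set(a_bar)), y_len)


def is_saturated(n, a_bar, y_len, eps):
    """Check the bound of Definition 4.9(a) with alpha = 1/(1 + eps) and d = y_len."""
    return count_closed_form(n, a_bar, y_len) >= n ** y_len / (1 + eps)


def first_n_meeting_bound(a_bar, y_len, eps, limit=10_000):
    """First n (at least as large as every entry of a_bar) at which the bound holds,
    or None if no such n is found up to the hypothetical search limit."""
    start = max(a_bar, default=1)
    for n in range(start, limit + 1):
        if is_saturated(n, a_bar, y_len, eps):
            return n
    return None


# Example: a_bar = (1, 2), |y| = 2 and eps' = 0.1.
print(count_by_enumeration(6, (1, 2), 2), count_closed_form(6, (1, 2), 2))
print(first_n_meeting_bound((1, 2), 2, 0.1))
```

The falling factorial equals $n^{|\bar{y}|}$ minus lower-order terms, which is the shape $n^{|\bar{y}|} - C n^{|\bar{y}| - 1}$ used in the proof; the smaller $\varepsilon'$ is, the larger $n$ must be before the bound holds.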

Lemma 4.14.

Suppose that $\mathbf{X}_n \subseteq \mathbf{W}_n$. Then for all sufficiently large $n$, $\mathbb{P}_n(\mathbf{X}_n) \leq \mathbb{P}_n(\mathbf{X}_n \cap \mathbf{W}^{\mathbf{Y}'_n}) + \delta'(n)$.

Proof.

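We have $\mathbb{P}_n(\mathbf{X}_n) = \mathbb{P}_n(\mathbf{X}_n \cap \mathbf{W}^{\mathbf{Y}'_n}) + \mathbb{P}_n(\mathbf{X}_n \setminus \mathbf{W}^{\mathbf{Y}'_n})$ and, using Lemma 4.5 and part (2) of Assumption 4.10, we have

$$\mathbb{P}_n(\mathbf{X}_n \setminus \mathbf{W}^{\mathbf{Y}'_n}) \leq \mathbb{P}_n(\mathbf{W}_n \setminus \mathbf{W}^{\mathbf{Y}'_n}) = 1 - \mathbb{P}_n(\mathbf{W}^{\mathbf{Y}'_n}) = 1 - \mathbb{P}'_n(\mathbf{Y}'_n) \leq \delta'(n).$$

Hence $\mathbb{P}_n(\mathbf{X}_n) \leq \mathbb{P}_n(\mathbf{X}_n \cap \mathbf{W}^{\mathbf{Y}'_n}) + \delta'(n)$. $\square$

Lemma 4.15.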

Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. Then for all sufficiently large $n$ and all tuples $\bar{a}$ of elements of $[n]$, if $\mathbf{Z}'_n$ denotes the set of all $\mathcal{A}' \in \mathbf{Y}'_n$ such that $\mathcal{A}' \models p'(\bar{a})$, we have

$$\mathbb{P}_n\big( \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models p(\bar{a})\} \;\big|\; \mathbf{W}^{\mathbf{Y}'_n} \cap \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models p'(\bar{a})\} \big) = \mathbb{P}^{\mathbf{Z}'_n}\big( \{\mathcal{A} \in \mathbf{W}^{\mathbf{Z}'_n} : \mathcal{A} \models p(\bar{a})\} \big) = \mathbb{P}(p(\bar{x}) \mid p'(\bar{x}))$$

where $\mathbb{P}(p(\bar{x}) \mid p'(\bar{x}))$ is as in Lemma 4.12.

Proof.

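For every $\mathcal{A} \in \mathbf{W}_n$ we have $\mathcal{A} \models p'(\bar{a})$ if and only if $\mathcal{A} \upharpoonright \sigma' \models p'(\bar{a})$. Therefore $\mathbf{W}^{\mathbf{Y}'_n} \cap \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models p'(\bar{a})\} = \mathbf{W}^{\mathbf{Z}'_n}$. By Lemma 4.6 we have

$$\mathbb{P}_n\big( \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models p(\bar{a})\} \;\big|\; \mathbf{W}^{\mathbf{Y}'_n} \cap \{\mathcal{A} \in \mathbf{W}_n : \mathcal{A} \models p'(\bar{a})\} \big) = \mathbb{P}^{\mathbf{Z}'_n}\big( \{\mathcal{A} \in \mathbf{W}^{\mathbf{Z}'_n} : \mathcal{A} \models p(\bar{a})\} \big).$$

Then, using (4.1) and Lemma 4.12, we get

$$\begin{aligned}
\mathbb{P}^{\mathbf{Z}'_n}\big( \{\mathcal{A} \in \mathbf{W}^{\mathbf{Z}'_n} : \mathcal{A} \models p(\bar{a})\} \big) &= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \mathbb{P}^{\mathbf{Z}'_n}\big( \{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) \\
&= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Z}'_n)} \, \mathbb{P}^{\mathcal{A}'}\big( \{\mathcal{A} \in \mathbf{W}^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) \\
&= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Z}'_n)} \, \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})) = \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})). \qquad \square
\end{aligned}$$

Lemma 4.16.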

Suppose that p (cid:48) (¯ x ) is a complete atomic σ (cid:48) -type and that p (¯ x ) ⊇ p (cid:48) (¯ x ) isa (possibly partial) atomic σ -type. Then for all suﬃciently large n and all ¯ a ∈ [ n ] whichrealize the identity fragment of p (cid:48) (¯ x ) (and hence of p ) we have (cid:12)(cid:12) P n (cid:0) {A ∈ W n : A | = p (¯ a ) } | W Y (cid:48) n (cid:1) − P ( p (¯ x | p (cid:48) (¯ x )) · P (cid:48) ( p (cid:48) (¯ x )) (cid:12)(cid:12) < δ (cid:48) ( n ) . Proof.

Let $\bar{a}$ be a tuple of elements of $[n]$ which realizes the identity fragment of $p'(\bar{x})$. Furthermore,
let $\mathbf{X}_n$ be the set of all $\mathcal{A} \in \mathbf{W}_n$ such that $\mathcal{A} \models p(\bar{a})$,
let $\mathbf{X}'_n$ be the set of all $\mathcal{A}' \in \mathbf{W}'_n$ such that $\mathcal{A}' \models p'(\bar{a})$, and
let $\mathbf{Z}'_n$ be the set of all $\mathcal{A}' \in \mathbf{Y}'_n$ such that $\mathcal{A}' \models p'(\bar{a})$.

From parts (2) and (3) of Assumption 4.10 it easily follows that (for large enough $n$) $\mathbb{P}'_n(\mathbf{Z}'_n) / \mathbb{P}'_n(\mathbf{Y}'_n)$ differs from $\mathbb{P}'_n(\mathbf{Z}'_n)$ by at most $\delta'(n)$, $\mathbb{P}'_n(\mathbf{Z}'_n)$ differs from $\mathbb{P}'_n(\mathbf{X}'_n)$ by at most $\delta'(n)$ and $\mathbb{P}'_n(\mathbf{X}'_n)$ differs from $\mathbb{P}'(p'(\bar{x}))$ by at most $\delta'(n)$.

By Lemma 4.6, $\mathbb{P}_n(\mathbf{X}_n \mid \mathbf{W}^{\mathbf{Y}'_n}) = \mathbb{P}^{\mathbf{Y}'_n}(\mathbf{X}_n \cap \mathbf{W}^{\mathbf{Y}'_n})$. Then, using (4.1) and Lemma 4.12, we have

$$\begin{aligned}
\mathbb{P}^{\mathbf{Y}'_n}\big( \mathbf{X}_n \cap \mathbf{W}^{\mathbf{Y}'_n} \big) &= \sum_{\mathcal{A}' \in \mathbf{Y}'_n} \mathbb{P}^{\mathbf{Y}'_n}\big( \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'} \big) = \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \mathbb{P}^{\mathbf{Y}'_n}\big( \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'} \big) \\
&= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \sum_{\mathcal{A} \in \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'}} \mathbb{P}^{\mathbf{Y}'_n}(\mathcal{A}) = \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \sum_{\mathcal{A} \in \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'}} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}'_n)} \, \mathbb{P}^{\mathcal{A}'}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}'_n)} \sum_{\mathcal{A} \in \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'}} \mathbb{P}^{\mathcal{A}'}(\mathcal{A}) = \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}'_n)} \, \mathbb{P}^{\mathcal{A}'}\big( \mathbf{X}_n \cap \mathbf{W}^{\mathcal{A}'} \big) \\
&= \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}'_n)} \, \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})) = \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})) \sum_{\mathcal{A}' \in \mathbf{Z}'_n} \frac{\mathbb{P}'_n(\mathcal{A}')}{\mathbb{P}'_n(\mathbf{Y}'_n)} \\
&= \mathbb{P}(p(\bar{x}) \mid p'(\bar{x})) \, \frac{\mathbb{P}'_n(\mathbf{Z}'_n)}{\mathbb{P}'_n(\mathbf{Y}'_n)},
\end{aligned}$$

where

$$\mathbb{P}'(p'(\bar{x})) - 3\delta'(n) \leq \frac{\mathbb{P}'_n(\mathbf{Z}'_n)}{\mathbb{P}'_n(\mathbf{Y}'_n)} \leq \mathbb{P}'(p'(\bar{x})) + 3\delta'(n). \qquad \square$$

Lemma 4.17.

Suppose that p (cid:48) (¯ x ) is a complete atomic σ (cid:48) -type and that p (¯ x ) ⊇ p (cid:48) (¯ x ) isan (possibly partial) atomic σ -type. Then for all suﬃciently large n and all ¯ a ∈ [ n ] whichrealize the identity fragment of p (cid:48) (¯ x ) we have (cid:12)(cid:12) P n (cid:0) {A ∈ W n : A | = p (¯ a ) } (cid:1) − P ( p (¯ x | p (cid:48) (¯ x )) · P (cid:48) ( p (cid:48) (¯ x )) (cid:12)(cid:12) < δ (cid:48) ( n ) . Proof.

Let ¯ a ∈ [ n ] realize the identity fragment of p (cid:48) (¯ x ) . Let X n be the set of all A ∈ W n such that A | = p (¯ a ) . We have P n (cid:0) X n (cid:1) = P n (cid:0) X n | W Y (cid:48) n (cid:1) P n (cid:0) W Y (cid:48) n (cid:1) + P n (cid:0) X | W n \ W Y (cid:48) n (cid:1) P n (cid:0) W n \ W Y (cid:48) n (cid:1) . By the use of Lemma 4.5 and by part (2) of Assumption 4.10, we also have P n (cid:0) W n \ W Y (cid:48) n (cid:1) = 1 − P n (cid:0) W Y (cid:48) n (cid:1) = 1 − P (cid:48) n ( Y (cid:48) n ) ≤ δ (cid:48) ( n ) . It follows that P n (cid:0) X | W n \ W Y (cid:48) n (cid:1) P n (cid:0) W n \ W Y (cid:48) n (cid:1) ≤ δ (cid:48) ( n ) . By Lemma 4.5 and part (2)of Assumption 4.10, P n (cid:0) W Y (cid:48) n (cid:1) = P (cid:48) n (cid:0) Y (cid:48) n (cid:1) ≥ − δ (cid:48) ( n ) . It now follows from Lemma 4.16that P n (cid:0) X n (cid:1) diﬀers from P ( p (¯ x | p (cid:48) (¯ x )) · P (cid:48) ( p (cid:48) (¯ x )) by at most δ (cid:48) ( n ) (for suﬃciently large n ). (cid:3) Deﬁnition 4.18.

For every (possibly partial) $\sigma$-type $p(\bar{x})$ such that $p'(\bar{x}) = p \restriction \sigma'$ is a complete atomic $\sigma'$-type, we define
\[
P(p(\bar{x})) = P'(p'(\bar{x})) \cdot P(p(\bar{x}) \mid p'(\bar{x})).
\]
With this definition we can reformulate Lemma 4.17 as follows:

Corollary 4.19. If $p(\bar{x})$ is a (possibly partial) atomic $\sigma$-type such that $p \restriction \sigma'$ is a complete atomic $\sigma'$-type, then, for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p(\bar{x})$ we have
\[
\Big| P_n\big(\{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\}\big) - P(p(\bar{x})) \Big| < \delta'(n).
\]

Lemma 4.20.

Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type. Let $p'(\bar{x}, \bar{y}) = p \restriction \sigma'$, $q(\bar{x}) = p \restriction \bar{x}$ and let $p_{\bar{y}}(\bar{x}, \bar{y})$ include $p'(\bar{x}, \bar{y})$ and all formulas in $p$ in which at least one variable from $\bar{y}$ occurs. Then
\[
P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) = P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})).
\]

Proof.

By Lemma 4.12, for any sufficiently large $n$, any $\bar{a}, \bar{b} \in [n]$ and any $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a}, \bar{b})$, we have
\[
\begin{aligned}
P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a}, \bar{b})\}\big), \\
P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p_{\bar{y}}(\bar{a}, \bar{b})\}\big) \quad \text{and} \\
P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big).
\end{aligned}
\]
Note that $p(\bar{x}, \bar{y}) = p'(\bar{x}, \bar{y}) \cup p_{\bar{y}}(\bar{x}, \bar{y}) \cup q(\bar{x})$. By Lemma 4.8, the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p_{\bar{y}}(\bar{a}, \bar{b})\}$ is independent, in $(W^{\mathcal{A}'}, P^{\mathcal{A}'})$, from the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}$. Therefore,
\[
P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a}, \bar{b})\}\big) = P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p_{\bar{y}}(\bar{a}, \bar{b})\}\big) \cdot P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big)
\]
and from this the lemma follows. $\square$

Lemma 4.21.

Let $p'(\bar{x}, \bar{y})$ be a complete atomic $\sigma'$-type, $q'(\bar{x}) = p' \restriction \bar{x}$ and suppose that $q(\bar{x})$ is a complete atomic $\sigma$-type such that $q \supseteq q'$. Then
\[
P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P(q(\bar{x}) \mid q'(\bar{x})).
\]

Proof.

Since $q'(\bar{x}) \subseteq p'(\bar{x}, \bar{y})$ it follows from Lemma 4.12 that for any sufficiently large $n$, any $\bar{a}, \bar{b} \in [n]$ and any $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a}, \bar{b})$, we have
\[
P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big)
\quad \text{and} \quad
P(q(\bar{x}) \mid q'(\bar{x})) = P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big).
\]
Hence $P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P(q(\bar{x}) \mid q'(\bar{x}))$. $\square$

In Lemma 4.12 we defined the notation $P(p(\bar{x}) \mid p'(\bar{x}))$ when the atomic $\sigma$-type $p$ has no more variables than the complete atomic $\sigma'$-type $p'$. From Definition 4.18 of $P(p(\bar{x}))$ it follows that $P(p(\bar{x}) \mid p'(\bar{x})) = P(p(\bar{x})) / P'(p'(\bar{x}))$. Now we extend this notation to pairs $(p(\bar{x}, \bar{y}), q(\bar{x}))$ where $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and $q(\bar{x}) = p \restriction \bar{x}$.

Definition 4.22.

Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and let $q(\bar{x}) = p \restriction \bar{x}$. We define
\[
P(p(\bar{x}, \bar{y}) \mid q(\bar{x})) = \frac{P(p(\bar{x}, \bar{y}))}{P(q(\bar{x}))}.
\]
In the same way, if $p'(\bar{x}, \bar{y})$ is a complete atomic $\sigma'$-type and $q'(\bar{x}) = p' \restriction \bar{x}$, then we define
\[
P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) = \frac{P'(p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x}))}.
\]

Lemma 4.23.

Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type, let $q(\bar{x}) = p \restriction \bar{x}$ and let $p_{\bar{y}}(\bar{x}, \bar{y})$ be defined as in Lemma 4.20. Then, with $p' = p \restriction \sigma'$ and $q' = q \restriction \sigma'$,
\[
P(p(\bar{x}, \bar{y}) \mid q(\bar{x})) = P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})).
\]

Proof.

Using Definition 4.18 and Lemmas 4.20 and 4.21 we get
\[
\begin{aligned}
P(p(\bar{x}, \bar{y}) \mid q(\bar{x}))
&= \frac{P(p(\bar{x}, \bar{y}))}{P(q(\bar{x}))}
 = \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})). \qquad \square
\end{aligned}
\]
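The factorization in Lemma 4.23 can be sanity-checked numerically. The following is only a sketch under stated assumptions: the four input probabilities are hypothetical values chosen for illustration (they do not come from any lifted Bayesian network in the text), and the function names are invented here. It computes $P(p \mid q)$ once as the quotient $P(p)/P(q)$ of Definition 4.22, expanding numerator and denominator via Definition 4.18 and Lemmas 4.20 and 4.21, and once as the product on the right-hand side of Lemma 4.23.

```python
from fractions import Fraction as F


def conditional_via_definition(P1_p1, P1_q1, P_py_given_p1, P_q_given_q1):
    """P(p | q) as the quotient P(p) / P(q) of Definition 4.22.

    P1_p1          stands for P'(p'(x, y))
    P1_q1          stands for P'(q'(x))
    P_py_given_p1  stands for P(p_y(x, y) | p'(x, y))
    P_q_given_q1   stands for P(q(x) | q'(x)), which equals P(q(x) | p'(x, y))
                   by Lemma 4.21
    """
    # Definition 4.18 combined with Lemmas 4.20 and 4.21:
    # P(p) = P'(p') * P(p_y | p') * P(q | q')
    P_p = P1_p1 * P_py_given_p1 * P_q_given_q1
    # Definition 4.18: P(q) = P'(q') * P(q | q')
    P_q = P1_q1 * P_q_given_q1
    return P_p / P_q


def conditional_via_lemma(P1_p1, P1_q1, P_py_given_p1):
    """Right-hand side of Lemma 4.23: P'(p' | q') * P(p_y | p')."""
    return (P1_p1 / P1_q1) * P_py_given_p1


# Hypothetical input values (assumptions for illustration only).
P1_p1 = F(1, 6)
P1_q1 = F(1, 2)
P_py_given_p1 = F(3, 4)
P_q_given_q1 = F(2, 5)

lhs = conditional_via_definition(P1_p1, P1_q1, P_py_given_p1, P_q_given_q1)
rhs = conditional_via_lemma(P1_p1, P1_q1, P_py_given_p1)
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a floating-point tolerance; the factor $P(q \mid q')$ cancels, which is exactly the last step of the proof.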

Lemma 4.24.

Suppose that n is large enough that part (4) of Assumption 4.10 holds.Suppose that p (¯ x, y ) and q (¯ x ) are complete atomic σ -types such that | ¯ xy | ≤ k , dim y ( p ) = 1 and q ⊆ p . Let γ = P ( p (¯ x, y ) | q (¯ x )) and A (cid:48) ∈ Y (cid:48) n . Then P A (cid:48) n (cid:0) {A ∈ W A (cid:48) : A is ( p, q, γ/ (1 + ε (cid:48) ) ) -saturatedand ( p, q, γ (1 + ε (cid:48) ) ) -unsaturated } (cid:1) is at least − n | ¯ x | e − c ε (cid:48) γn where the constant c ε (cid:48) > depends only on ε (cid:48) . Proof.

Suppose that p (¯ x, y ) and q (¯ x ) are complete atomic σ -types such that | ¯ xy | ≤ k , dim y ( p ) = 1 and q ⊆ p . Let p (cid:48) = p (cid:22) σ and q (cid:48) = q (cid:22) σ (cid:48) . Moreover, let p y (¯ x, y ) include p (cid:48) (¯ x, y ) and all ( σ \ σ (cid:48) ) -formulas in p (¯ x, y ) which contain the variable y . Also, let α = P (cid:48) ( p (cid:48) (¯ x, y ) | q (cid:48) (¯ x )) ,β = P (cid:48) ( p y (¯ x, y ) | p (cid:48) (¯ x, y )) and γ = P (cid:48) ( p (¯ x, y ) | q (¯ x )) . By Lemma 4.23 we have γ = αβ .Let A (cid:48) ∈ Y (cid:48) n . By (4) of Assumption 4.10 A (cid:48) is ( p (cid:48) , q (cid:48) , α/ (1 + ε (cid:48) )) -saturated and ( p (cid:48) , q (cid:48) , α (1 + α )) -unsaturated if n is large enough. For every ¯ a ∈ [ n ] | ¯ x | let B (cid:48) ¯ a = (cid:8) b ∈ [ n ] : A (cid:48) | = p (cid:48) (¯ a, b ) (cid:9) . By the mentioned (un)saturation property, if A (cid:48) | = q (cid:48) (¯ a ) then αn/ (1 + ε (cid:48) ) ≤ | B (cid:48) ¯ a | ≤ αn (1 + ε (cid:48) ) . For every ¯ a ∈ [ n ] | ¯ x | and every A ∈ W A (cid:48) let B ¯ a, A = (cid:8) b ∈ [ n ] : A | = p y (¯ a, b ) (cid:9) and note that B ¯ a, A ⊆ B (cid:48) ¯ a for every ¯ a and every A ∈ W A (cid:48) . Let X ¯ a = (cid:8) A ∈ W A (cid:48) : either A (cid:54)| = q (¯ a ) or γ/ (1 + ε (cid:48) ) ≤ | B ¯ a, A | ≤ γ (1 + ε (cid:48) ) (cid:9) . Observe that if

A ∈ W A (cid:48) , A | = q (¯ a ) and A | = p y (¯ a, b ) , then A | = p (¯ a, b ) . Hence every A ∈ (cid:84) ¯ a ∈ [ n ] | ¯ x | X ¯ a is ( p, q, γ/ (1 + ε (cid:48) ) ) -saturated and ( p, q, γ (1 + ε (cid:48) ) ) -unsaturated.Fix any ¯ a such that A (cid:48) | = q (cid:48) (¯ a ) (and note that A | = q (¯ a ) implies A (cid:48) | = q (cid:48) (¯ a ) ). ByLemma 4.8, for all distinct b, c ∈ B (cid:48) ¯ a , the events E b = {A ∈ W A (cid:48) : A | = p y (¯ a, b ) } and E c = {A ∈ W A (cid:48) : A | = p y (¯ a, c ) } are independent. Moreover, by Lemma 4.12, for each b ∈ B (cid:48) ¯ a , P (cid:48) n ( E b ) = β . Let Z : W A (cid:48) → N be the random variable deﬁned by Z ( A ) = (cid:12)(cid:12) { b ∈ B (cid:48) ¯ a : A | = p y (¯ a, b ) } (cid:12)(cid:12) . Let ε = ε (cid:48) / (1 + ε (cid:48) ) and note that ε < ε (cid:48) and − ε = 1 / (1 + ε (cid:48) ) . By Lemma 2.5, P A (cid:48) (cid:0)(cid:12)(cid:12) Z − β | B (cid:48) ¯ a | (cid:12)(cid:12) > εβ | B (cid:48) ¯ a | (cid:1) < (cid:0) − c ε β | B (cid:48) ¯ a | (cid:1) where c ε depends only on ε and hence only on ε (cid:48) . Recall that αβ = γ and αn/ (1 + ε (cid:48) ) ≤ | B (cid:48) ¯ a | ≤ αn (1 + ε (cid:48) ) . From this it follows that (1 + ε (cid:48) ) γn ≥ (1 + ε (cid:48) ) β | B (cid:48) ¯ a | and γn/ (1 + ε (cid:48) ) ≤ β | B (cid:48) ¯ a | / (1 + ε (cid:48) ) . Therefore, if Z > (1 + ε (cid:48) ) γn or Z < γn/ (1 + ε (cid:48) ) , then (cid:12)(cid:12) Z − β | B (cid:48) ¯ a | (cid:12)(cid:12) > εβ | B (cid:48) ¯ a | . Hence, if c ε (cid:48) = c ε / (1 + ε (cid:48) ) , P A (cid:48) (cid:0) W A (cid:48) \ X ¯ a (cid:1) < (cid:0) − c ε β | B (cid:48) ¯ a | (cid:1) ≤ (cid:0) − c ε (cid:48) γn (cid:1) . Since the argument works for all ¯ a ∈ [ n ] | ¯ x | such that A (cid:48) | = q (cid:48) (¯ a ) it follows that P A (cid:48) (cid:18) (cid:92) ¯ a ∈ [ n ] | ¯ x | X ¯ a (cid:19) ≥ − n | ¯ x | e − c ε (cid:48) γn and this proves the lemma. 
$\square$

The next lemma generalizes the previous one to types $p(\bar{x}, \bar{y})$ where the length of $\bar{y}$ is greater than one.

Lemma 4.25.

Suppose that n is large enough that part (4) of Assumption 4.10 holds.Suppose that p (¯ x, ¯ y ) and q (¯ x ) are complete atomic σ -types such that | ¯ x ¯ y | ≤ k , dim ¯ y ( p ) = | ¯ y | and q ⊆ p . Let γ = P ( p (¯ x, ¯ y ) | q (¯ x )) and A (cid:48) ∈ Y (cid:48) n . Then P A (cid:48) n (cid:0) {A ∈ W A (cid:48) : A is ( p, q, γ/ (1 + ε (cid:48) ) | ¯ y | ) -saturatedand ( p, q, γ (1 + ε (cid:48) ) | ¯ y | ) -unsaturated } (cid:1) is at least − | ¯ y | n | ¯ x | + | ¯ y |− e − c ε (cid:48) γn where the constant c ε (cid:48) > depends only on ε (cid:48) . Proof.

We prove the lemma by induction on m = | ¯ y | . The base case m = 1 isgiven by Lemma 4.24. Let p (¯ x, ¯ y ) and q (¯ x ) be as assumed in the lemma where ¯ y =( y , . . . , y m +1 ) . Let p m (¯ x, y , . . . , y m ) be the restriction of p to formulas with variablesamong ¯ x, y , . . . , y m . Furthermore, let α = P ( p m | q ) , β = P ( p | p m ) and γ = P ( p | q ) .Observe that by Deﬁnition 4.22 we have γ = P ( p ) P ( q ) = P ( p ) P ( p m ) · P ( p m ) P ( q ) = βα. Let A (cid:48) ∈ Y (cid:48) n . By the induction hypothesis, the probability (with the distribution P A (cid:48) )that(a) A ∈ W A (cid:48) is ( p m , q, α/ (1 + ε (cid:48) ) m ) -saturated and ( p m , q, α (1 + ε (cid:48) ) m ) -unsaturatedis at least − m n | ¯ x | + m − e − c ε (cid:48) αn where the constant c ε (cid:48) depends only on ε (cid:48) . By theinduction hypothesis again, the probability that(b) A ∈ W A (cid:48) is ( p, p m , β/ (1 + ε (cid:48) ) ) -saturated and ( p, p m , β (1 + ε (cid:48) ) ) -unsaturatedis at least − n | ¯ x | + m e − c ε (cid:48) βn where c ε (cid:48) is the same constant as above (since it dependsonly on ε (cid:48) ). It is straightforward to check that if A ∈ W A (cid:48) satisﬁes both (a) and (b) then A is ( p, q, γ/ (1 + ε (cid:48) ) m +1) ) -saturated and ( p, q, γ (1 + ε (cid:48) ) m +1) ) -unsaturated. Since γ = αβ ≤ min { α, β } it follows that the probability that A ∈ W A (cid:48) is ( p, q, γ/ (1 + ε (cid:48) ) m +1) ) -saturated and ( p, q, γ (1 + ε (cid:48) ) m +1) ) -unsaturated is at least − m +1 n | ¯ x | + m e − c ε (cid:48) γn . (cid:3) Deﬁnition 4.26.

Let p (¯ x, ¯ y ) and q (¯ x ) are complete atomic σ -types such that | ¯ x ¯ y | ≤ k , d = dim ¯ y ( p ) > , q ⊆ p and γ = P ( p | q ) . For every n , every A ∈ Y n is ( p, q, γ/ (1 + ε (cid:48) ) d ) -saturated and ( p, q, γ (1 + ε (cid:48) ) d ) -unsaturated. Lemma 4.28.

There is a constant $c > 0$ such that for all sufficiently large $n$,
\[
P_n(Y_n) \ge \big(1 - e^{-cn}\big)\big(1 - \delta'(n)\big).
\]

Proof.

There are, up to changing variables, only finitely many atomic $\sigma$-types $p(\bar{x})$ such that $|\bar{x}| \le k$. It follows from Lemma 4.25 that there is a constant $c > 0$ such that for all large enough $n$ and all $\mathcal{A}' \in Y'_n$, $P^{\mathcal{A}'}\big(Y_n \cap W^{\mathcal{A}'}\big) \ge 1 - e^{-cn}$. Note that $P_n(Y_n) = P_n\big(Y_n \mid W^{Y'_n}\big) P_n\big(W^{Y'_n}\big)$. By Lemma 4.5, $P_n(W^{Y'_n}) = P'_n(Y'_n)$ and by Lemma 4.6 we have $P_n(Y_n \mid W^{Y'_n}) = P^{Y'_n}(Y_n \cap W^{Y'_n})$. Hence $P_n(Y_n) = P^{Y'_n}(Y_n \cap W^{Y'_n}) \, P'_n(Y'_n)$. Then, reasoning similarly as in the proof of Lemma 4.16 (using (4.1)), we get
\[
\begin{aligned}
P^{Y'_n}\big(Y_n \cap W^{Y'_n}\big)
&= \sum_{\mathcal{A}' \in Y'_n} P^{Y'_n}\big(Y_n \cap W^{\mathcal{A}'}\big)
 = \sum_{\mathcal{A}' \in Y'_n} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} P^{Y'_n}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in Y'_n} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} P^{\mathcal{A}'}(\mathcal{A})
 = \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} P^{\mathcal{A}'}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} P^{\mathcal{A}'}\big(Y_n \cap W^{\mathcal{A}'}\big)
 \ge \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \big(1 - e^{-cn}\big)
 = 1 - e^{-cn}.
\end{aligned}
\]
Using part (2) of Assumption 4.10 we now get
\[
P_n(Y_n) = P^{Y'_n}\big(Y_n \cap W^{Y'_n}\big) P'_n(Y'_n) \ge \big(1 - e^{-cn}\big)\big(1 - \delta'(n)\big). \qquad \square
\]

Definition 4.29.

Let m be a positive integer. A real number α is called m -critical if atleast one of the following holds:(a) There are a complete atomic σ -type q (¯ x ) , distinct complete atomic σ -types p (¯ x, ¯ y ) , . . . , p l (¯ x, ¯ y ) and a number ≤ l (cid:48) ≤ l such that | ¯ x ¯ y | ≤ m , q ⊆ p i for all ≤ i ≤ l and α = (cid:80) l (cid:48) i =1 P ( p i | q ) (cid:80) li =1 P ( p i | q ) . (b) α = l (cid:48) /l where ≤ l (cid:48) ≤ l are integers and l is, for any choice of distinct variables x , . . . , x m , less or equal to the number of pairs ( p ( x , . . . , x m (cid:48) ) , q ( x , . . . , x d )) where d < m (cid:48) ≤ m , p and q are complete atomic σ -types such that q ⊆ p and dim ( x d ,...,x m (cid:48) ) ( p ) = 0 .From the deﬁnition it follows that (for every m ∈ N ) there are only ﬁnitely many m -critical numbers. It also follows (from part (b)) that, for every m , and are m -critical. Deﬁnition 4.30.

Let $\varphi(\bar{x}) \in CPL(\sigma)$ and let $l = |\bar{x}| + \mathrm{qr}(\varphi)$.

(i) We call $\varphi(\bar{x})$ noncritical if the following holds: If
\[
\Big( r + \| \psi(\bar{z}, \bar{y}) \mid \theta(\bar{z}, \bar{y}) \|_{\bar{y}} \ge \| \psi^*(\bar{z}, \bar{y}) \mid \theta^*(\bar{z}, \bar{y}) \|_{\bar{y}} \Big)
\quad \text{or} \quad
\Big( \| \psi(\bar{z}, \bar{y}) \mid \theta(\bar{z}, \bar{y}) \|_{\bar{y}} \ge \| \psi^*(\bar{z}, \bar{y}) \mid \theta^*(\bar{z}, \bar{y}) \|_{\bar{y}} + r \Big)
\]
is a subformula of $\varphi(\bar{x})$ (where $\psi$, $\theta$, $\psi^*$ and $\theta^*$ denote formulas in $CPL(\sigma)$ and $\bar{z}$ and $\bar{y}$ may have variables in common with $\bar{x}$) then, for all $l$-critical numbers $\alpha$ and $\beta$, $r \neq \alpha - \beta$.

(ii) Let $\varepsilon > 0$. We say that $\varphi(\bar{x})$ is $\varepsilon$-noncritical if
• $\varphi(\bar{x})$ is noncritical and
• whenever $r$ appears in a subformula as in part (i) and $\alpha$ and $\beta$ are $l$-critical numbers, then the following implications hold: If $r + \alpha > \beta$ then $r + \alpha/(1 + 2\varepsilon) > \beta(1 + 2\varepsilon)$, and if $\alpha > \beta + r$ then $\alpha/(1 + 2\varepsilon) > \beta(1 + 2\varepsilon) + r$.

Since, for every $l \in \mathbb{N}$, there are only finitely many $l$-critical numbers it follows that for every noncritical $\varphi(\bar{x}) \in CPL(\sigma)$, if one just chooses $\varepsilon > 0$ sufficiently small, then $\varphi(\bar{x})$ is $\varepsilon$-noncritical. Definition 4.26 and Lemma 4.28 motivate the next definition.

Definition 4.31.

Let ε > be such that ε = (1 + ε (cid:48) ) k .It follows from Deﬁnition 4.31 and Lemma 4.27 that if p (¯ x, ¯ y ) and q (¯ x ) are completeatomic σ -types such that | ¯ x ¯ y | ≤ k , d = dim ¯ y ( p ) > , q ⊆ p , P ( q ) > , and γ = P ( p | q ) , then for every n , every A ∈ Y n is ( p, q, γ/ (1 + ε )) -saturated and ( p, q, γ (1 + ε )) -unsaturated. By an analogous argument as in Remark 4.11 (iii), it now follows that if p (¯ x ) is a complete atomic σ -type such that | ¯ x | ≤ k and P ( p ) = 0 , then for all suﬃcientlylarge n , p is not realized in any member of Y n .In the proof of the proposition below we will sometimes abuse notation by treatingan atomic type p (¯ x ) as the formula obtained by taking the conjunction of all formulasin p (¯ x ) . So when writing, for example, ‘ (cid:87) mi =1 (cid:87) m i j =1 p i,j (¯ x, y ) ’ in the proof below we view p i,j (¯ x, y ) in this expression as the conjunction of all formulas in the complete atomic type p i,j (¯ x, y ) . Proposition 4.32. (Elimination of quantiﬁers)

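Suppose that $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \le k$. Then there is a quantifier-free formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and every $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall \bar{x} \big( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \big)$.

Proof.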

Let an $\varepsilon$-noncritical $\varphi(\bar{x}) \in CPL(\sigma)$ be given with $|\bar{x}| + \mathrm{qr}(\varphi) \le k$. We will assume that $\bar{x}$ is nonempty (i.e. that $\varphi$ has free variables). In Remark 4.39 it is indicated which changes we need to make in the simpler case when $\varphi$ has no free variable. The proof proceeds by induction on quantifier-rank. Suppose that $\mathrm{qr}(\varphi) > 0$ since otherwise we can just let $\varphi^*$ be $\varphi$ and then we are done. If for all sufficiently large $n$, for all $\mathcal{A} \in Y_n$ and for all $\bar{a} \in [n]^{|\bar{x}|}$ we have $\mathcal{A} \not\models \varphi(\bar{a})$ then we can let $\varphi^*(\bar{x})$ be the formula $x_1 \neq x_1$ and then $\mathcal{A} \models \forall \bar{x} \big( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \big)$ for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$. So from now on we assume that, for arbitrarily large $n$, there are $\mathcal{A} \in Y_n$ and $\bar{a}$ such that $\mathcal{A} \models \varphi(\bar{a})$.

Suppose that $\varphi(\bar{x})$ is $\exists y \, \psi(\bar{x}, y)$ for some $\psi(\bar{x}, y)$. Then we have $|\bar{x}y| + \mathrm{qr}(\psi) \le k$ and $\mathrm{qr}(\psi) < \mathrm{qr}(\varphi)$ so, by the induction hypothesis, we may assume that $\psi(\bar{x}, y)$ is quantifier-free. By assumption there are $n$, $\mathcal{A} \in Y_n$, $\bar{a}$ and $b$ such that $\mathcal{A} \models \psi(\bar{a}, b)$. Then there are $m \ge 1$, different complete atomic $\sigma$-types $q_i(\bar{x})$, $i = 1, \ldots, m$, and, for each $i$, $m_i \ge 1$ and different complete atomic $\sigma$-types $p_{i,j}(\bar{x}, y)$, $j = 1, \ldots, m_i$, such that $q_i \subseteq p_{i,j}$ for all $j$ and $\psi(\bar{x}, y)$ is equivalent to $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, y)$. If, for some $i$, $P(q_i(\bar{x})) = 0$, then $q_i$ is not realized in any $\mathcal{A} \in Y_n$ (for large enough $n$) and can be removed. So we may assume that $P(q_i) > 0$ for all $i$. If, for some $i$ and $j$, $P(p_{i,j} \mid q_i) = 0$ then $P(p_{i,j}) = 0$ so $p_{i,j}$ is not realized in any $\mathcal{A} \in Y_n$ for large enough $n$. So we may also assume that $P(p_{i,j} \mid q_i) > 0$ for all $i$ and $j$. If $\dim_y(p_{i,j}) = 1$ then, by the definitions of $Y_n$ and $\varepsilon$, it follows that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, if $\mathcal{A} \models q_i(\bar{a})$ then $\mathcal{A} \models \exists y \, p_{i,j}(\bar{a}, y)$.
If $\dim_y(p_{i,j}) = 0$ then, for all $n$ and all $\mathcal{A} \in W_n$, if $\mathcal{A} \models q_i(\bar{a})$ then $\mathcal{A} \models p_{i,j}(\bar{a}, b)$ for some $b \in \mathrm{rng}(\bar{a})$. It follows that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall \bar{x} \big( \exists y \, \psi(\bar{x}, y) \leftrightarrow \bigvee_{i=1}^{m} q_i(\bar{x}) \big)$.

Now we consider the case when $\varphi(\bar{x})$ has the form
\[
\Big( r + \| \psi(\bar{x}, \bar{y}) \mid \theta(\bar{x}, \bar{y}) \|_{\bar{y}} \ge \| \psi^*(\bar{x}, \bar{y}) \mid \theta^*(\bar{x}, \bar{y}) \|_{\bar{y}} \Big) \quad \text{or} \tag{4.3}
\]
\[
\Big( \| \psi(\bar{x}, \bar{y}) \mid \theta(\bar{x}, \bar{y}) \|_{\bar{y}} \ge \| \psi^*(\bar{x}, \bar{y}) \mid \theta^*(\bar{x}, \bar{y}) \|_{\bar{y}} + r \Big). \tag{4.4}
\]
Since the second case (4.4) is treated by straightforward variations of the arguments for taking care of the first case (4.3) we only consider the first case (4.3). Observe that $|\bar{x}\bar{y}| + \mathrm{qr}(\psi) \le k$ (because $\mathrm{qr}(\varphi) = |\bar{y}| + \max\{\mathrm{qr}(\psi), \mathrm{qr}(\theta), \mathrm{qr}(\psi^*), \mathrm{qr}(\theta^*)\}$) and similarly for $\theta$, $\psi^*$ and $\theta^*$. Since all the formulas $\psi$, $\theta$, $\psi^*$ and $\theta^*$ have smaller quantifier-rank than $\varphi$ we may, by the induction hypothesis, assume that $\psi(\bar{x}, \bar{y})$, $\theta(\bar{x}, \bar{y})$, $\psi^*(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$ are quantifier-free formulas.

If $\theta(\bar{x}, \bar{y})$ or $\theta^*(\bar{x}, \bar{y})$ is unsatisfiable, then, by the provided semantics, we have $\mathcal{A} \not\models \varphi(\bar{a})$ for every $\sigma$-structure $\mathcal{A}$ and every sequence of elements $\bar{a}$ from the domain of $\mathcal{A}$. In this case $\varphi(\bar{x})$ is equivalent to any contradictory quantifier-free formula with free variables among $\bar{x}$, for example the formula $x_1 \neq x_1$. So from now on we assume that $\theta(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$ are satisfiable.

Until further notice, assume also that $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ and $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ are satisfiable. Then there are distinct complete atomic $\sigma$-types $q_i(\bar{x})$, $p_{i,j}(\bar{x}, \bar{y})$, for $i = 1, \ldots, m$ and $j = 1, \ldots, m_i$, and distinct complete atomic $\sigma$-types $t_i(\bar{x})$, $s_{i,j}(\bar{x}, \bar{y})$, for $i = 1, \ldots, l$ and $j = 1, \ldots, l_i$, such that the following conditions hold:
• $q_i(\bar{x}) \subseteq p_{i,j}(\bar{x}, \bar{y})$ for all $i = 1, \ldots, m$ and all $j = 1, \ldots, m_i$.
• $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ is equivalent to $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$.
• $t_i(\bar{x}) \subseteq s_{i,j}(\bar{x}, \bar{y})$ for all $i = 1, \ldots, l$ and all $j = 1, \ldots, l_i$.
• $\theta(\bar{x}, \bar{y})$ is equivalent to $\bigvee_{i=1}^{l} \bigvee_{j=1}^{l_i} s_{i,j}(\bar{x}, \bar{y})$.

Since $\bigvee_{i=1}^{l} \bigvee_{j=1}^{l_i} s_{i,j}(\bar{x}, \bar{y})$ is a consequence of $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$ it follows that $m \le l$ and $m_i \le l_i$ for all $i \le m$. Moreover, for every $i \le m$ there is $i'$ such that $q_i = t_{i'}$, and for all $i \le m$ and all $j \le m_i$ there are $i', j'$ such that $p_{i,j} = s_{i',j'}$. Therefore we may assume in addition (by reordering if necessary) that
\[
q_i = t_i \text{ for all } i \le m \quad \text{and} \quad p_{i,j} = s_{i,j} \text{ for all } i \le m \text{ and all } j \le m_i. \tag{4.5}
\]
For the same reasons as in the previous case we may assume that all of $P(q_i)$, $P(p_{i,j})$, $P(t_i)$ and $P(s_{i,j})$ are positive for all $i$ and $j$. Next we define
\[
\begin{aligned}
d_{i,j} &= \dim_{\bar{y}}(p_{i,j}) && \text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, m_i, \\
e_{i,j} &= \dim_{\bar{y}}(s_{i,j}) && \text{for all } i = 1, \ldots, l \text{ and } j = 1, \ldots, l_i, \\
d_i &= \max\{d_{i,1}, \ldots, d_{i,m_i}\} && \text{for all } i = 1, \ldots, m, \\
e_i &= \max\{e_{i,1}, \ldots, e_{i,l_i}\} && \text{for all } i = 1, \ldots, l, \\
\alpha_{i,j} &= P(p_{i,j}(\bar{x}, \bar{y}) \mid q_i(\bar{x})) && \text{for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, m_i, \\
\alpha_i &= \text{the sum of all } \alpha_{i,j} \text{ such that } d_{i,j} = d_i, \\
\beta_{i,j} &= P(s_{i,j}(\bar{x}, \bar{y}) \mid t_i(\bar{x})) && \text{for all } i = 1, \ldots, l \text{ and } j = 1, \ldots, l_i, \\
\beta_i &= \text{the sum of all } \beta_{i,j} \text{ such that } e_{i,j} = e_i.
\end{aligned}
\]
It follows that for all $i = 1, \ldots, m$ we have $d_i \le e_i$ and $\alpha_i \le \beta_i$.

Definition 4.33.

For all $i = 1, \ldots, l$ we define a number $\gamma_i$ as follows:
(1) If $i \le m$ and $d_i = e_i > 0$ then we define $\gamma_i = \alpha_i / \beta_i$.
(2) If $i \le m$ and $d_i = e_i = 0$ then we define $\gamma_i = m_i / l_i$.
(3) If $i \le m$ and $d_i < e_i$ then we define $\gamma_i = 0$.
(4) If $m < i \le l$ then we define $\gamma_i = 0$.

Now we can reason in exactly the same way with regard to the formulas $\psi^*(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$. So there are numbers $m^*$, $l^*$, $m^*_i$ and $l^*_i$ and complete atomic $\sigma$-types $q^*_i(\bar{x})$ for $i = 1, \ldots, m^*$, $p^*_{i,j}(\bar{x}, \bar{y})$ for $i \le m^*$ and $j = 1, \ldots, m^*_i$, $t^*_i(\bar{x})$ for $i = 1, \ldots, l^*$ and $s^*_{i,j}(\bar{x}, \bar{y})$ for $i \le l^*$ and $j = 1, \ldots, l^*_i$ such that everything that has been said about $\psi$, $\theta$, $q_i$, $p_{i,j}$, $t_i$ and $s_{i,j}$ holds if these formulas and types are replaced by $\psi^*$, $\theta^*$, $q^*_i$, $p^*_{i,j}$ etcetera, and the numbers $m$, $l$, $m_i$, $l_i$ are replaced by $m^*$, $l^*$, $m^*_i$ and $l^*_i$. Moreover, we define numbers $d^*_{i,j}$, $e^*_{i,j}$, $d^*_i$, $e^*_i$, $\alpha^*_{i,j}$, $\alpha^*_i$, $\beta^*_{i,j}$, $\beta^*_i$ and $\gamma^*_i$ in the same way as above, using the types $q^*_i$, $p^*_{i,j}$, $t^*_i$ and $s^*_{i,j}$ instead of $q_i$, $p_{i,j}$, $t_i$ and $s_{i,j}$.

So far we have assumed that $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ and $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ are satisfiable. If $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ is not satisfiable, then we let $m = 0$ and we view the disjunction $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$ as "empty" and hence always false. In this case we always have $i > m$ so it follows that $\gamma_i = 0$ for all $i = 1, \ldots, l$. Similar conventions apply if $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ is not satisfiable. With these conventions the case when any one of the mentioned formulas is unsatisfiable is taken care of by the rest of the proof.

Lemma 4.34.

Let $i \in \{1, \ldots, m\}$.
(a) For all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[
\frac{\gamma_i}{1 + 2\varepsilon} \le \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|}.
\]
(b) If $d_i = e_i$ then, for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[
\frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le (1 + 2\varepsilon) \gamma_i.
\]
(c) If $d_i < e_i$ then, for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[
\frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le \frac{C}{n}
\]
where the constant $C > 0$ depends only on the types $p_{i,j}$ and $s_{i,j}$.
(d) Parts (a), (b) and (c) hold if $m$, $m_i$, $l_i$, $\gamma_i$, $d_i$, $e_i$, $p_{i,j}$ and $s_{i,j}$ are replaced by $m^*$, $m^*_i$, $l^*_i$, $\gamma^*_i$, $d^*_i$, $e^*_i$, $p^*_{i,j}$ and $s^*_{i,j}$, respectively.

Proof.

We split the argument into cases corresponding to the three ﬁrst cases of Deﬁ-nition 4.33. Let

A ∈ Y n . First suppose that d i = e i > and hence γ i = α i /β i . Since A is assumed to be ( p i,j , q i , (1 + ε ) α i,j ) -unsaturated if d i,j > it follows that (cid:12)(cid:12) p i,j (¯ a, A ) (cid:12)(cid:12) ≤ (1 + ε ) α i,j n d i,j if d i,j > .If d i,j = 0 then (cid:12)(cid:12) p i,j (¯ a, A ) (cid:12)(cid:12) = 1 and each member of the unique tuple realizing p i,j (¯ a, ¯ y ) belongs to ¯ a . It follows that(4.6) (cid:12)(cid:12)(cid:12)(cid:12) m i (cid:91) j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (1 + 2 ε ) α i n d i for all suﬃciently large n .By similiar reasoning (and since we assume d i = e i ) we get(4.7) (cid:12)(cid:12)(cid:12)(cid:12) l i (cid:91) j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (1 + 2 ε ) β i n d i . Since A is assumed to be ( p i,j , q i , α i,j / (1+ ε )) -saturated if d i,j > and ( s i,j , t i , β i,j / (1+ ε )) -saturated if e i,j > it follows that (cid:12)(cid:12) p i,j (¯ a, A ) (cid:12)(cid:12) ≥ α i,j n d i,j / (1 + ε ) if d i,j > and (cid:12)(cid:12) s i,j (¯ a, A ) (cid:12)(cid:12) ≥ β i,j n e i,j / (1 + ε ) if e i,j > . This (and d i = e i ) implies that(4.8) (cid:12)(cid:12)(cid:12)(cid:12) m i (cid:91) j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≥ α i n d i ε and (cid:12)(cid:12)(cid:12)(cid:12) l i (cid:91) j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≥ β i n d i ε . From (4.6), (4.7) and (4.8) we get(4.9) γ i (1 + 2 ε ) = α i (1 + 2 ε ) β i ≤ (cid:12)(cid:12)(cid:12) (cid:83) m i j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:83) l i j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12) ≤ (1 + 2 ε ) α i β i = (1 + 2 ε ) γ i . Now suppose that d i = e i = 0 . Then γ i = m i /l i . Also, each p i,j (¯ a, ¯ y ) and each s i,j (¯ a, ¯ y ) has a unique realization in A . 
Since we assume that p i,j (cid:54) = p i,j (cid:48) if j (cid:54) = j (cid:48) and s i,j (cid:54) = s i,j (cid:48) ONDITIONAL PROBABILITY LOGIC 25 if j (cid:54) = j (cid:48) we get γ i = m i l i = (cid:12)(cid:12)(cid:12) (cid:83) m i j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:83) l i j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12) , and now the inequalities of (a) and (b) follow trivially. Next, suppose that d i < e i . Then γ i = 0 . By similar reasoning as before, < β i n e i ε ≤ (cid:12)(cid:12)(cid:12)(cid:12) l i (cid:91) j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) for suﬃciently large n. It follows that γ i (1 + 2 ε ) = 0 ≤ (cid:12)(cid:12)(cid:12) (cid:83) m i j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:83) l i j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12) . Since e i > we can argue as we did to get (4.8), so we have (cid:12)(cid:12)(cid:12)(cid:12) l i (cid:91) j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≥ β i n e i ε . Depending on whether d i > or d i = 0 we get, by arguing as in previous cases, (cid:12)(cid:12)(cid:12)(cid:12) m i (cid:91) j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (1 + 2 ε ) α i n d i or (cid:12)(cid:12)(cid:12)(cid:12) m i (cid:91) j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12) = m i . Since d i < e i we get, in either case, (cid:12)(cid:12)(cid:12) (cid:83) m i j =1 p i,j (¯ a, A ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:83) l i j =1 s i,j (¯ a, A ) (cid:12)(cid:12)(cid:12) ≤ Cn for suﬃciently large n where C > is a constant that depends only on the types p i,j and s i,j .The proof of part (d) is, of course, exactly the same (besides the relevant replacementsof symbols). (cid:3) Deﬁnition 4.35.

Let $I$ be the set of all $i \in \{1, \ldots, l\}$ such that there exists some $i' \in \{1, \ldots, l^*\}$ such that $t_i = t^*_{i'}$ and $r + \gamma_i \ge \gamma^*_{i'}$.

Remark 4.36. (The computational problem of finding $I$) The number $\alpha_{i,j}$ is obtained from numbers given by Assumptions 4.1 and 4.10 by applying a number of arithmetic operations which is linear in $|p_{i,j}|$. It follows that the number of arithmetic operations needed to compute $\alpha_i$ is linear in $\sum_{j=1}^{m_i} |p_{i,j}|$, where by an arithmetic operation I mean addition, multiplication or division. The case is similar for $\beta_i$, $\alpha^*_i$ and $\beta^*_i$. The number of comparisons of literals needed to check if $t_i = t^*_{i'}$ is $|t_i|$ if we assume that we use some uniform way of listing the literals in complete atomic $\sigma$-types. So to decide if $i \in I$ we need to perform a number of arithmetic operations, comparisons of literals and comparisons of numbers which is linear in
\[
\sum_{j=1}^{m_i} |p_{i,j}| + \sum_{j=1}^{l_i} |s_{i,j}| + \sum_{j=1}^{m^*_i} |p^*_{i,j}| + \sum_{j=1}^{l^*_i} |s^*_{i,j}|.
\]
Consequently the number of arithmetic operations, comparisons of literals and comparisons of numbers that are needed to create $I$ is linear in
\[
\sum_{i=1}^{m} \sum_{j=1}^{m_i} |p_{i,j}| + \sum_{i=1}^{l} \sum_{j=1}^{l_i} |s_{i,j}| + \sum_{i=1}^{m^*} \sum_{j=1}^{m^*_i} |p^*_{i,j}| + \sum_{i=1}^{l^*} \sum_{j=1}^{l^*_i} |s^*_{i,j}|.
\]

Lemmas 4.37 and 4.38 below show that $\varphi(\bar{x})$ is equivalent, in every $\mathcal{A} \in W_n$ for all large enough $n$, to a quantifier-free formula which depends only on $\varphi(\bar{x})$ and the lifted Bayesian network $G$. As noted after Definition 4.29, 0 is a $\zeta$-critical number for every $\zeta$, so $r > 0$ (since $\varphi$ is noncritical). Observe that it follows from Definitions 4.29 and 4.33 that $\gamma_i$ and $\gamma^*_i$ are $(|\bar{x}| + \mathrm{qr}(\varphi))$-critical numbers for all $i$.

Lemma 4.37.

Suppose that
\[
\mathcal{A} \models \Big( r + \| \psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y}) \|_{\bar{y}} \ge \| \psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y}) \|_{\bar{y}} \Big).
\]
Then both $\| \psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y}) \|_{\bar{y}}$ and $\| \psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y}) \|_{\bar{y}}$ are defined in $\mathcal{A}$ and
\[
r + \frac{\big| \psi(\bar{a}, \mathcal{A}) \cap \theta(\bar{a}, \mathcal{A}) \big|}{\big| \theta(\bar{a}, \mathcal{A}) \big|} \ge \frac{\big| \psi^*(\bar{a}, \mathcal{A}) \cap \theta^*(\bar{a}, \mathcal{A}) \big|}{\big| \theta^*(\bar{a}, \mathcal{A}) \big|},
\]
so
\[
r + \frac{\big| \bigcup_{\iota=1}^{m} \bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l} \bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a}, \mathcal{A}) \big|} \ge \frac{\big| \bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|}. \tag{4.10}
\]
Now $\bar{a}$ realizes exactly one of $t_1(\bar{x}), \ldots, t_l(\bar{x})$ and exactly one of $t^*_1(\bar{x}), \ldots, t^*_{l^*}(\bar{x})$ so there are $1 \le i \le l$ and $1 \le i' \le l^*$ such that $\mathcal{A} \models t_i(\bar{a}) \wedge t^*_{i'}(\bar{a})$ and hence $t_i = t^*_{i'}$. If $r + \gamma_i \ge \gamma^*_{i'}$ then $i \in I$ and hence $\mathcal{A} \models \bigvee_{i \in I} t_i(\bar{a})$ so we are done. Hence it remains to prove that $r + \gamma_i \ge \gamma^*_{i'}$. We divide the argument into cases.

Case 1: Suppose that $i > m$ and $i' > m^*$. Then, by the definition of $\gamma_i$ and $\gamma^*_i$ (Definition 4.33), we have $\gamma_i = \gamma^*_{i'} = 0$ so we get $r + \gamma_i \ge \gamma^*_{i'}$.

Case 2: Suppose that $i \le m$ and $i' > m^*$. Then $\gamma^*_{i'} = 0$ and as $\gamma_i$ is always nonnegative we get $r + \gamma_i \ge \gamma^*_{i'}$.

Case 3: Suppose that $i > m$ and $i' \le m^*$.
By Lemma 4.34 (a), assuming that $n$ is sufficiently large,
$$\frac{\gamma^*_{i'}}{1 + 2\varepsilon} \leq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|}.$$
Since $i > m$ we have $p_{\iota,j}(\bar{a}, \mathcal{A}) = \emptyset$ for every $1 \leq \iota \leq m$ and every $1 \leq j \leq m_\iota$, so
$$\frac{\big|\bigcup_{\iota=1}^{m} \bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l} \bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a}, \mathcal{A})\big|} = 0.$$
This together with (4.10) implies that
$$r \geq \frac{\big|\bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A})\big|} = \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.11}$$
If $r < \gamma^*_{i'}$ then, since $\varphi(\bar{x})$ is $\varepsilon$-noncritical, we get $r < \gamma^*_{i'} / (1 + 2\varepsilon)$, which contradicts (4.11). Hence $r \geq \gamma^*_{i'}$ and since $\gamma_i = 0$ (because $i > m$) we get $r + \gamma_i \geq \gamma^*_{i'}$.

Case 4: Suppose that $i \leq m$ and $i' \leq m^*$. Then (4.10) reduces to
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|}. \tag{4.12}$$
Towards a contradiction, suppose that $r + \gamma_i < \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is assumed to be $\varepsilon$-noncritical we get
$$r + (1 + 2\varepsilon)\gamma_i < \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.13}$$
Recall, from the definition of $d_i$ and $e_i$, that $d_i \leq e_i$. We now consider two subcases and in each subcase we will derive a contradiction to (4.12).

Subcase 4(a): Suppose that $d_i = e_i$. By parts (a) and (b) of Lemma 4.34,
$$\frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \leq (1 + 2\varepsilon)\gamma_i \quad \text{and} \quad \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.14}$$
From (4.13) and (4.14) we get
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \leq r + (1 + 2\varepsilon)\gamma_i < \frac{\gamma^*_{i'}}{1 + 2\varepsilon} \leq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|},$$
which contradicts (4.12).

Subcase 4(b): Suppose that $d_i < e_i$.
Then Lemma 4.34 (c) gives
$$\frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \leq \frac{C}{n} \tag{4.15}$$
where the constant $C > 0$ depends only on the involved types. Lemma 4.34 (a) gives
$$\frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\gamma^*_{i'}}{1 + 2\varepsilon}.$$
Since $d_i < e_i$ implies that $\gamma_i = 0$ it follows from (4.13) that $r < \gamma^*_{i'} / (1 + 2\varepsilon)$. Note that the right-hand term in (4.15) tends to 0 as $n$ tends to infinity. So for all sufficiently large $n$ we have
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \leq r + \frac{C}{n} < \frac{\gamma^*_{i'}}{1 + 2\varepsilon} \leq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|}$$
and this contradicts (4.12).

Now suppose that $\mathcal{A} \models \bigvee_{i \in I} t_i(\bar{a})$, so $\mathcal{A} \models t_i(\bar{a})$ for some $i \in I$. By Definition 4.35 of $I$ there is $i' \in \{1, \ldots, l^*\}$ such that $t_i = t^*_{i'}$ and $r + \gamma_i \geq \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is an $\varepsilon$-noncritical formula it is, in particular, noncritical, which implies that $r + \gamma_i \neq \gamma^*_{i'}$ and hence $r + \gamma_i > \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is $\varepsilon$-noncritical it follows that
$$r + \frac{\gamma_i}{1 + 2\varepsilon} > \gamma^*_{i'}(1 + 2\varepsilon). \tag{4.16}$$
It suffices to prove that
$$r + \frac{\big|\bigcup_{\iota=1}^{m} \bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l} \bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\big|\bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}. \tag{4.17}$$
Again we divide the proof into cases.

Case 1: Suppose that $i' > m^*$. Then the term to the right of '$\geq$' in (4.17) is zero, so (4.17) holds.

Case 2: Suppose that $i > m$ and $i' \leq m^*$. Then the term immediately to the left of '$\geq$' in (4.17) is zero, so we need to prove that
$$r \geq \frac{\big|\bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}. \tag{4.18}$$
From $i > m$ we get $\gamma_i = 0$ so (4.16) reduces to
$$r > \gamma^*_{i'}(1 + 2\varepsilon). \tag{4.19}$$
Recall that from the definition it follows that $d^*_{i'} \leq e^*_{i'}$.

Subcase 2(a): Suppose that $d^*_{i'} = e^*_{i'}$. Then using parts (d) and (b) of Lemma 4.34 we get
$$\gamma^*_{i'}(1 + 2\varepsilon) \geq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} = \frac{\big|\bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A})\big|},$$
which together with (4.19) gives (4.18).

Subcase 2(b): Suppose that $d^*_{i'} < e^*_{i'}$. Then parts (d) and (c) of Lemma 4.34 imply that
$$\frac{\big|\bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A})\big|} = \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} \leq \frac{C}{n}$$
for some constant $C > 0$ depending only on the involved types. Since $r > 0$ it follows that (4.18) holds for all sufficiently large $n$.

Case 3: Suppose that $i \leq m$ and $i' \leq m^*$.
Now (4.17) is equivalent to
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \geq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|}. \tag{4.20}$$
So it remains to prove (4.20). By Lemma 4.34 and (4.16) we have
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \geq r + \frac{\gamma_i}{1 + 2\varepsilon} > (1 + 2\varepsilon)\gamma^*_{i'}. \tag{4.21}$$
If $d^*_{i'} = e^*_{i'}$, then, by Lemma 4.34,
$$(1 + 2\varepsilon)\gamma^*_{i'} \geq \frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|},$$
which together with (4.21) gives (4.20).

Now suppose that $d^*_{i'} < e^*_{i'}$. Then $\gamma^*_{i'} = 0$ and (4.21) reduces to
$$r + \frac{\big|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A})\big|} \geq r + \frac{\gamma_i}{1 + 2\varepsilon} > 0. \tag{4.22}$$
Lemma 4.34 gives
$$\frac{\big|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A})\big|}{\big|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A})\big|} \leq \frac{C}{n}.$$
This together with (4.22) gives (4.20) for all sufficiently large $n$. This completes the proof of Lemma 4.37. $\square$

Lemma 4.38.

Suppose that $I = \emptyset$. Then for all sufficiently large $n$, all $\mathcal{A} \in Y_n$ and all $\bar{a} \in [n]^{|\bar{x}|}$,
$$\mathcal{A} \not\models \Big( r + \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y})\|_{\bar{y}} \Big).$$
Hence the formula (4.3) is equivalent, in every such $\mathcal{A}$, to any contradictory quantifier-free formula.

Proof.

Suppose that $I = \emptyset$. Suppose towards a contradiction that there are arbitrarily large $n$, $\mathcal{A} \in Y_n$ and $\bar{a} \in [n]^{|\bar{x}|}$ such that
$$\mathcal{A} \models \Big( r + \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y})\|_{\bar{y}} \Big).$$
Then we argue just as we did in the beginning of the proof of Lemma 4.37 to get (4.10) and find $1 \leq i \leq l$ and $1 \leq i' \leq l^*$ such that $t_i = t^*_{i'}$. Since $I = \emptyset$ we must have $i \notin I$ and therefore $r + \gamma_i < \gamma^*_{i'}$. Now we can continue to argue exactly as in the proof of Lemma 4.37 to get a contradiction in each one of the cases 1–4 in that proof. $\square$

Remark 4.39. (The case when $\bar{x}$ is empty) Suppose now that $\bar{x}$ is empty, so the formula (4.3) becomes
$$\Big( r + \|\psi(\bar{y}) \mid \theta(\bar{y})\|_{\bar{y}} \geq \|\psi^*(\bar{y}) \mid \theta^*(\bar{y})\|_{\bar{y}} \Big), \tag{4.23}$$
where we can assume that $\psi$, $\theta$, $\psi^*$ and $\theta^*$ are quantifier-free. Then there are distinct types $p_i(\bar{y})$, $i = 1, \ldots, m$, and distinct types $s_i(\bar{y})$, $i = 1, \ldots, l$. We can now define numbers $\gamma$ and $\gamma^*$ similarly as each $\gamma_i$ (and $\gamma^*_i$) was defined above. We now get an analogue of Lemma 4.34 which gives the same kind of upper and lower bounds of $\big|\bigcup_{i=1}^{m} p_i(\mathcal{A})\big| \big/ \big|\bigcup_{i=1}^{l} s_i(\mathcal{A})\big|$ in terms of $\gamma$.

If $r + \gamma \geq \gamma^*$ then, by the noncriticality of (4.23), we get $r + \gamma > \gamma^*$ and by the $\varepsilon$-noncriticality of the same formula we get $r + \gamma / (1 + 2\varepsilon) > \gamma^*(1 + 2\varepsilon)$. Now we can argue similarly as in the "converse direction" in the proof of Lemma 4.37 and conclude that (4.23) is true in all $\mathcal{A} \in Y_n$ for all sufficiently large $n$; hence (4.23) is equivalent to $\top$ in all such $\mathcal{A}$.

Now suppose that $r + \gamma < \gamma^*$ and suppose, towards a contradiction, that there are arbitrarily large $n$ and $\mathcal{A} \in Y_n$ in which (4.23) holds. Then we can argue as in the first part of the proof of Lemma 4.37 and get a contradiction. Hence, for all sufficiently large $n$, (4.23) is false in all $\mathcal{A} \in Y_n$; consequently, (4.23) is equivalent to $\neg\top$ in all such $\mathcal{A}$. (The case when $\varphi$ has the form $\exists y\, \psi(\bar{y})$ is easier and analogous to the argument in the beginning of the proof of Proposition 4.32, so this part is left to the reader.)

Now the proof of Proposition 4.32 is completed. $\square$

Definition 4.40.

Define a function $\delta : \mathbb{N}^+ \to \mathbb{R}^{\geq 0}$ by $\delta(n) = 5 \cdot \max\{\delta'(n), e^{-cn}\}$, where $c > 0$ is as in Lemma 4.28.

Proposition 4.41. (Completion of the induction step)

Let $Y_n \subseteq W_n$, $\varepsilon > 0$ and $\delta(n)$ be as in Definitions 4.26, 4.31 and 4.40, respectively. Then:

(1) $\lim_{n \to \infty} \delta(n) = 0$.

(2) $P_n(Y_n) \geq 1 - \delta(n)$ for all sufficiently large $n$.

(3) For every complete atomic $\sigma$-type $p(\bar{x})$ with $|\bar{x}| \leq k$ there is a number which we denote $P(p(\bar{x}))$ such that for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p$,
$$\big| P_n\big(\{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\}\big) - P(p(\bar{x})) \big| \leq \delta(n).$$

(4) For every complete atomic $\sigma$-type $p(\bar{x}, \bar{y})$ with $|\bar{x}\bar{y}| \leq k$ and $0 < \dim_{\bar{y}}(p(\bar{x}, \bar{y})) = |\bar{y}|$, if $q(\bar{x}) = p \upharpoonright \bar{x}$ and $P(q) > 0$, then for all sufficiently large $n$, every $\mathcal{A} \in Y_n$ is $(p, q, \alpha/(1+\varepsilon))$-saturated and $(p, q, \alpha(1+\varepsilon))$-unsaturated if $\alpha = P(p(\bar{x}, \bar{y})) / P(q(\bar{x}))$.

(5) For every $\varepsilon$-noncritical $\varphi(\bar{x}) \in CPL(\sigma)$ with $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$, there is a quantifier-free $\sigma$-formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall \bar{x} \big( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \big)$.

Proof.

Parts (1) and (2) follow from the definition of $\delta(n)$, Assumption 4.10 and Lemma 4.28. Part (3) follows from Corollary 4.19. Part (4) follows from Corollary 4.27 and the definition of $\varepsilon$. Part (5) follows from Proposition 4.32. $\square$

Corollary 4.42.

Let $\varepsilon > 0$ be as in Definition 4.31.

(a) If $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$, then there are $c > 0$ and $0 \leq d \leq 1$ which is a sum of numbers of the form $P(p)$, where $p$ is a complete atomic $\sigma$-type, such that for every $m \in \mathbb{N}^+$ and every $\bar{a} \in [m]^{|\bar{x}|}$ such that $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$, $\big| P_n(\varphi(\bar{a})) - d \big| \leq C\delta(n)$ for all sufficiently large $n$, where the constant $C$ depends only on $\varphi$.

(b) If $\varphi \in CPL(\sigma)$ has no free variable, is $\varepsilon$-noncritical and $\mathrm{qr}(\varphi) \leq k$, then either $P_n(\varphi) \leq \delta(n)$ for all sufficiently large $n$, or $P_n(\varphi) \geq 1 - \delta(n)$ for all sufficiently large $n$.

(c) Suppose that for every $R \in \sigma \setminus \sigma'$, if $\bar{x}$ is the sequence of free variables of $\chi_{R,i}$ then $|\bar{x}| + \mathrm{qr}(\chi_{R,i}) \leq k$. Let $P^*_n$ be defined as $P_n$ except that we replace $\chi_{R,i}$ by $\chi^*_{R,i}$ in Definition 4.2, where $\chi^*_{R,i}$ is a quantifier-free formula which is equivalent to $\chi_{R,i}$ in every structure in $Y_n$ for all large enough $n$. If $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical, $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$ and $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$ and some $m$, then $\big| P^*_n(\varphi(\bar{a})) - P_n(\varphi(\bar{a})) \big| \leq \delta(n)$ for all sufficiently large $n$.

Proof. (a) Suppose that $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$. By part (5) of Proposition 4.41, $\varphi(\bar{x})$ is equivalent, in every $\mathcal{A} \in Y_n$ (for large enough $n$), to a quantifier-free formula $\varphi^*(\bar{x})$. Then $\varphi^*(\bar{x})$ is equivalent to a disjunction of complete atomic $\sigma$-types $\bigvee_{i=1}^{l} p_i(\bar{x})$. Suppose that $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$ and some $m$. Let $I$ be the set of indices $i$ such that $\mathcal{A} \models p_i(\bar{a})$ for some $\mathcal{A} \in W_n$ and some $n$. By assumption, $I \neq \emptyset$. Let $d = \sum_{i \in I} P(p_i)$. By part (3) of Proposition 4.41, we have $\big| P_n(\varphi^*(\bar{a})) - d \big| \leq |I|\,\delta(n)$ for all sufficiently large $n$, and now (a) follows from part (2) of Proposition 4.41.

(b) Suppose that $\varphi \in CPL(\sigma)$ has no free variable, is $\varepsilon$-noncritical and $\mathrm{qr}(\varphi) \leq k$. By Proposition 4.41 (5), there is a quantifier-free sentence $\varphi^*$ such that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \varphi \leftrightarrow \varphi^*$. Then $\varphi^*$ must be equivalent to $\bot$ or $\top$. The conclusion of part (b) now follows from parts (1) and (2) of Proposition 4.41.

(c) Since $\chi_{R,i}$ is equivalent to $\chi^*_{R,i}$ in every $\mathcal{A} \in Y_n$ it follows from the definitions of $P_n$ and $P^*_n$ that if $\mathcal{A} \in Y_n$ then $P^*_n(\mathcal{A}) = P_n(\mathcal{A})$. It follows that if $X_n \subseteq Y_n$ then $P^*_n(X_n) = P_n(X_n)$ and in particular $P^*_n(Y_n) = P_n(Y_n)$. Since $P^*_n(W_n \setminus Y_n) = 1 - P^*_n(Y_n)$, and similarly for $P_n$, it follows that $P^*_n(W_n \setminus Y_n) = P_n(W_n \setminus Y_n)$. From part (2) of Proposition 4.41 we get $P^*_n(W_n \setminus Y_n) = P_n(W_n \setminus Y_n) \leq \delta(n)$. Let $X_n = \{\mathcal{A} \in W_n : \mathcal{A} \models \varphi(\bar{a})\}$. Then
$$P^*_n(X_n) \leq P^*_n(X_n \mid Y_n)\, P^*_n(Y_n) + \delta(n) = P^*_n(X_n \cap Y_n) + \delta(n) = P_n(X_n \cap Y_n) + \delta(n) \leq P_n(X_n) + \delta(n),$$
and by similar reasoning $P_n(X_n) \leq P^*_n(X_n) + \delta(n)$. $\square$

Concluding remarks

The results of this article concern one particular formal logic and one type of lifted graphical model. Also, given these two things, choices have been made, for example regarding exactly how to define a probability distribution on the set of structures with a common finite domain. From the point of view of machine learning and artificial intelligence, as well as mathematical curiosity, one could ask a number of questions, of which I suggest a few below.

In finite model theory, theoretical computer science and linguistics a number of extensions of first-order logic have been considered [20]. For example, a generic way of extending first-order logic is by adding one or more so-called generalized quantifiers [15, 17]. In machine learning, data mining and artificial intelligence a number of different (lifted) graphical models have been considered, including the popular Markov networks [7, 18]. For which combinations of formal logical language and lifted graphical model do we get "almost sure elimination of quantifiers" and/or "logical limit laws"? Do we get more expressive formalisms by using aggregation functions than if we use aggregation rules, or vice versa? How do different combinations of formal language and graphical model relate to each other? In what sense is a combination (formal language 1, graphical model 1) "better" than a combination (formal language 2, graphical model 2)? What are reasonable candidates for the relation "A is better/stronger than B"? Some thoughts in this direction appear in the last part of [5].

One can consider conditional probabilities which are not constant, but depend on the size of the set of elements (or tuples) satisfying the condition in question. As a special case we have probabilities that depend on the size of the whole domain, as in previous work on logical zero-one laws in random graphs [24, 25].

What if the probability of a tuple $\bar{a}$ satisfying a relation is dependent on whether another tuple $\bar{b}$ satisfies the same relation (as in [19, 21] for example)?

A situation that seems natural in the context of artificial intelligence is to have an underlying fixed structure and on top of it relations that are "governed" by some probabilistic graphical model. The underlying fixed structure could be represented by a $\tau$-structure $\mathcal{A}$ for some signature $\tau$. For another signature $\sigma$ (disjoint from $\tau$) we could consider the set of expansions of $\mathcal{A}$ to $(\tau \cup \sigma)$-structures where the probabilities of these expansions are governed by some probabilistic model and the underlying structure $\mathcal{A}$. To formalize this using the setup of this article, one can modify $W^{\emptyset}_n$ in Definition 3.10 to contain exactly one $\tau$-structure with domain $[n]$, and $W_n$ will be the set of all $(\tau \cup \sigma)$-structures that expand the unique structure in $W^{\emptyset}_n$.
The definition of the probability distribution $P_n$ on $W_n$ can now depend not only on the lifted Bayesian network $G$ but also on the unique structure in $W^{\emptyset}_n$. It seems obvious that, in order to get similar results as in this article, one needs to assume some sort of uniformity regarding the unique structure in $W^{\emptyset}_n$ for cofinitely many $n$.

References

[1] N. Alon, J. H. Spencer, The Probabilistic Method, Second Edition, John Wiley & Sons (2000).
[2] F. Bacchus, A. J. Grove, J. Y. Halpern, D. Koller, From statistical knowledge bases to degrees of belief, Artificial Intelligence, Vol. 87 (1996) 75–143.
[3] C. Borgelt, R. Kruse, Graphical Models: Methods for Data Analysis and Mining, John Wiley & Sons (2002).
[4] H. Chernoff, A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, Vol. 23 (1952) 493–509.
[5] L. De Raedt, P. Frasconi, K. Kersting, S. Muggleton (editors), Probabilistic Inductive Logic Programming: Theory and Applications, Lecture Notes in Artificial Intelligence 4911, Springer-Verlag, Berlin Heidelberg (2008).
[6] L. De Raedt, K. Kersting, S. Natarajan, D. Poole, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, Synthesis Lectures on Artificial Intelligence and Machine Learning
[7] Communications of the ACM, Vol. 62 (2019) 74–83.
[8] R. Fagin, Probabilities on finite models, The Journal of Symbolic Logic, Vol. 41 (1976) 50–58.
[9] L. Getoor, B. Taskar (editors), Introduction to Statistical Relational Learning, The MIT Press (2007).
[10] Y. V. Glebskii, D. I. Kogan, M. I. Liogonkii, V. A. Talanov, Volume and fraction of satisfiability of formulas of the lower predicate calculus, Kibernetyka, Vol. 2 (1969) 17–27.
[11] J. Y. Halpern, An analysis of first-order logics of probability, Artificial Intelligence, Vol. 46 (1990) 311–350.
[12] C. D. Hill, On 0,1-laws and asymptotics of definable sets in geometric Fraïssé classes, Fundamenta Mathematicae, Vol. 239 (2017) 201–219.
[13] M. Jaeger, Convergence results for relational Bayesian networks, Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science (LICS 98) (1998).
[14] M. Jaeger, Reasoning about infinite random structures with relational Bayesian networks, Proceedings of the Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR 98) (1998).
[15] R. Kaila, On probabilistic elimination of generalized quantifiers, Random Structures and Algorithms, Vol. 19 (2001) 1–36.
[16] H. J. Keisler, W. B. Lotfallah, Almost everywhere elimination of probability quantifiers, The Journal of Symbolic Logic, Vol. 74 (2009) 1121–1142.
[17] E. Keenan, D. Westerstahl, Generalized quantifiers in linguistics and logic, in J. van Benthem, A. ter Meulen (editors), Handbook of Logic and Language, Second Edition, 859–910, Elsevier (2011).
[18] A. Kimmig, L. Mihalkova, L. Getoor, Lifted graphical models: a survey, Machine Learning, Vol. 99 (2015) 1–45.
[19] Ph. G. Kolaitis, H. J. Prömel, B. L. Rothschild, $K_{l+1}$-free graphs: asymptotic structure and a 0-1 law, Transactions of The American Mathematical Society, Vol. 303 (1987) 637–671.
[20] L. Libkin, Elements of Finite Model Theory, Springer-Verlag, Berlin Heidelberg New York (2004).
[21] J. F. Lynch, Convergence law for random graphs with specified degree sequence, ACM Transactions on Computational Logic, Vol. 6 (2005) 727–748.
[22] D. Mubayi, C. Terry, Discrete metric spaces: structure, enumeration, and 0-1 laws, The Journal of Symbolic Logic, Vol. 84 (2019) 1293–1324.
[23] J. Pearl, Causality: Models, Reasoning, and Inference, Second Edition, Cambridge University Press (2009).
[24] S. Shelah, J. Spencer, Zero-one laws for sparse random graphs, Journal of the American Mathematical Society, Vol. 1 (1988) 97–115.
[25] J. Spencer, The Strange Logic of Random Graphs, Springer-Verlag, Berlin Heidelberg New York (2001).

Vera Koponen, Department of Mathematics, Uppsala University, Box 480, 75106 Uppsala, Sweden.