Conditional probability logic, lifted Bayesian networks and almost sure quantifier elimination
VERA KOPONEN
Abstract.
We introduce a formal logical language, called conditional probability logic (CPL), which extends first-order logic and which can express probabilities and conditional probabilities and can compare conditional probabilities. Intuitively speaking, although the formal details are different, CPL can express the same kind of statements as some languages which have been considered in the artificial intelligence community. We also consider a way of making precise the notion of a lifted Bayesian network, where this notion is a type of (lifted) probabilistic graphical model used in machine learning, data mining and artificial intelligence. A lifted Bayesian network (in the sense defined here) determines, in a natural way, a probability distribution on the set of all structures (in the sense of first-order logic) with a common finite domain $D$. Our main result (Theorem 3.14) is that for every "noncritical" CPL-formula $\varphi(\bar{x})$ there is a quantifier-free formula $\varphi^*(\bar{x})$ which is "almost surely" equivalent to $\varphi(\bar{x})$ as the cardinality of $D$ tends towards infinity. This is relevant for the problem of making probabilistic inferences on large domains $D$, because (a) the problem of evaluating, by "brute force", the probability of $\varphi(\bar{x})$ being true for some sequence $\bar{d}$ of elements from $D$ has, in general, (highly) exponential time complexity in the cardinality of $D$, and (b) the corresponding probability for the quantifier-free $\varphi^*(\bar{x})$ depends only on the lifted Bayesian network and not on $D$. Some conclusions regarding the computational complexity of finding $\varphi^*$ are given in Remark 3.17. The main result has two corollaries, one of which is a convergence law (and zero-one law) for noncritical CPL-formulas.

1. Introduction
We consider an extension of first-order logic which we call conditional probability logic (Definition 3.1), abbreviated CPL. With it one can express statements about probabilities and conditional probabilities, and one can compare conditional probabilities, which makes it possible to express statements about the (conditional) independence (or dependence) of events or random variables. Remarks 3.4, 3.6 and Example 3.5 below illustrate this. The semantics of CPL deals only with finite structures and assumes that all elements in a structure are equally likely, so (conditional) probabilities correspond to proportions. Quite similar formal languages, which aim at expressing the same sort of statements, have been studied within the field of artificial intelligence by Halpern [11, Section 2] and Bacchus et al. [2, Definition 4.1]. CPL is more expressive than the probability logic $L_{\omega P}$ considered by Keisler and Lotfallah in [16] (which cannot express conditional probabilities), and our first theorem (Theorem 3.14) is a generalization of their main result [16, Theorem 4.9], both in the sense that the language considered here is more expressive and in the sense that we consider a wider range of probability distributions.

Date: 3 April 2020.

A graphical model for a probability distribution and a set of random variables is a "graphical" way of describing the conditional dependencies and independencies between the random variables. In such a probabilistic model the random variables are also viewed as the vertices of a directed or undirected graph where edges indicate conditional dependencies and independencies [3, 23]. The notion of a Bayesian network is one of the most well-known graphical models. A Bayesian network $G$ for a probability space $(S, \mu)$ and random binary variables $X_1, \ldots, X_n$ is determined by the following data:
(1) A (not necessarily connected) directed acyclic graph (DAG), also denoted $G$, with vertex set $V = \{X_1, \ldots, X_n\}$ such that if there is an arrow (directed arc) from $X_i$ to $X_j$ then $i < j$.
(2) To each vertex $X_i \in V$ conditional probabilities are associated in such a way that the following holds:
(a) For each $j$ the set of parents of $X_j$, denoted $\mathrm{par}(X_j)$, is minimal (with respect to set inclusion) with the property that for every $i < j$, $X_i$ and $X_j$ are independent over $\mathrm{par}(X_j)$.
(b) The joint probability distribution on $X_1, \ldots, X_n$ is determined by the conditional probabilities associated with the vertices of $G$.
If $G$ is a Bayesian network as defined above, then it follows (from e.g. [23, Definition 1.2.1 and Theorems 1.2.6, 1.2.7]) that
(i) For every $X_j \in V$, $X_j$ and the set of all predecessors of $X_j$ are independent over $\mathrm{par}(X_j)$.
(ii) For every $X_j \in V$, $X_j$ and the set of all nondescendants of $X_j$ (except $X_j$ itself) are independent over $\mathrm{par}(X_j)$.
Moreover, if condition (i) or condition (ii) holds, then $\{X_1, \ldots, X_n\}$ can be ordered so that conditions (a) and (b) above hold without changing the arrows of the DAG.

Graphical models are used in machine learning, data mining and artificial intelligence in (probability based) learning and inference making. To illustrate this by a very simple example, suppose that we have a finite set $A$ of some kind of objects and properties $P$, $Q$ and $R$ which objects in $A$ may, or may not, have. We can view $A$ as a "training set". The training set can be formalized as a $\sigma$-structure with domain $A$ where $\sigma = \{P, Q, R\}$ and $P$, $Q$ and $R$ are also viewed as unary relation symbols.
Let $\mu$ be a probability distribution on $A$ and let binary random variables $X, Y, Z : A \to \{0, 1\}$ be defined by $X(a) = 1$ if $a$ has the property $P$ and $X(a) = 0$ otherwise (for every $a \in A$); $Y(a) = 1$ if $a$ has the property $Q$ and $Y(a) = 0$ otherwise; and analogously for $Z$ and $R$. Suppose that, after some "learning", we have found a Bayesian network $G$ for $(A, \mu)$ and $X, Y, Z$ such that its DAG is as illustrated and the (conditional) probabilities $\mu(X = 1)$, $\mu(Y = 1 \mid X = 1)$, $\mu(Y = 1 \mid X = 0)$, $\mu(Z = 1 \mid X = 1)$ and $\mu(Z = 1 \mid X = 0)$ are specified.
[Figure: a DAG on the vertices $X$, $Y$ and $Z$.]
(In real applications, it is unlikely that a relatively simple probabilistic model, which is desirable for computational efficiency, fits the training data completely, and usually this is not even the goal because one wants to avoid so-called "overfitting"; so one can view the Bayesian network as a reasonable approximation of the training data.) An application of the Bayesian network $G$ is to make predictions about probabilities on some other finite domain $B$. Let us now make the following assumptions, partly based on $G$ but where the independence assumptions between different objects are imposed. Every $b \in B$ has the probability $\mu(X = 1)$ of having property $P$, independently of what the case is for other $b' \in B$. For every $b \in B$, if $b$ has the property $P$ then the probability that $b$ also has the property $Q$ (respectively $R$) is $\mu(Y = 1 \mid X = 1)$ (respectively $\mu(Z = 1 \mid X = 1)$), independently of what the case is for other elements in $B$, and if $b$ does not have the property $P$ then the probability that $b$ has $Q$ (respectively $R$) is $\mu(Y = 1 \mid X = 0)$ (respectively $\mu(Z = 1 \mid X = 0)$), independently of what the case is for other elements. Based on this we can define a probability distribution on the set $W_B$ of all $\sigma$-structures with domain $B$, where each member of $W_B$ represents a "possible scenario" or "possible world". For every formula $\varphi(x_1, \ldots, x_k)$ of conditional probability logic and any choice of $b_1, \ldots, b_k \in B$ we can now ask what the probability is that $\varphi(x_1, \ldots, x_k)$ is satisfied by $b_1, \ldots, b_k$.

When using a Bayesian network $G$ for prediction as in the example we have "lifted" it from its original context (the set $A$) and used it on a new domain of objects. Also, when moving from the fixed domain $A$ to an arbitrary domain $B$ we have, in a sense, "lifted" our reasoning from propositional logic to first-order logic, or some extension of it. Perhaps this is the reason why the term "lifted graphical model" is used by some authors when a graphical model is used to describe or predict (conditional) probabilities of events on an arbitrary or unknown domain; see [18] for a survey of lifted graphical models. In the subfield of machine learning, data mining and artificial intelligence called statistical relational learning (or sometimes probabilistic logic learning) the "lifted" perspective is central, as one here considers general domains of objects and properties and relations that may, or may not, hold for, or between, the objects. (See for example [6, 9].) There is no consensus regarding what, exactly, a lifted Bayesian network (let alone a lifted graphical model) is or how it determines a probability distribution on a set of "possible worlds". Different approaches have been considered. A key question is how the probability that a random variable takes a particular value is influenced by its parents in the DAG of the Bayesian network. The above example uses the simplest form of aggregation/combination rules. Another approach is to use aggregation/combination functions. (Some explanation of these notions is found in e.g. [6, p. 31, 54], [18, p. 18], [13].) From a practical point of view it probably makes sense to have the freedom to adapt one's lifted graphical model to the application at hand, so uniformity may not be a primary concern for practitioners.
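The prediction step of the example with the properties $P$, $Q$ and $R$ can be sketched in Python as follows. This is only an illustration under stated assumptions: the numerical values of the conditional probabilities and the names `sample_world` and `prob_has_Q` are hypothetical and are not taken from the article. The sketch samples one "possible world" on a domain $B$ and computes exactly the probability that a fixed element has the property $Q$.

```python
import random
from fractions import Fraction

# Hypothetical values for the learned (conditional) probabilities.
MU_X = Fraction(3, 10)                                 # mu(X = 1)
MU_Y_GIVEN_X = {1: Fraction(4, 5), 0: Fraction(1, 10)}  # mu(Y = 1 | X = x)
MU_Z_GIVEN_X = {1: Fraction(1, 2), 0: Fraction(1, 5)}   # mu(Z = 1 | X = x)


def sample_world(domain, rng):
    """Sample one structure on `domain`: the sets interpreting P, Q and R.

    Each element is treated independently of all other elements.
    """
    P, Q, R = set(), set(), set()
    for b in domain:
        x = 1 if rng.random() < float(MU_X) else 0
        if x == 1:
            P.add(b)
        if rng.random() < float(MU_Y_GIVEN_X[x]):
            Q.add(b)
        if rng.random() < float(MU_Z_GIVEN_X[x]):
            R.add(b)
    return P, Q, R


def prob_has_Q():
    """Exact probability that a fixed element has property Q.

    The value is obtained from the network alone (law of total probability)
    and does not depend on the size of the domain.
    """
    return MU_X * MU_Y_GIVEN_X[1] + (1 - MU_X) * MU_Y_GIVEN_X[0]
```

For instance, `sample_world(range(1000), random.Random(0))` returns three subsets of the domain, and the fraction of elements in the second set should be close to `prob_has_Q()` when the domain is large.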
To prove mathematical theorems about lifted graphical models and the probability distributions that they induce, however, we need to make precise what we mean; this is done in Section 3.

In this article we use aggregation rules expressed by formulas of conditional probability logic (CPL). The idea is that for any relation symbol $R$, of arity $k$ say, there are an integer $\nu_R$, numbers $\alpha_{R,i} \in [0, 1]$, and CPL-formulas $\chi_{R,i}(x_1, \ldots, x_k)$ for $i = 1, \ldots, \nu_R$ such that if $\chi_{R,i}(x_1, \ldots, x_k)$ holds, then the probability that $R(x_1, \ldots, x_k)$ holds is $\alpha_{R,i}$. This formalism is strong enough to express, for example, aggregation rules of the following kind for arbitrary $m$, any CPL-formula $\psi(x_1, \ldots, x_k)$ and any $\alpha_i \in [0, 1]$, $i = 0, \ldots, m$: For all $i = 0, \ldots, m$, if the proportion of $k$-tuples that satisfy $\psi(x_1, \ldots, x_k)$ is in the interval $[i/m, (i + 1)/m]$, then the probability that $R(x_1, \ldots, x_k)$ holds is $\alpha_i$.

Once we have made precise (as in Definition 3.8) what we mean by a lifted Bayesian network $G$ for a finite relational signature $\sigma$ (i.e. a finite set of relation symbols, possibly of different arities) and also made precise (as in Definition 3.11) how $G$ determines a probability distribution $P_D$ on the set $W_D$ of all $\sigma$-structures with domain $D$ (for some finite set $D$), we can ask questions like this: Given a CPL-formula $\varphi(x_1, \ldots, x_k)$ and $d_1, \ldots, d_k \in D$, what is the probability that $\varphi(x_1, \ldots, x_k)$ is satisfied by the sequence $d_1, \ldots, d_k$? Or more formally, what is $P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi(d_1, \ldots, d_k)\}\big)$? It is computationally very expensive to answer the question by analyzing all members of $W_D$, since, in general, the cardinality of $W_D$ is in the order of $2^{|D|^r}$ where $r$ is the maximal arity of relation symbols in $\sigma$ and $|D|$ is the cardinality of $D$. However, our first theorem (Theorem 3.14) says that if $\varphi$ is "noncritical" in the sense that its conditional probability quantifiers (if any) avoid "talking about" certain finitely many critical numbers, then there is a quantifier-free formula $\varphi^*(x_1, \ldots, x_k)$ such that, with probability approaching 1 as $|D| \to \infty$, $\varphi$ and $\varphi^*$ are equivalent. If we are given such $\varphi^*$ then we can easily compute the probability $\alpha^* = P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi^*(d_1, \ldots, d_k)\}\big)$ by using only the lifted Bayesian network $G$, so in particular this computation is independent of the cardinality of $D$. Moreover, $\alpha^*$ only depends on the quantifier-free formula $\varphi^*$ and not on the choice of elements $d_1, \ldots, d_k$. We also get that, as $|D| \to \infty$, $P_D\big(\{\mathcal{D} \in W_D : \mathcal{D} \models \varphi(d_1, \ldots, d_k)\}\big) \to \alpha^*$.

But of course, given a noncritical $\varphi$, we first have to find a quantifier-free $\varphi^*$ which is "almost surely" equivalent to $\varphi$. The proof of Theorem 3.14 produces an algorithm for doing this. At one step in the algorithm one may need to transform a quantifier-free formula into an equivalent disjunctive normal form, and this computational task is, in general, NP-hard. But if one assumes that all quantifier-free subformulas of $\varphi$ are in disjunctive normal form, then the algorithm that produces $\varphi^*$ works in quadratic time in the length of $\varphi$, provided that an arithmetic operation, a comparison of two numbers and a comparison of two literals are each completed in one time step (more details in Remark 3.17).

The proof of Theorem 3.14 gives some by-products such as a "logical limit/convergence law" (Theorem 3.15) and a result (Theorem 3.16) saying that for every lifted Bayesian network as in Definition 3.8 there is an "almost surely equivalent" lifted Bayesian network in which all aggregation formulas (as in Definition 3.8) are quantifier-free. The original zero-one law for first-order logic, proved independently by Glebskii et al. [10] and Fagin [8], becomes a special case of Theorem 3.15 when we restrict attention to first-order sentences and the DAG of the lifted Bayesian network has no edges and all the probabilities associated to the vertices are $1/2$.

A couple of earlier results exist which are similar to the results of this article. Jaeger [13] has considered another sort of lifted Bayesian network which he calls a relational Bayesian network. Instead of using aggregation/combination rules (as we do in this article), relational Bayesian networks use aggregation/combination functions. Theorem 3.9 in [13] is an analogue of Theorem 3.15 below for first-order formulas in the setting of relational Bayesian networks which use only "exponentially convergent" combination functions. Theorem 4.7 in [14] has a similar flavour to Theorem 3.16 below, but [14] considers "admissible" relational Bayesian networks and a probability measure defined by such a network on the set of structures with a common countably infinite domain.

The results of this article are mainly motivated by concepts and methods in machine learning, data mining and artificial intelligence, but if the results are seen from the perspective of finite model theory and random discrete structures, then they join a long tradition of results concerning logical limit laws and almost sure elimination of quantifiers. For a very small and eclectic selection of work in this field, ranging from the first to some of the latest, see for example [8, 10, 12, 15, 19, 21, 22, 24, 25].

The organization of this article is as follows. Section 2 introduces the basic conventions used in this article as well as some basic definitions. Section 3 defines the main notions of the article and states the main results. Section 4 gives the proofs of these results. The last section is a brief discussion about further research in the topics of formal logic, probabilistic graphical models, almost sure elimination of quantifiers and convergence laws.

2. Preliminaries
Basic knowledge of first-order logic and first-order structures is assumed; there are many sources in which the reader can find this background, for example [20]. In this section we clarify and define some basic notation and terminology concerning logic and graph theory. Formulas of a formal logic will usually be denoted by $\varphi$, $\psi$, $\theta$ or $\chi$, possibly with sub- or superscripts. Logical variables will be denoted $x, y, z, u, v, w$, possibly with sub- or superscripts. Finite sequences/tuples of variables are similarly denoted $\bar{x}, \bar{y}, \bar{z}$, etc. If a formula is denoted by $\varphi(\bar{x})$ then it is, as usual, assumed that all free variables of $\varphi$ occur in the sequence $\bar{x}$ (but we do not insist that every variable in $\bar{x}$ occurs in the formula denoted by $\varphi(\bar{x})$); moreover, in this context we will assume that all variables in $\bar{x}$ are different, although this is occasionally restated. In general, finite sequences/tuples of elements are denoted by $\bar{a}, \bar{b}, \bar{c}$, etc. For a sequence $\bar{a}$, $\mathrm{rng}(\bar{a})$ denotes the set of elements occurring in $\bar{a}$ and $|\bar{a}|$ denotes its length. For a set $A$, $|A|$ denotes its cardinality. In particular, if $\varphi$ is a formula of some formal logic (so $\varphi$ is a sequence of symbols), then $|\varphi|$ denotes its length. Sometimes we abuse notation by writing '$\bar{a} \in A$' when we actually mean that $\mathrm{rng}(\bar{a}) \subseteq A$.

By a signature (or vocabulary) we mean a set of relation symbols, function symbols and constant symbols. A signature $\sigma$ is called finite relational if it is finite as a set and all symbols in it are relation symbols. We use the terminology '$\sigma$-structure', or just structure if we omit mentioning the signature, in the sense of first-order logic. Structures in this sense will be denoted by calligraphic letters $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, etc. The domain (or universe) of a structure $\mathcal{A}$ will often be denoted by the corresponding non-calligraphic letter $A$. A structure is called finite if its domain is finite. If $\sigma' \subset \sigma$ are signatures and $\mathcal{A}$ is a $\sigma$-structure, then $\mathcal{A} \restriction \sigma'$ denotes the reduct of $\mathcal{A}$ to the signature $\sigma'$. We let $[n]$ denote the set $\{1, \ldots, n\}$. We use the terminology atomic ($\sigma$-)formula in the sense of first-order logic with equality, so in particular, the expression '$x = y$' is an atomic $\sigma$-formula for every signature $\sigma$, including the empty signature $\sigma = \emptyset$. It will also be convenient to have a special symbol $\top$ which is viewed as an atomic $\sigma$-formula for every signature $\sigma$; the formula $\top$ is interpreted as being true in every structure.

Definition 2.1.
Let $\sigma$ be a finite relational signature and $\bar{x}$ a sequence of different variables.
(i) If $\varphi(\bar{x})$ is an atomic $\sigma$-formula, then $\varphi(\bar{x})$ and $\neg\varphi(\bar{x})$ are called $\sigma$-literals.
(ii) A consistent set of $\sigma$-literals is called an atomic $\sigma$-type. When denoting an atomic $\sigma$-type by $p(\bar{x})$ it is assumed (as for formulas) that if a variable occurs in a formula in $p(\bar{x})$, then it belongs to the sequence $\bar{x}$.
(iii) If $p(\bar{x})$ is an atomic $\sigma$-type, then the identity fragment of $p(\bar{x})$ is the set of formulas of the form $x_i = x_j$ or $x_i \neq x_j$ that belong to $p(\bar{x})$.
(iv) If $p(\bar{x})$ denotes an atomic $\sigma$-type and for every $\sigma$-literal $\varphi(\bar{x})$, either $\varphi(\bar{x}) \in p(\bar{x})$ or $\neg\varphi(\bar{x}) \in p(\bar{x})$, then $p(\bar{x})$ is called a complete atomic $\sigma$-type (with respect to $\sigma$). An atomic $\sigma$-type which is not complete is sometimes called partial.
(v) Let $p(\bar{x}, \bar{y})$ be an atomic $\sigma$-type. The $\bar{y}$-dimension of $p(\bar{x}, \bar{y})$, denoted $\dim_{\bar{y}}(p(\bar{x}, \bar{y}))$, is the maximal $d \in \mathbb{N}$ such that there are a $\sigma$-structure $\mathcal{A}$ and $\bar{a}, \bar{b} \in A$ such that $\mathcal{A} \models p(\bar{a}, \bar{b})$ and $\big|\mathrm{rng}(\bar{b}) \setminus \mathrm{rng}(\bar{a})\big| \geq d$.
(vi) Let $\sigma' \subseteq \sigma$ and let $p$ be an atomic $\sigma$-type. Then $p \restriction \sigma' = \{\varphi \in p : \varphi \text{ is a } \sigma'\text{-formula}\}$ and $p \restriction \bar{x} = \{\varphi \in p : \text{all free variables of } \varphi \text{ occur in } \bar{x}\}$.

Remark 2.2.
Note that if $p(\bar{x})$ is a complete atomic $\sigma$-type where $\bar{x} = (x_1, \ldots, x_m)$, then for all $1 \leq i, j \leq m$, either $x_i = x_j$ or $x_i \neq x_j$ belongs to $p(\bar{x})$. Also observe that if $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and $\dim_{\bar{y}}(p(\bar{x}, \bar{y})) = d$, then for every $\sigma$-structure $\mathcal{A}$ and for all $\bar{a}, \bar{b}$ such that $\mathcal{A} \models p(\bar{a}, \bar{b})$, we have $\big|\mathrm{rng}(\bar{b}) \setminus \mathrm{rng}(\bar{a})\big| = d$.

Notation 2.3.
Let $\sigma$ be a signature, $\bar{x}$ a sequence of different variables, $\mathcal{A}$ a $\sigma$-structure with domain $A$ and $\bar{a} \in A^{|\bar{x}|}$.
(i) If $p(\bar{x})$ is an atomic $\sigma$-type, then the notation '$\mathcal{A} \models p(\bar{a})$' means that $\mathcal{A} \models \varphi(\bar{a})$ for every formula $\varphi(\bar{x}) \in p(\bar{x})$, or in other words that $\bar{a}$ satisfies every formula in $p(\bar{x})$ with respect to the structure $\mathcal{A}$, or (to use model theoretic language) that $\bar{a}$ realizes $p(\bar{x})$ with respect to the structure $\mathcal{A}$.
(ii) If $\bar{y}$ is a sequence of different variables (such that no variable occurs in both $\bar{x}$ and $\bar{y}$) and $q(\bar{x}, \bar{y})$ is an atomic $\sigma$-type, then $q(\bar{a}, \mathcal{A}) = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models q(\bar{a}, \bar{b})\}$.

By a directed graph we mean a pair $(V, E)$ where $V$ is a (vertex) set and $E \subseteq V \times V$. A directed acyclic graph, abbreviated DAG, is a directed graph $(V, E)$ such that $(v, v) \notin E$ for all $v \in V$ and such that there do not exist distinct $v_0, \ldots, v_k \in V$ for any $k \geq 1$ such that $(v_i, v_{i+1}) \in E$ for all $i = 0, \ldots, k - 1$ and $(v_k, v_0) \in E$. A directed path in a directed graph $(V, E)$ is a sequence of distinct vertices $v_0, \ldots, v_k \in V$ such that $(v_i, v_{i+1}) \in E$ for all $i = 0, \ldots, k - 1$; the length of this path is the number of edges in it, in other words, the length is $k$.

Definition 2.4. (About directed acyclic graphs)
Suppose that $G$ is a DAG with nonempty and finite vertex set $V$. Let $a \in V$.
(i) A vertex $b \in V$ is a parent of $a$ if $(b, a)$ is a directed edge of $G$. We let $\mathrm{par}(a)$ denote the set of parents of $a$.
(ii) We define the maximal path rank of $a$, or just mp-rank of $a$, denoted $\mathrm{mpr}(a)$, to be the length of the longest directed path having $a$ as its first vertex (i.e. the length of the longest path $a_0, a_1, \ldots, a_k$ where $a_0 = a$ and $(a_i, a_{i+1})$ is a directed edge for each $i = 0, \ldots, k - 1$).
(iii) The maximal path rank of $G$, or just mp-rank of $G$, denoted $\mathrm{mpr}(G)$, is defined as $\mathrm{mpr}(G) = \max\{\mathrm{mpr}(a) : a \in V\}$.

Observe that if $G$ is a DAG with vertex set $V$ and $\mathrm{mpr}(G) = r$ and $G'$ is the induced subgraph of $G$ with vertex set $V' = \{a \in V : \mathrm{mpr}(a) < r\}$, then, for every $a \in V'$, the mp-rank of $a$ is the same no matter if we compute it with respect to $G'$ or with respect to $G$; it follows that $\mathrm{mpr}(G') = r - 1$.

We call a random variable binary if it can only take the value 0 or 1. The following is a direct consequence of [1, Corollary A.1.14] which in turn follows from the Chernoff bound [4]:

Lemma 2.5.
Let $Z$ be the sum of $n$ independent binary random variables, each one with probability $p$ of having the value 1. For every $\varepsilon > 0$ there is $c_\varepsilon > 0$, depending only on $\varepsilon$, such that the probability that $|Z - pn| > \varepsilon pn$ is less than $2e^{-c_\varepsilon pn}$.

3. Conditional probability logic and lifted Bayesian networks
In this section we define the main concepts of this article and state the main results.
Definition 3.1. (Conditional probability logic)
Suppose that $\sigma$ is a signature. Then the set of conditional probability formulas over $\sigma$, denoted $CPL(\sigma)$, is defined inductively as follows:
(1) Every atomic $\sigma$-formula belongs to $CPL(\sigma)$ (where 'atomic' has the same meaning as in first-order logic with equality).
(2) If $\varphi, \psi \in CPL(\sigma)$ then $(\neg\varphi), (\varphi \wedge \psi), (\varphi \vee \psi), (\varphi \rightarrow \psi), (\varphi \leftrightarrow \psi), (\exists x \varphi) \in CPL(\sigma)$ where $x$ is a variable. (As usual, in practice we do not necessarily write out all parentheses.) We consider $\forall x \varphi$ to be an abbreviation of $\neg\exists x \neg\varphi$.
(3) If $r \geq 0$ is a real number, $\varphi, \psi, \theta, \tau \in CPL(\sigma)$ and $\bar{y}$ is a sequence of distinct variables, then
$$\big( r + \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} \big) \in CPL(\sigma) \quad \text{and} \quad \big( \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} + r \big) \in CPL(\sigma).$$
In both these new formulas all variables of $\varphi, \psi, \theta$ and $\tau$ that appear in the sequence $\bar{y}$ become bound. So this construction can be seen as a sort of quantification, which may become clearer from the semantics provided below.
A formula $\varphi \in CPL(\sigma)$ is called quantifier-free if it contains no quantifier, that is, if it is constructed from atomic formulas by using only the connectives $\neg, \wedge, \vee, \rightarrow, \leftrightarrow$.

Definition 3.2. (Semantics)
(1) The interpretations of $\neg, \wedge, \vee, \rightarrow, \leftrightarrow$ and $\exists$ are as in first-order logic.
(2) Suppose that $\mathcal{A}$ is a finite $\sigma$-structure and let $\varphi(\bar{x}, \bar{y}), \psi(\bar{x}, \bar{y}), \theta(\bar{x}, \bar{y}), \tau(\bar{x}, \bar{y}) \in CPL(\sigma)$. Let $\bar{a} \in A^{|\bar{x}|}$.
(a) We define $\varphi(\bar{a}, \mathcal{A}) = \big\{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \varphi(\bar{a}, \bar{b})\big\}$.
(b) The expression
$$\mathcal{A} \models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big)$$
means that $\psi(\bar{a}, \mathcal{A}) \neq \emptyset$, $\tau(\bar{a}, \mathcal{A}) \neq \emptyset$ and
$$r + \frac{\big|\varphi(\bar{a}, \mathcal{A}) \cap \psi(\bar{a}, \mathcal{A})\big|}{\big|\psi(\bar{a}, \mathcal{A})\big|} \geq \frac{\big|\theta(\bar{a}, \mathcal{A}) \cap \tau(\bar{a}, \mathcal{A})\big|}{\big|\tau(\bar{a}, \mathcal{A})\big|},$$
and in this case we say that $\big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big)$ is true (or holds) in $\mathcal{A}$. If $\psi(\bar{a}, \mathcal{A}) = \emptyset$ or $\tau(\bar{a}, \mathcal{A}) = \emptyset$ or
$$r + \frac{\big|\varphi(\bar{a}, \mathcal{A}) \cap \psi(\bar{a}, \mathcal{A})\big|}{\big|\psi(\bar{a}, \mathcal{A})\big|} < \frac{\big|\theta(\bar{a}, \mathcal{A}) \cap \tau(\bar{a}, \mathcal{A})\big|}{\big|\tau(\bar{a}, \mathcal{A})\big|},$$
then we write
$$\mathcal{A} \not\models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big)$$
and say that $\big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big)$ is false in $\mathcal{A}$.
(c) The meaning of
$$\mathcal{A} \models \big( \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} + r \big)$$
is defined similarly.

Remark 3.3. (A warning)
Observe that with the given semantics,
$$\mathcal{A} \not\models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big)$$
does not necessarily imply
$$\mathcal{A} \models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \leq \|\theta(\bar{a}, \bar{y}) \mid \tau(\bar{a}, \bar{y})\|_{\bar{y}} \big),$$
because the first formula may fail to be true for $\bar{a}$ because $\psi(\bar{a}, \mathcal{A}) = \emptyset$ or $\tau(\bar{a}, \mathcal{A}) = \emptyset$, in which case the corresponding fraction is undefined and then the other formula is also false for $\bar{a}$.

Remark 3.4. (Expressing conditional probabilities, or just probabilities)
Let $\bar{x} = (x_1, \ldots, x_k)$ and $\bar{y} = (y_1, \ldots, y_l)$. If $\tau(\bar{x}, \bar{y})$ denotes the formula $y_1 = y_1$ and $\theta(\bar{x}, \bar{y})$ denotes the formula $y_1 \neq y_1$, then
$$(3.1) \qquad \big( \|\varphi(\bar{x}, \bar{y}) \mid \psi(\bar{x}, \bar{y})\|_{\bar{y}} \geq \|\theta(\bar{x}, \bar{y}) \mid \tau(\bar{x}, \bar{y})\|_{\bar{y}} + r \big)$$
expresses that the proportion of tuples $\bar{y}$ that satisfy $\varphi(\bar{x}, \bar{y})$ among those $\bar{y}$ that satisfy $\psi(\bar{x}, \bar{y})$ is at least $r$. Thus the formula expresses a conditional probability if we assume that all $l$-tuples have the same probability. Under the stated assumptions, let us abbreviate (3.1) by
$$(3.2) \qquad \big( \|\varphi(\bar{x}, \bar{y}) \mid \psi(\bar{x}, \bar{y})\|_{\bar{y}} \geq r \big).$$
If we assume, in addition, that $\psi(\bar{x}, \bar{y})$ is the formula $y_1 = y_1$, then each of (3.1) and (3.2) expresses that the proportion of $l$-tuples $\bar{y}$ that satisfy $\varphi(\bar{x}, \bar{y})$ is at least $r$.
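The counting semantics behind the abbreviation (3.2) can be made concrete with a small brute-force Python sketch. It is an illustration only: the function names `proportion` and `cond_prob_at_least` are hypothetical, formulas are represented as Python predicates on tuples (with the parameters $\bar{a}$ already fixed), and no attempt is made at efficiency.

```python
from fractions import Fraction
from itertools import product


def proportion(domain, arity, phi, psi):
    """Return |phi ∩ psi| / |psi| over all `arity`-tuples of `domain`.

    Returns None when no tuple satisfies psi, i.e. when the fraction
    in the semantics is undefined.
    """
    psi_tuples = [b for b in product(domain, repeat=arity) if psi(b)]
    if not psi_tuples:
        return None
    hits = sum(1 for b in psi_tuples if phi(b))
    return Fraction(hits, len(psi_tuples))


def cond_prob_at_least(domain, arity, phi, psi, r):
    """Truth value of (||phi | psi||_y >= r) in the given finite domain.

    As in the semantics, the formula is false when psi has no solutions.
    """
    p = proportion(domain, arity, phi, psi)
    return p is not None and p >= r


# Hypothetical example: domain {0,...,5}, phi(y) = "y is even", psi(y) = "y < 4".
domain = range(6)
is_even = lambda b: b[0] % 2 == 0
below_four = lambda b: b[0] < 4
```

Here `proportion(domain, 1, is_even, below_four)` equals `Fraction(1, 2)`, since two of the four elements below 4 are even. Taking `psi` to be always true gives the unconditional proportion, which corresponds to the case where $\psi$ is $y_1 = y_1$.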
Example 3.5.
Suppose that $M$ is a unary relation symbol and $F$ a binary relation symbol. Consider the statement "For at least half of all persons $x$, if at least one third of the friends of $x$ are mathematicians, then $x$ is a mathematician". If $M(x)$ expresses that "$x$ is a mathematician" and $F(x, y)$ expresses that "$x$ and $y$ are friends", then this statement can be formulated in CPL, using the abbreviation (3.2), as
$$\Big( \big\| \big( \|M(y) \mid F(x, y)\|_y \geq 1/3 \big) \rightarrow M(x) \ \big|\ x = x \big\|_x \geq 1/2 \Big).$$

Remark 3.6. (Expressing independence)
Suppose that $\mathcal{A}$ is a finite $\sigma$-structure, $\theta(\bar{x}, \bar{y})$ is the formula $y_1 = y_1$ and $\bar{a} \in A^{|\bar{x}|}$. If $r = 0$ and
$$\mathcal{A} \models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \big) \wedge \big( \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} + r \big),$$
then the event $X = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \varphi(\bar{a}, \bar{b})\}$ is independent from the event $Y = \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models \psi(\bar{a}, \bar{b})\}$ if all $|\bar{y}|$-tuples have the same probability.

If $\mathcal{A}$ represents a database from the real world, then it is unlikely that events of interest are (conditionally) independent according to the precise mathematical definition. Instead one may look for "approximate (conditional) independencies". If $r$ is changed to be a small positive number and if
$$\mathcal{A} \models \big( r + \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \big) \wedge \big( \|\varphi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\varphi(\bar{a}, \bar{y}) \mid \psi(\bar{a}, \bar{y})\|_{\bar{y}} + r \big)$$
$$\wedge \ \big( r + \|\psi(\bar{a}, \bar{y}) \mid \varphi(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \big) \wedge \big( \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \geq \|\psi(\bar{a}, \bar{y}) \mid \varphi(\bar{a}, \bar{y})\|_{\bar{y}} + r \big),$$
then the dependency between $X$ and $Y$ is weak, or one could say that they are "approximately independent up to an error of $r$". The reason for the more complicated formula is to make "$r$-approximate independence" symmetric.

Definition 3.7.
The quantifier rank, $\mathrm{qr}(\varphi)$, of formulas $\varphi \in CPL(\sigma)$ is defined inductively as follows:
(1) For atomic $\varphi$, $\mathrm{qr}(\varphi) = 0$.
(2) $\mathrm{qr}(\neg\varphi) = \mathrm{qr}(\varphi)$, and $\mathrm{qr}(\varphi \star \psi) = \max\{\mathrm{qr}(\varphi), \mathrm{qr}(\psi)\}$ if $\star$ is one of $\wedge, \vee, \rightarrow$ or $\leftrightarrow$.
(3) $\mathrm{qr}(\exists x \varphi) = \mathrm{qr}(\varphi) + 1$.
(4) $\mathrm{qr}\big( r + \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} \big) = \mathrm{qr}\big( \|\varphi \mid \psi\|_{\bar{y}} \geq \|\theta \mid \tau\|_{\bar{y}} + r \big) = \max\{\mathrm{qr}(\varphi), \mathrm{qr}(\psi), \mathrm{qr}(\theta), \mathrm{qr}(\tau)\} + |\bar{y}|$.

Definition 3.8. (Lifted Bayesian network)
Let $\sigma$ be a finite relational signature. In this article we define a lifted Bayesian network for $\sigma$ to consist of the following components:
(a) An acyclic directed graph (DAG) $G$ with vertex set $\sigma$.
(b) For each $R \in \sigma$, a number $\nu_R \in \mathbb{N}^+$ and formulas $\chi_{R,i}(\bar{x}) \in CPL(\mathrm{par}(R))$, for $i = 1, \ldots, \nu_R$, where $|\bar{x}|$ equals the arity of $R$, such that $\forall \bar{x} \big( \bigvee_{i=1}^{\nu_R} \chi_{R,i}(\bar{x}) \big)$ is valid (i.e. true in all $\mathrm{par}(R)$-structures) and if $i \neq j$ then $\exists \bar{x} \big( \chi_{R,i}(\bar{x}) \wedge \chi_{R,j}(\bar{x}) \big)$ is unsatisfiable. Each $\chi_{R,i}$ will be called an aggregation formula (of $G$).
(c) For each $R \in \sigma$ and each $1 \leq i \leq \nu_R$, a number denoted $\mu(R \mid \chi_{R,i})$ (or $\mu(R(\bar{x}) \mid \chi_{R,i}(\bar{x}))$) in the interval $[0, 1]$.
We will use the same symbol (for example $G$) to denote a lifted Bayesian network and its underlying DAG. The intuitive meaning of $\mu(R \mid \chi_{R,i})$ in part (c) is that if $\bar{a}$ is a sequence of elements from a structure and $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$, then the probability that $\bar{a}$ satisfies $R(\bar{x})$ is $\mu(R \mid \chi_{R,i})$.

Remark 3.9. (Subnetworks)
Let $G$ denote a lifted Bayesian network for $\sigma$. Suppose that $\sigma' \subset \sigma$ is such that if $R \in \sigma'$ then $\mathrm{par}(R) \subseteq \sigma'$. Then it is easy to see that $\sigma'$ determines a lifted Bayesian network $G'$ for $\sigma'$ such that
• the vertex set of the underlying DAG of $G'$ is $\sigma'$,
• for every $R \in \sigma'$, the number $\nu_R$ and the formulas $\chi_{R,i}$, $i = 1, \ldots, \nu_R$, are the same as those for $G$,
• for every $R \in \sigma'$ and every $1 \leq i \leq \nu_R$, the numbers $\mu(R \mid \chi_{R,i})$ are the same as those for $G$.
We call the so defined lifted Bayesian network $G'$ for $\sigma'$ the subnetwork (of $G$) induced by $\sigma'$.

Definition 3.10. (The case of an empty signature)
(i) As a technical convenience we will also consider a lifted Bayesian network, denoted $G_\emptyset$, for the empty signature $\emptyset$. According to Definition 3.8 the vertex set of the underlying DAG of $G_\emptyset$ is $\emptyset$, the empty set. It follows that no formulas or numbers as in parts (b) and (c) of Definition 3.8 need to be specified for $G_\emptyset$.
(ii) For every $n \in \mathbb{N}^+$, let $W^\emptyset_n$ denote the set of all $\emptyset$-structures with domain $[n]$ and note that every $W^\emptyset_n$ has only one member, which is just the set $[n]$.
(iii) For every $n \in \mathbb{N}^+$, let $P^\emptyset_n$ be the unique probability distribution on $W^\emptyset_n$.

Definition 3.11. (The probability distribution in the general case)
Let $\sigma$ be a finite nonempty relational signature and let $G$ denote a lifted Bayesian network for $\sigma$. Suppose that the underlying DAG of $G$ has mp-rank $\rho$. For each $0 \le r \le \rho$ let $G_r$ be the subnetwork (in the sense of Remark 3.9) induced by $\sigma_r = \{R \in \sigma : \mathrm{mp}(R) \le r\}$ and note that $G_\rho = G$. Also let $G_{-1} = G^\emptyset$ and let $P^{-1}_n$ be the unique probability distribution on $W^{-1}_n = W^\emptyset_n$. By induction on $r$ we define, for every $r = 0, 1, \ldots, \rho$, a probability distribution $P^r_n$ on the set $W^r_n$ of all $\sigma_r$-structures with domain $[n]$ as follows: For every $\mathcal{A} \in W^r_n$,
\[ P^r_n(\mathcal{A}) = P^{r-1}_n(\mathcal{A} \upharpoonright \sigma_{r-1}) \prod_{R \in \sigma_r \setminus \sigma_{r-1}} \ \prod_{i=1}^{\nu_R} \ \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A} \upharpoonright \sigma_{r-1})} \lambda(\mathcal{A}, R, i, \bar{a}) \]
where
\[ \lambda(\mathcal{A}, R, i, \bar{a}) = \begin{cases} \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge R(\bar{a}), \\ 1 - \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge \neg R(\bar{a}). \end{cases} \]
Finally, let $W_n = W^\rho_n$ and $P_n = P^\rho_n$, so $P_n$ is a probability distribution on the set of all $\sigma$-structures with domain $[n]$.

Remark 3.12. ((Ir)reflexive and/or symmetric relations)
Let $A$ be a set and let $R \subseteq A^k$ be a $k$-ary relation on $A$. We call $R$ reflexive if for all $a \in A$ the $k$-tuple containing $a$ in each coordinate belongs to $R$. We call $R$ irreflexive if for every $(a_1, \ldots, a_k) \in R$ we have $a_i \neq a_j$ if $i \neq j$. We call $R$ symmetric if for every $(a_1, \ldots, a_k) \in R$, every permutation of $(a_1, \ldots, a_k)$ also belongs to $R$. Consider Definition 3.11 and let $R \in \sigma$. We can make sure that $P_n(\mathcal{A}) > 0$ only if the interpretation of $R$ in $\mathcal{A}$ is reflexive (respectively irreflexive) by choosing the formulas $\chi_{R,i}$ and the associated (conditional) probabilities in an appropriate way. To achieve that $P_n(\mathcal{A}) > 0$ only if the interpretation of $R$ in $\mathcal{A}$ is symmetric we can proceed as follows: In the definition of $\lambda(\mathcal{A}, R, i, \bar{a})$ (in Definition 3.11) we interpret $R(\bar{a})$ as meaning that $R$ is satisfied by every permutation of $\bar{a}$ and we interpret $\neg R(\bar{a})$ as meaning that $R$ is not satisfied by any permutation of $\bar{a}$. We also need to assume that for every $k$-tuple $\bar{a}$, either every permutation of $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$ or no permutation of $\bar{a}$ satisfies $\chi_{R,i}(\bar{x})$. Then the proofs of Theorems 3.14–3.16 still work with very small modifications.

Definition 3.13.
Let $\sigma$, $W_n$ and $P_n$ be as in Definition 3.11.
(i) If $\varphi(\bar{x}) \in CPL(\sigma)$ and $\bar{a} \in [n]^{|\bar{x}|}$, then we define $P_n(\varphi(\bar{a})) = P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models \varphi(\bar{a})\} \big)$.
(ii) If $\varphi \in CPL(\sigma)$ has no free variables (i.e. is a sentence), then we define $P_n(\varphi) = P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models \varphi\} \big)$.

Now we can state the main results. They use the notion of noncritical formula, which depends on the lifted Bayesian network under consideration. Since this notion is quite technical and relies on some technical results (concerning the convergence of the probability that an atomic type is realized) which will be proved later, we give the precise definition later in Definition 4.30; in that context it will be more evident why the definition of noncritical formula looks the way it does. For now I only say this: For every $m \in \mathbb{N}^+$ there are finitely many numbers (depending only on $G$) which are called $m$-critical (according to Definition 4.29). Roughly speaking, a formula $\varphi(\bar{x}) \in CPL(\sigma)$ is noncritical (details in Definition 4.30) if for every subformula (of $\varphi(\bar{x})$) of the form
\[ \big( r + \|\chi \mid \psi\|_{\bar{y}} \ge \|\theta \mid \tau\|_{\bar{y}} \big) \quad \text{or} \quad \big( \|\chi \mid \psi\|_{\bar{y}} \ge \|\theta \mid \tau\|_{\bar{y}} + r \big) \]
the number $r$ is not the difference of two $m$-critical numbers where $m = |\bar{x}| + \mathrm{qr}(\varphi)$. It follows that every first-order formula is noncritical.

Theorem 3.14. (Almost sure elimination of quantifiers for noncritical formulas)
Let $\sigma$ be a finite relational signature, let $G$ be a lifted Bayesian network for $\sigma$ and, for each $n \in \mathbb{N}^+$, let $P_n$ be the probability distribution induced by $G$ (according to Definition 3.11) on the set $W_n$ of all $\sigma$-structures with domain $[n]$. Suppose that every aggregation formula $\chi_{R,i}$ of $G$ is noncritical. If $\varphi(\bar{x}) \in CPL(\sigma)$ is noncritical, then there are a quantifier-free formula $\varphi^*(\bar{x}) \in CPL(\sigma)$ and $c > 0$, which depend only on $\varphi(\bar{x})$ and $G$, such that for all sufficiently large $n$
\[ P_n\big( \forall \bar{x} ( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) ) \big) \ge 1 - e^{-cn}. \]

Theorem 3.15. (Convergence for noncritical formulas)
Let $\sigma$, $G$, $W_n$ and $P_n$ be as in Theorem 3.14. For every noncritical $\varphi(\bar{x}) \in CPL(\sigma)$ there are $c > 0$ and $0 \le d \le 1$, depending only on $\varphi(\bar{x})$ and $G$, such that for every $m \in \mathbb{N}^+$ and every $\bar{a} \in [m]^{|\bar{x}|}$,
\[ \big| P_n(\varphi(\bar{a})) - d \big| \le e^{-cn} \quad \text{for all sufficiently large } n \ge m. \]
Moreover, if $\varphi$ has no free variable (i.e. is a sentence), then $P_n(\varphi)$ converges to either 0 or 1.

Theorem 3.16. (An asymptotically equivalent “quantifier-free” network)
Let $\sigma$, $G$, $W_n$ and $P_n$ be as in Theorem 3.14. Then for every aggregation formula $\chi_{R,i}(\bar{x})$ of $G$ there is a quantifier-free formula $\chi^*_{R,i}(\bar{x})$ containing only relation symbols that occur in $\chi_{R,i}$ such that if $G^*$ is the lifted Bayesian network
• with the same underlying DAG as $G$,
• where, for every $R \in \sigma$ and every $1 \le i \le \nu_R$, the aggregation formula $\chi_{R,i}$ is replaced by $\chi^*_{R,i}$, and
• where $\mu^*(R \mid \chi^*_{R,i}) = \mu(R \mid \chi_{R,i})$ for every $R \in \sigma$ and every $1 \le i \le \nu_R$,
then for every noncritical $\varphi(\bar{y}) \in CPL(\sigma)$ there is $d > 0$, depending only on $\varphi(\bar{y})$ and $G$, such that for every $m \in \mathbb{N}^+$ and every $\bar{a} \in [m]^{|\bar{y}|}$,
\[ \big| P_n(\varphi(\bar{a})) - P^*_n(\varphi(\bar{a})) \big| \le e^{-dn} \quad \text{for all sufficiently large } n \ge m, \]
where $P^*_n$ is the probability distribution on $W_n$ according to Definition 3.11 if $G$ is replaced by $G^*$ and $P_n$ is replaced by $P^*_n$.

Remark 3.17. (Computational complexity)
The proof of Theorem 3.14 indicates an algorithm for finding the quantifier-free $\varphi^*$ from $\varphi$. Suppose that we fix the lifted Bayesian network (so $\sigma$ is also fixed) and try to understand how efficient the algorithm is with respect to the length of $\varphi$. The crucial step is Definition 4.35 and Lemma 4.37, which together show how to eliminate a quantifier of the form constructed in part (3) of Definition 3.1 in a satisfiable formula. However, at this step in the proof we assume that the formulas inside the latest quantification are written as disjunctions of complete atomic types. The problem of transforming an arbitrary quantifier-free formula into an equivalent disjunctive normal form is NP-hard, so the algorithm is not efficient in general (given the current state of affairs in computational complexity theory). But if we assume that every quantifier-free subformula of $\varphi$ is in disjunctive normal form, then the number of “steps” that the indicated algorithm needs to find $\varphi^*$ is $O(|\varphi|)$ if $|\varphi|$ denotes the length of $\varphi$ and “step” means an arithmetic operation, a comparison of two numbers or a comparison of two literals. This essentially follows from Remark 4.36 because the number of times that a quantifier needs to be eliminated is bounded by $|\varphi|$.

Remark 3.18. (Necessity of noncriticality)
It follows from Remark 3.4 that for every sentence $\psi$ of the language $L_{\omega P}$ considered in [16] there is a sentence of CPL which has exactly the same finite models as $\psi$. Therefore it follows from [16, Proposition 3.1] that the assumption that $\varphi$ is noncritical in Theorems 3.14 and 3.15 is necessary, even if we assume that $\sigma$ contains one binary relation symbol and no other symbols. One may ask if it is also necessary in the above theorems that all aggregation formulas $\chi_{R,i}$ are noncritical. I do not currently know but I assume that the answer is yes.

4. Proof of Theorems 3.14, 3.15 and 3.16
Let $\sigma$ be a finite relational signature and $G$ a lifted Bayesian network for $\sigma$. The proof proceeds by induction on the mp-rank of the underlying DAG of $G$. The base case will not be when the mp-rank of $G$ is 0. Instead the base case will be the “empty” lifted Bayesian network for the empty signature $\emptyset$, as described in Definition 3.10. In the case of an empty signature (and consequently empty lifted Bayesian network) Theorems 3.14–3.16 are a direct consequence of Lemma 4.13 below.

The rest of the proof concerns the induction step. The induction step is proved by Proposition 4.41 and Corollary 4.42, which rely (only) on Assumption 4.1 below, which states the general assumptions related to the lifted Bayesian network, and Assumption 4.10 below, which states the induction hypothesis. Theorems 3.14–3.16 follow from the arguments in this section, in particular Proposition 4.41 and Corollary 4.42, because
• $k \in \mathbb{N}^+$ can be chosen arbitrarily large in Lemma 4.13 and in Assumption 4.10,
• $\varepsilon' > 0$ can be chosen arbitrarily small in Lemma 4.13 and in Assumption 4.10, and
• we can choose $\delta'(n) = e^{-dn}$ for any $d > 0$ in Lemma 4.13, and because of the lower bound in Lemma 4.28.

(Footnote to Remark 3.17: by an arithmetic operation we mean, more precisely, adding, multiplying or dividing two numbers.)

For the rest of this section we assume the following:

Assumption 4.1. (Relationship to a lifted Bayesian network)
• $\sigma$ is a finite relational signature and $\sigma'$ is a proper subset of $\sigma$.
• For each $R \in \sigma \setminus \sigma'$, of arity $m$ say, there are a number $\nu_R \in \mathbb{N}^+$, a sequence of variables $\bar{x} = (x_1, \ldots, x_m)$ and formulas $\chi_{R,i}(\bar{x}) \in CPL(\sigma')$, for $i = 1, \ldots, \nu_R$, such that $\forall \bar{x} \big( \bigvee_{i=1}^{\nu_R} \chi_{R,i}(\bar{x}) \big)$ is valid (i.e. true in all $\sigma'$-structures) and if $i \neq j$ then $\exists \bar{x} \big( \chi_{R,i}(\bar{x}) \wedge \chi_{R,j}(\bar{x}) \big)$ is unsatisfiable.
• For every $R \in \sigma \setminus \sigma'$ and every $1 \le i \le \nu_R$, $\mu(R \mid \chi_{R,i})$ denotes a real number in the interval $[0, 1]$.
(Sometimes we write $\mu(R(\bar{x}) \mid \chi_{R,i}(\bar{x}))$ where $\bar{x}$ is a sequence of variables the length of which equals the arity of $R$.)
• For every $\sigma$-structure $\mathcal{A}$, every $R \in \sigma \setminus \sigma'$, every $1 \le i \le \nu_R$ and every $\bar{a} \in A^r$, where $r$ is the arity of $R$, let
\[ \lambda(\mathcal{A}, R, i, \bar{a}) = \begin{cases} \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge R(\bar{a}), \\ 1 - \mu(R \mid \chi_{R,i}) & \text{if } \mathcal{A} \models \chi_{R,i}(\bar{a}) \wedge \neg R(\bar{a}). \end{cases} \]
• For every $n \in \mathbb{N}^+$, $W'_n$ is the set of all $\sigma'$-structures with domain $[n] = \{1, \ldots, n\}$ and $P'_n$ is a probability distribution on $W'_n$.
• For every $n \in \mathbb{N}^+$, $W_n$ is the set of all $\sigma$-structures with domain $[n]$.

Recall that, according to Definition 3.2, if $\psi(\bar{x}) \in CPL(\sigma')$ and $\mathcal{A} \in W_n$ then $\psi(\mathcal{A} \upharpoonright \sigma') = \{\bar{b} : \mathcal{A} \upharpoonright \sigma' \models \psi(\bar{b})\}$.

Definition 4.2.
For every $n \in \mathbb{N}^+$ and every $\mathcal{A} \in W_n$ we define
\[ P_n(\mathcal{A}) = P'_n(\mathcal{A} \upharpoonright \sigma') \prod_{R \in \sigma \setminus \sigma'} \ \prod_{i=1}^{\nu_R} \ \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A} \upharpoonright \sigma')} \lambda(\mathcal{A}, R, i, \bar{a}). \]
Then $P_n$ is a probability distribution on $W_n$ which we may call the $P'_n$-conditional probability distribution on $W_n$.

Notation 4.3.
The notation in this section follows this pattern: $\sigma'$-structures, in particular members of $W'_n$, will be denoted $\mathcal{A}'$, $\mathcal{B}'$, etcetera; subsets of $W'_n$ will be denoted $X'$ (or $X'_n$), $Y'$ (or $Y'_n$), etcetera; $\sigma$-structures and subsets of $W_n$ will be denoted similarly but without the (symbol for) “prime”.

In the proofs that follow we will consider “restrictions” of $P_n$ to some subsets of $W_n$ according to the next definition.

Definition 4.4.
(i) If $Y' \subseteq W'_n$ then we define $W^{Y'} = \{\mathcal{A} \in W_n : \mathcal{A} \upharpoonright \sigma' \in Y'\}$ and, for every $\mathcal{A} \in W^{Y'}$,
\[ P^{Y'}(\mathcal{A}) = \frac{P'_n(\mathcal{A} \upharpoonright \sigma')}{P'_n(Y')} \prod_{R \in \sigma \setminus \sigma'} \ \prod_{i=1}^{\nu_R} \ \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A})} \lambda(\mathcal{A}, R, i, \bar{a}). \]
(ii) If $\mathcal{A}' \in W'_n$, then we let $W^{\mathcal{A}'} = W^{\{\mathcal{A}'\}}$ and, for every $\mathcal{A} \in W^{\mathcal{A}'}$,
\[ P^{\mathcal{A}'}(\mathcal{A}) = P^{\{\mathcal{A}'\}}(\mathcal{A}) = \prod_{R \in \sigma \setminus \sigma'} \ \prod_{i=1}^{\nu_R} \ \prod_{\bar{a} \in \chi_{R,i}(\mathcal{A})} \lambda(\mathcal{A}, R, i, \bar{a}). \]
Then $P^{Y'}$ and $P^{\mathcal{A}'}$ are probability distributions on $W^{Y'}$ and $W^{\mathcal{A}'}$, respectively; if this is not clear see Remark 4.7 below. Note also that if $Y' \subseteq W'_n$, $\mathcal{A}' \in Y'$ and $\mathcal{A} \in W^{\mathcal{A}'}$, then
\[ P^{Y'}(\mathcal{A}) = \frac{P'_n(\mathcal{A}')}{P'_n(Y')} \, P^{\mathcal{A}'}(\mathcal{A}), \tag{4.1} \]
and in particular, taking $Y' = W'_n$, we have, for every $\mathcal{A} \in W_n$,
\[ P_n(\mathcal{A}) = P'_n(\mathcal{A} \upharpoonright \sigma') \, P^{\mathcal{A} \upharpoonright \sigma'}(\mathcal{A}). \tag{4.2} \]

We now state a few basic lemmas which will be useful.
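To make Definitions 4.2 and 4.4 concrete, the following is a minimal sketch under assumptions that are not taken from the text: $\sigma' = \{Q\}$ and $\sigma \setminus \sigma' = \{R\}$ with $Q$, $R$ unary, aggregation formulas $\chi_{R,1}(x) = Q(x)$ and $\chi_{R,2}(x) = \neg Q(x)$, a distribution $P'_n$ in which each $Q(a)$ holds independently, and hypothetical numerical values for the probabilities. All function and variable names are illustrative only. The sketch enumerates every structure with domain $[3]$ and checks that $P_n$, $P^{\mathcal{A}'}$ and $P^{Y'}$ each have total mass 1, and that the mass which $P_n$ assigns to $W^{Y'}$ equals $P'_n(Y')$.

```python
from fractions import Fraction
from itertools import product

N = 3                              # domain [n] = {1, ..., n}; tiny, so enumeration is feasible
DOMAIN = list(range(1, N + 1))
Q_PROB = Fraction(1, 3)            # hypothetical P'_n: each Q(a) holds independently with this probability
MU = {1: Fraction(3, 4),           # hypothetical mu(R | chi_{R,1}), with chi_{R,1}(x) = Q(x)
      2: Fraction(1, 5)}           # hypothetical mu(R | chi_{R,2}), with chi_{R,2}(x) = not Q(x)

def unary_relations():
    """Yield every interpretation of a unary relation symbol on the domain."""
    for bits in product([False, True], repeat=N):
        yield frozenset(a for a, bit in zip(DOMAIN, bits) if bit)

def p_prime(q_rel):
    """P'_n of the sigma'-structure in which Q is interpreted as q_rel."""
    prob = Fraction(1)
    for a in DOMAIN:
        prob *= Q_PROB if a in q_rel else 1 - Q_PROB
    return prob

def lam(q_rel, r_rel, a):
    """lambda(A, R, i, a) for the unique i such that a satisfies chi_{R,i}."""
    mu = MU[1 if a in q_rel else 2]
    return mu if a in r_rel else 1 - mu

def p_fiber(q_rel, r_rel):
    """P^{A'}(A) from Definition 4.4(ii). Each a satisfies exactly one chi_{R,i},
    so the triple product reduces to a single product over the domain."""
    prob = Fraction(1)
    for a in DOMAIN:
        prob *= lam(q_rel, r_rel, a)
    return prob

def p_n(q_rel, r_rel):
    """P_n(A) from Definition 4.2, in the factored form (4.2)."""
    return p_prime(q_rel) * p_fiber(q_rel, r_rel)

def p_restricted(y_prime, q_rel, r_rel):
    """P^{Y'}(A) from Definition 4.4(i), in the factored form (4.1)."""
    return p_prime(q_rel) / sum(p_prime(q) for q in y_prime) * p_fiber(q_rel, r_rel)

SIGMA_PRIME_STRUCTS = list(unary_relations())                 # W'_n
Y_PRIME = [q for q in SIGMA_PRIME_STRUCTS if len(q) >= 2]     # an arbitrary subset Y' of W'_n

total_mass = sum(p_n(q, r) for q in SIGMA_PRIME_STRUCTS for r in unary_relations())
fiber_masses = [sum(p_fiber(q, r) for r in unary_relations()) for q in SIGMA_PRIME_STRUCTS]
restricted_mass = sum(p_restricted(Y_PRIME, q, r) for q in Y_PRIME for r in unary_relations())
mass_of_w_y = sum(p_n(q, r) for q in Y_PRIME for r in unary_relations())   # P_n(W^{Y'})
mass_of_y = sum(p_prime(q) for q in Y_PRIME)                               # P'_n(Y')
```

Exact rational arithmetic (`Fraction`) is used so that the identities can be compared with `==` rather than up to rounding. The enumeration is exponential in the domain size, so the sketch is only meant as a sanity check of the definitions on very small domains.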
Lemma 4.5.
For every $n$, if $Y' \subseteq W'_n$ then $P_n(W^{Y'}) = P'_n(Y')$.

Proof.
By using (4.2) in the first line below we get
\[ \begin{aligned} P_n(W^{Y'}) &= \sum_{\mathcal{A}' \in Y'} \ \sum_{\mathcal{A} \in W^{\mathcal{A}'}} P_n(\mathcal{A}) = \sum_{\mathcal{A}' \in Y'} \ \sum_{\mathcal{A} \in W^{\mathcal{A}'}} P'_n(\mathcal{A}') \, P^{\mathcal{A}'}(\mathcal{A}) \\ &= \sum_{\mathcal{A}' \in Y'} P'_n(\mathcal{A}') \sum_{\mathcal{A} \in W^{\mathcal{A}'}} P^{\mathcal{A}'}(\mathcal{A}) = \sum_{\mathcal{A}' \in Y'} P'_n(\mathcal{A}') = P'_n(Y'). \end{aligned} \]
□

Lemma 4.6.
For every $n$,
(i) if $X \subseteq W_n$ and $\mathcal{A}' \in W'_n$, then $P_n(X \mid W^{\mathcal{A}'}) = P^{\mathcal{A}'}(X \cap W^{\mathcal{A}'})$, and
(ii) if $X \subseteq W_n$ and $Y' \subseteq W'_n$, then $P_n(X \mid W^{Y'}) = P^{Y'}(X \cap W^{Y'})$.

Proof.
Let $X \subseteq W_n$.
(i) Let $\mathcal{A}' \in W'_n$. Using Lemma 4.5 in the first line below and (4.2) in the second line below, we get
\[ \begin{aligned} P_n(X \mid W^{\mathcal{A}'}) &= \frac{P_n(X \cap W^{\mathcal{A}'})}{P_n(W^{\mathcal{A}'})} = \frac{P_n(X \cap W^{\mathcal{A}'})}{P'_n(\mathcal{A}')} \\ &= \frac{P'_n(\mathcal{A}') \sum_{\mathcal{A} \in X \cap W^{\mathcal{A}'}} P^{\mathcal{A}'}(\mathcal{A})}{P'_n(\mathcal{A}')} = P^{\mathcal{A}'}(X \cap W^{\mathcal{A}'}). \end{aligned} \]
(ii) Let $Y' \subseteq W'_n$. Using that $X \cap W^{Y'}$ is the disjoint union of all $X \cap W^{\mathcal{A}'}$ such that $\mathcal{A}' \in Y'$, Lemma 4.5, part (i) of this lemma and (4.1), we get
\[ \begin{aligned} P_n(X \mid W^{Y'}) &= \frac{P_n(X \cap W^{Y'})}{P_n(W^{Y'})} = \sum_{\mathcal{A}' \in Y'} \frac{P_n(X \cap W^{\mathcal{A}'})}{P_n(W^{Y'})} = \sum_{\mathcal{A}' \in Y'} \frac{P_n(W^{\mathcal{A}'})}{P_n(W^{Y'})} \, P_n(X \mid W^{\mathcal{A}'}) \\ &= \sum_{\mathcal{A}' \in Y'} \frac{P'_n(\mathcal{A}')}{P'_n(Y')} \, P^{\mathcal{A}'}(X \cap W^{\mathcal{A}'}) = \sum_{\mathcal{A}' \in Y'} P^{Y'}(X \cap W^{\mathcal{A}'}) = P^{Y'}(X \cap W^{Y'}). \end{aligned} \]
□

Remark 4.7. (About $P^{\mathcal{A}'}$) Fix any $n$ and any $\mathcal{A}' \in W'_n$. For every $R \in \sigma \setminus \sigma'$, every $1 \le i \le \nu_R$ and every $\bar{a} \in \chi_{R,i}(\mathcal{A}')$, let $\Omega(R, i, \bar{a}) = \{0, 1\}$ and let $P_{R,i,\bar{a}}$ be the probability distribution on $\Omega(R, i, \bar{a})$ with $P_{R,i,\bar{a}}(1) = \mu(R \mid \chi_{R,i})$. Then let $P_\Omega$ be the product measure on
\[ \Omega = \prod_{\substack{R \in \sigma \setminus \sigma' \\ 1 \le i \le \nu_R \\ \bar{a} \in \chi_{R,i}(\mathcal{A}')}} \Omega(R, i, \bar{a}). \]
Consider the map which sends
$\mathcal{A} \in W^{\mathcal{A}'}$ to the finite sequence
\[ \bar{\kappa}_{\mathcal{A}} = \big( \kappa(R, i, \bar{a}) : R \in \sigma \setminus \sigma', \ 1 \le i \le \nu_R, \ \bar{a} \in \chi_{R,i}(\mathcal{A}') \big) \]
where $\kappa(R, i, \bar{a}) = 1$ if $\mathcal{A} \models R(\bar{a})$ and $\kappa(R, i, \bar{a}) = 0$ otherwise. This map is clearly a bijection from $W^{\mathcal{A}'}$ to $\Omega$ and, for every $\mathcal{A} \in W^{\mathcal{A}'}$, $P^{\mathcal{A}'}(\mathcal{A}) = P_\Omega(\bar{\kappa}_{\mathcal{A}})$. For every $\alpha \in \{0, 1\}$, every $R \in \sigma \setminus \sigma'$ and every $\bar{a} \in [n]$ (having the same length as the arity of $R$), let $E^\alpha_{R,\bar{a}} = \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models R^\alpha(\bar{a})\}$. From the connection to the product measure it follows that
(a) for every $R \in \sigma \setminus \sigma'$, every $1 \le i \le \nu_R$ and every $\bar{a} \in \chi_{R,i}(\mathcal{A}')$, $P^{\mathcal{A}'}(E^1_{R,\bar{a}}) = \mu(R \mid \chi_{R,i})$, and
(b) if $\alpha_1, \ldots, \alpha_m \in \{0, 1\}$, $R_1, \ldots, R_m \in \sigma \setminus \sigma'$ and $\bar{a}_1, \ldots, \bar{a}_m$ are tuples where $|\bar{a}_i|$ is the arity of $R_i$ for each $i$, and for all $1 \le i < j \le m$, $R_i \neq R_j$ or $\bar{a}_i \neq \bar{a}_j$, then the events $E^{\alpha_1}_{R_1,\bar{a}_1}, \ldots, E^{\alpha_m}_{R_m,\bar{a}_m}$ are independent.

The next lemma is a direct consequence of (b) of Remark 4.7.

Lemma 4.8.
Suppose that $p(x_1, \ldots, x_m)$ and $q(x_1, \ldots, x_m)$ are (possibly partial) atomic $(\sigma \setminus \sigma')$-types. Also assume that if $\varphi$ is an atomic $\sigma$-formula which does not have the form $x = x$ or the form $\top$ and $\varphi \in p$ or $\neg\varphi \in p$, then neither $\varphi$ nor $\neg\varphi$ belongs to $q$. Then, for every $n$, every $\mathcal{A}' \in W'_n$ and all distinct $a_1, \ldots, a_m \in [n]$, the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(a_1, \ldots, a_m)\}$ is independent from the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(a_1, \ldots, a_m)\}$ in the probability space $(W^{\mathcal{A}'}, P^{\mathcal{A}'})$.

Definition 4.9. (Saturation and unsaturation)
Let $\bar{x}$ and $\bar{y}$ be sequences of different variables such that $\mathrm{rng}(\bar{x}) \cap \mathrm{rng}(\bar{y}) = \emptyset$ and let $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ be atomic $\sigma$-types such that $q \subseteq p$. Let also $0 \le \alpha \le 1$ and $d = \dim_{\bar{y}}(p)$.
(a) A finite $\sigma$-structure $\mathcal{A}$ is called $(p, q, \alpha)$-saturated if, whenever $\bar{a} \in A^{|\bar{x}|}$ and $\mathcal{A} \models q(\bar{a})$, then $\big| \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big| \ge \alpha |A|^d$.
(b) A finite $\sigma$-structure $\mathcal{A}$ is called $(p, q, \alpha)$-unsaturated if, whenever $\bar{a} \in A^{|\bar{x}|}$ and $\mathcal{A} \models q(\bar{a})$, then $\big| \{\bar{b} \in A^{|\bar{y}|} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big| \le \alpha |A|^d$.
If $p'(\bar{x}, \bar{y})$ and $q'(\bar{x})$ are atomic $\sigma'$-types and $q' \subseteq p'$, then the notions of $(p', q', \alpha)$-saturated and $(p', q', \alpha)$-unsaturated are defined in the same way, but considering finite $\sigma'$-structures instead.

Assumption 4.10. (Induction hypothesis)
Suppose that $k \in \mathbb{N}^+$, $\varepsilon' > 0$, $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\ge 0}$ and $Y'_n \subseteq W'_n$, for $n \in \mathbb{N}^+$, are such that the following hold:
(1) $\lim_{n \to \infty} \delta'(n) = 0$.
(2) $P'_n(Y'_n) \ge 1 - \delta'(n)$ for all sufficiently large $n$.
(3) For every complete atomic $\sigma'$-type $p'(\bar{x})$ with $|\bar{x}| \le k$ there is a number which we denote $P'(p'(\bar{x}))$, or just $P'(p')$, such that for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p'$,
\[ \big| P'_n\big( \{\mathcal{A}' \in W'_n : \mathcal{A}' \models p'(\bar{a})\} \big) - P'(p'(\bar{x})) \big| \le \delta'(n). \]
(4) For every complete atomic $\sigma'$-type $p'(\bar{x}, \bar{y})$ with $|\bar{x}\bar{y}| \le k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$, if $q'(\bar{x}) = p' \upharpoonright \bar{x}$ and $P'(q') > 0$, then for all sufficiently large $n$, every $\mathcal{A}' \in Y'_n$ is $(p', q', \alpha/(1 + \varepsilon'))$-saturated and $(p', q', \alpha(1 + \varepsilon'))$-unsaturated if $\alpha = P'(p'(\bar{x}, \bar{y})) / P'(q'(\bar{x}))$.
(5) For every $\chi_{R,i}(\bar{x})$ as in Assumption 4.1 there is a quantifier-free $\sigma'$-formula $\chi^*_{R,i}(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A}' \in Y'_n$, $\mathcal{A}' \models \forall \bar{x} \big( \chi_{R,i}(\bar{x}) \leftrightarrow \chi^*_{R,i}(\bar{x}) \big)$.

Remark 4.11. (Some special cases)
(i) As a technical convenience we allow empty types (and this does not contradict our definition of an atomic type).
For example, in Definition 4.9, we allow the possibility that $\bar{x}$ is an empty sequence and consequently $q(\bar{x}) = \emptyset$ and $p(\bar{x}, \bar{y})$ is really just $p(\bar{y})$.
(ii) For an empty atomic $\sigma'$-type $p'$ we let $P'(p') = 1$ and in this case we also interpret the set $\{\mathcal{A}' \in W'_n : \mathcal{A}' \models p'(\bar{a})\}$ as being equal to $W'_n$. Then part (3) of Assumption 4.10 makes sense also for an empty type $p'$.
(iii) If $p'(\bar{y})$ is a complete atomic $\sigma'$-type and $P'(p') = 0$, then for all sufficiently large $n$ and all $\mathcal{A}' \in Y'_n$, $p'$ is not realized in $\mathcal{A}'$ (i.e. $p'(\mathcal{A}') = \emptyset$). The reason is this: Let $\bar{x}$ denote an empty sequence and let $q'(\bar{x})$ be the empty atomic $\sigma'$-type, so $q' \subseteq p'$. For large enough $n$, every $\mathcal{A}' \in Y'_n$ is $(p', q', P'(p')(1 + \varepsilon'))$-unsaturated by part (4) of Assumption 4.10. If $P'(p') = 0$ this implies that $p'$ has no realization in $\mathcal{A}'$.

Lemma 4.12.
Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. There is a number which we denote $P(p(\bar{x}) \mid p'(\bar{x}))$, or just $P(p \mid p')$, such that for all sufficiently large $n$, all $\bar{a} \in [n]$ and all $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a})$,
\[ P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) = P(p(\bar{x}) \mid p'(\bar{x})). \]
Moreover, the number $P(p(\bar{x}) \mid p'(\bar{x}))$ is a product of numbers of the form $\mu(R \mid \chi_{R,i})$ or $1 - \mu(R \mid \chi_{R,i})$.

Proof.
Suppose that $\bar{a}, \bar{b} \in [n]$ and $\mathcal{A}', \mathcal{B}' \in Y'_n$ are such that $\mathcal{A}' \models p'(\bar{a})$ and $\mathcal{B}' \models p'(\bar{b})$. Let $R \in \sigma \setminus \sigma'$. By part (5) of Assumption 4.10, for each $1 \le i \le \nu_R$, there is a quantifier-free formula $\chi^*_{R,i}$ such that (if $n$ is large enough) $\chi_{R,i}$ is equivalent to $\chi^*_{R,i}$ in every structure in $Y'_n$. It follows that if $\bar{c}$ and $\bar{d}$ are corresponding subsequences of $\bar{a}$ and $\bar{b}$, respectively, of length equal to the arity of $R$, then either $\mathcal{A}' \models \chi_{R,i}(\bar{c})$ and $\mathcal{B}' \models \chi_{R,i}(\bar{d})$, or $\mathcal{A}' \not\models \chi_{R,i}(\bar{c})$ and $\mathcal{B}' \not\models \chi_{R,i}(\bar{d})$. The conclusion of the lemma now follows from (a) and (b) of Remark 4.7. □

Lemma 4.13. (The base case)
For every $k \in \mathbb{N}^+$ and every $\varepsilon' > 0$, if $\sigma' = \emptyset$, $P'_n$ is the uniform probability distribution on $W'_n$ for all $n$ and $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\ge 0}$ is any function such that $\lim_{n \to \infty} \delta'(n) = 0$, then there are $Y'_n \subseteq W'_n$, for $n \in \mathbb{N}^+$, such that (1)–(4) in Assumption 4.10 hold. Moreover, for every $\varepsilon'$-noncritical $\varphi(\bar{x}) \in CPL(\emptyset)$ with $|\bar{x}| + \mathrm{qr}(\varphi) \le k$ there is a quantifier-free formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A}' \in Y'_n$, $\mathcal{A}' \models \forall \bar{x} \big( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \big)$.

Proof.
Suppose that $\sigma' = \emptyset$ and let $k \in \mathbb{N}^+$ and $\varepsilon' > 0$ be given. Then, for every $n$, $W'_n$ contains a unique structure, namely the set $[n]$, which has probability 1. Let $\delta' : \mathbb{N}^+ \to \mathbb{R}^{\ge 0}$ be any function such that $\lim_{n \to \infty} \delta'(n) = 0$. For every complete atomic $\sigma'$-type $p'(\bar{x})$ let $P'(p'(\bar{x})) = 1$. Observe that, for every $n$, if $\bar{a} \in [n]$ and $\bar{a}$ realizes the identity fragment of $p'(\bar{x})$, then $\bar{a}$ realizes $p'(\bar{x})$ in the unique $\mathcal{A}'$ of $W'_n$. Hence, for trivial reasons we have (3).

For every $n$ let $Y'_n$ be the set of all $\mathcal{A}' \in W'_n$ such that for every complete atomic $\sigma'$-type $p'(\bar{x}, \bar{y})$ with $|\bar{x}\bar{y}| \le k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$, if $q'(\bar{x}) = p' \upharpoonright \bar{x}$, then $\mathcal{A}'$ is $(p', q', 1/(1 + \varepsilon'))$-saturated and $(p', q', (1 + \varepsilon'))$-unsaturated. Suppose that $p'(\bar{x}, \bar{y})$ is a complete atomic $\sigma'$-type with $|\bar{x}\bar{y}| \le k$ and $0 < \dim_{\bar{y}}(p'(\bar{x}, \bar{y})) = |\bar{y}|$. Let $q'(\bar{x}) = p' \upharpoonright \bar{x}$ and suppose that $\mathcal{A}' \models q'(\bar{a})$ where $\mathcal{A}' \in W'_n$. Then $\mathcal{A}' \models p'(\bar{a}, \bar{b})$ for every $\bar{b} \in [n]$ consisting of different elements none of which occurs in $\bar{a}$. There are at least $n^{|\bar{y}|} - C n^{|\bar{y}| - 1}$ such $\bar{b}$ for some constant $C$. So if
\[ n^{|\bar{y}|} - C n^{|\bar{y}| - 1} \ge \frac{n^{|\bar{y}|}}{1 + \varepsilon'} \]
then $\mathcal{A}'$ is $(p', q', 1/(1 + \varepsilon'))$-saturated. For trivial reasons, $\mathcal{A}'$ is also $(p', q', (1 + \varepsilon'))$-unsaturated. Since there are only finitely many such types $p'$ up to the choice of variables, it follows that $Y'_n = W'_n$ for all sufficiently large $n$. Hence we have proved (2) and (4). The last claim of the lemma follows from Proposition 4.32, the proof of which works out in exactly the same way if $\sigma$ and $Y_n$ (in that proof) are replaced by $\sigma'$ and $Y'_n$, respectively, and we assume (4).
In other words, the almost everywhere elimination of quantifiers follows from the saturation and unsaturation properties stated in (4). □

Footnotes to Lemma 4.13: (1) In fact the uniform probability distribution is the only probability distribution on $W'_n$, since $W'_n$ is a singleton if $\sigma' = \emptyset$ (which we assume in this lemma). (2) “$\varepsilon'$-noncritical” is meant in the sense of Definition 4.30.
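The counting step in the proof of Lemma 4.13 can be checked numerically. The sketch below is only an illustration under the following reading, which is an assumption: for $\sigma' = \emptyset$ and $\dim_{\bar{y}}(p') = |\bar{y}| = t$, the tuples $\bar{b}$ with $\mathcal{A}' \models p'(\bar{a}, \bar{b})$ are exactly the $t$-tuples of pairwise distinct elements of $[n]$ that avoid $\bar{a}$. If $\bar{a}$ has $m$ distinct entries, their number is the falling factorial $(n - m)(n - m - 1) \cdots (n - m - t + 1)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def count_by_enumeration(n, a, t):
    """Count t-tuples of pairwise distinct elements of [n] that avoid every entry of a."""
    used = set(a)
    free = [b for b in range(1, n + 1) if b not in used]
    return sum(1 for _ in permutations(free, t))

def count_by_formula(n, m, t):
    """Falling factorial (n-m)(n-m-1)...(n-m-t+1); m is the number of distinct entries of a."""
    result = 1
    for j in range(t):
        result *= max(n - m - j, 0)   # no such tuples exist once the free elements run out
    return result

def is_saturated(n, m, t, eps):
    """Saturation inequality with alpha = 1/(1 + eps) and d = t: count >= n^t / (1 + eps)."""
    return count_by_formula(n, m, t) >= Fraction(n ** t) / (1 + eps)

def is_unsaturated(n, m, t, eps):
    """Unsaturation inequality with alpha = 1 + eps: count <= (1 + eps) * n^t."""
    return count_by_formula(n, m, t) <= (1 + eps) * n ** t

def first_saturated_n(m, t, eps, n_max=10_000):
    """Smallest n <= n_max satisfying the saturation inequality, or None.
    The ratio count / n^t is nondecreasing in n, so the inequality persists for larger n."""
    for n in range(1, n_max + 1):
        if is_saturated(n, m, t, eps):
            return n
    return None
```

For example, `first_saturated_n(2, 2, Fraction(1, 10))` searches for the domain size from which the unique $\emptyset$-structure is saturated for a type with two distinct parameters and two new variables when $\varepsilon' = 1/10$. Passing `eps` as a `Fraction` keeps the comparison exact.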
Lemma 4.14.
Suppose that $X_n \subseteq W_n$. Then for all sufficiently large $n$, $P_n(X_n) \le P_n(X_n \cap W^{Y'_n}) + \delta'(n)$.

Proof.
We have $P_n(X_n) = P_n(X_n \cap W^{Y'_n}) + P_n(X_n \setminus W^{Y'_n})$ and, using Lemma 4.5 and part (2) of Assumption 4.10, we have
\[ P_n(X_n \setminus W^{Y'_n}) \le P_n(W_n \setminus W^{Y'_n}) = 1 - P_n(W^{Y'_n}) = 1 - P'_n(Y'_n) \le \delta'(n). \]
Hence $P_n(X_n) \le P_n(X_n \cap W^{Y'_n}) + \delta'(n)$. □

Lemma 4.15.
Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. Then for all sufficiently large $n$ and all $\bar{a} \in [n]$, letting $Z'_n$ be the set of all $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a})$, we have
\[ P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\} \ \big| \ W^{Y'_n} \cap \{\mathcal{A} \in W_n : \mathcal{A} \models p'(\bar{a})\} \big) = P^{Z'_n}\big( \{\mathcal{A} \in W^{Z'_n} : \mathcal{A} \models p(\bar{a})\} \big) = P(p(\bar{x}) \mid p'(\bar{x})) \]
where $P(p(\bar{x}) \mid p'(\bar{x}))$ is as in Lemma 4.12.

Proof.
For every
$\mathcal{A} \in W_n$ we have $\mathcal{A} \models p'(\bar{a})$ if and only if $\mathcal{A} \upharpoonright \sigma' \models p'(\bar{a})$. Therefore $W^{Y'_n} \cap \{\mathcal{A} \in W_n : \mathcal{A} \models p'(\bar{a})\} = W^{Z'_n}$. By Lemma 4.6 we have
\[ P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\} \ \big| \ W^{Y'_n} \cap \{\mathcal{A} \in W_n : \mathcal{A} \models p'(\bar{a})\} \big) = P^{Z'_n}\big( \{\mathcal{A} \in W^{Z'_n} : \mathcal{A} \models p(\bar{a})\} \big). \]
Then, using (4.1) and Lemma 4.12, we get
\[ \begin{aligned} P^{Z'_n}\big( \{\mathcal{A} \in W^{Z'_n} : \mathcal{A} \models p(\bar{a})\} \big) &= \sum_{\mathcal{A}' \in Z'_n} P^{Z'_n}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) \\ &= \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Z'_n)} \, P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a})\} \big) \\ &= \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Z'_n)} \, P(p(\bar{x}) \mid p'(\bar{x})) = P(p(\bar{x}) \mid p'(\bar{x})). \end{aligned} \]
□

Lemma 4.16.
Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. Then for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p'(\bar{x})$ (and hence of $p$) we have
\[ \big| P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\} \ \big| \ W^{Y'_n} \big) - P(p(\bar{x}) \mid p'(\bar{x})) \cdot P'(p'(\bar{x})) \big| < 3\delta'(n). \]

Proof.
Let $\bar{a} \in [n]$ realize the identity fragment of $p'(\bar{x})$. Furthermore,
let $X_n$ be the set of all $\mathcal{A} \in W_n$ such that $\mathcal{A} \models p(\bar{a})$,
let $X'_n$ be the set of all $\mathcal{A}' \in W'_n$ such that $\mathcal{A}' \models p'(\bar{a})$, and
let $Z'_n$ be the set of all $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a})$.
From parts (2) and (3) of Assumption 4.10 it easily follows that (for large enough $n$) $P'_n(Z'_n) / P'_n(Y'_n)$ differs from $P'_n(Z'_n)$ by at most $\delta'(n)$, $P'_n(Z'_n)$ differs from $P'_n(X'_n)$ by at most $\delta'(n)$ and $P'_n(X'_n)$ differs from $P'(p'(\bar{x}))$ by at most $\delta'(n)$.
By Lemma 4.6, $P_n(X_n \mid W^{Y'_n}) = P^{Y'_n}(X_n \cap W^{Y'_n})$. Then, using (4.1) and Lemma 4.12, we have
\[ \begin{aligned} P^{Y'_n}\big( X_n \cap W^{Y'_n} \big) &= \sum_{\mathcal{A}' \in Y'_n} P^{Y'_n}\big( X_n \cap W^{\mathcal{A}'} \big) = \sum_{\mathcal{A}' \in Z'_n} P^{Y'_n}\big( X_n \cap W^{\mathcal{A}'} \big) \\ &= \sum_{\mathcal{A}' \in Z'_n} \ \sum_{\mathcal{A} \in X_n \cap W^{\mathcal{A}'}} P^{Y'_n}(\mathcal{A}) = \sum_{\mathcal{A}' \in Z'_n} \ \sum_{\mathcal{A} \in X_n \cap W^{\mathcal{A}'}} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \, P^{\mathcal{A}'}(\mathcal{A}) \\ &= \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \sum_{\mathcal{A} \in X_n \cap W^{\mathcal{A}'}} P^{\mathcal{A}'}(\mathcal{A}) = \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \, P^{\mathcal{A}'}\big( X_n \cap W^{\mathcal{A}'} \big) \\ &= \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \, P(p(\bar{x}) \mid p'(\bar{x})) = P(p(\bar{x}) \mid p'(\bar{x})) \sum_{\mathcal{A}' \in Z'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \\ &= P(p(\bar{x}) \mid p'(\bar{x})) \, \frac{P'_n(Z'_n)}{P'_n(Y'_n)}, \end{aligned} \]
where
\[ P'(p'(\bar{x})) - 3\delta'(n) \le \frac{P'_n(Z'_n)}{P'_n(Y'_n)} \le P'(p'(\bar{x})) + 3\delta'(n). \]
□

Lemma 4.17.
Suppose that $p'(\bar{x})$ is a complete atomic $\sigma'$-type and that $p(\bar{x}) \supseteq p'(\bar{x})$ is a (possibly partial) atomic $\sigma$-type. Then for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p'(\bar{x})$ we have
\[ \big| P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\} \big) - P(p(\bar{x}) \mid p'(\bar{x})) \cdot P'(p'(\bar{x})) \big| < 4\delta'(n). \]

Proof.
Let $\bar{a} \in [n]$ realize the identity fragment of $p'(\bar{x})$. Let $X_n$ be the set of all $\mathcal{A} \in W_n$ such that $\mathcal{A} \models p(\bar{a})$. We have
\[ P_n\big( X_n \big) = P_n\big( X_n \mid W^{Y'_n} \big) \, P_n\big( W^{Y'_n} \big) + P_n\big( X_n \mid W_n \setminus W^{Y'_n} \big) \, P_n\big( W_n \setminus W^{Y'_n} \big). \]
By the use of Lemma 4.5 and by part (2) of Assumption 4.10, we also have
\[ P_n\big( W_n \setminus W^{Y'_n} \big) = 1 - P_n\big( W^{Y'_n} \big) = 1 - P'_n(Y'_n) \le \delta'(n). \]
It follows that $P_n\big( X_n \mid W_n \setminus W^{Y'_n} \big) \, P_n\big( W_n \setminus W^{Y'_n} \big) \le \delta'(n)$. By Lemma 4.5 and part (2) of Assumption 4.10, $P_n\big( W^{Y'_n} \big) = P'_n\big( Y'_n \big) \ge 1 - \delta'(n)$, so $P_n(X_n)$ differs from $P_n\big( X_n \mid W^{Y'_n} \big)$ by at most $\delta'(n)$. It now follows from Lemma 4.16 that $P_n\big( X_n \big)$ differs from $P(p(\bar{x}) \mid p'(\bar{x})) \cdot P'(p'(\bar{x}))$ by less than $4\delta'(n)$ (for sufficiently large $n$). □

Definition 4.18.
For every (possibly partial) atomic $\sigma$-type $p(\bar{x})$ such that $p'(\bar{x}) = p \upharpoonright \sigma'$ is a complete atomic $\sigma'$-type, we define
\[ P(p(\bar{x})) = P'(p'(\bar{x})) \cdot P(p(\bar{x}) \mid p'(\bar{x})). \]
With this definition we can reformulate Lemma 4.17 as follows:
Corollary 4.19. If $p(\bar{x})$ is a (possibly partial) atomic $\sigma$-type such that $p \upharpoonright \sigma'$ is a complete atomic $\sigma'$-type, then, for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p(\bar{x})$ we have
\[ \big| P_n\big( \{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\} \big) - P(p(\bar{x})) \big| < 4\delta'(n). \]

Lemma 4.20.
Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type. Let $p'(\bar{x}, \bar{y}) = p \upharpoonright \sigma'$, $q(\bar{x}) = p \upharpoonright \bar{x}$ and let $p^{\bar{y}}(\bar{x}, \bar{y})$ include $p'(\bar{x}, \bar{y})$ and all formulas in $p$ in which at least one variable from $\bar{y}$ occurs. Then
\[ P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) = P(p^{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})). \]

Proof.
By Lemma 4.12, for any sufficiently large $n$, any $\bar{a}, \bar{b} \in [n]$ and any $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a}, \bar{b})$, we have
\[ \begin{aligned} P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big), \\ P(p^{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p^{\bar{y}}(\bar{a}, \bar{b})\} \big) \quad \text{and} \\ P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) &= P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\} \big). \end{aligned} \]
Note that $p(\bar{x}, \bar{y}) = p'(\bar{x}, \bar{y}) \cup p^{\bar{y}}(\bar{x}, \bar{y}) \cup q(\bar{x})$. By Lemma 4.8, the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p^{\bar{y}}(\bar{a}, \bar{b})\}$ is independent, in $(W^{\mathcal{A}'}, P^{\mathcal{A}'})$, from the event $\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}$. Therefore,
\[ P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p(\bar{a}, \bar{b})\} \big) = P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p^{\bar{y}}(\bar{a}, \bar{b})\} \big) \cdot P^{\mathcal{A}'}\big( \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\} \big) \]
and from this the lemma follows. □

Lemma 4.21.
Let $p'(\bar{x}, \bar{y})$ be a complete atomic $\sigma'$-type, $q'(\bar{x}) = p' \restriction \bar{x}$ and suppose that $q(\bar{x})$ is a complete atomic $\sigma$-type such that $q \supseteq q'$. Then
\[ P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P(q(\bar{x}) \mid q'(\bar{x})). \]

Proof.
Since $q'(\bar{x}) \subseteq p'(\bar{x}, \bar{y})$ it follows from Lemma 4.12 that for any sufficiently large $n$, any $\bar{a}, \bar{b} \in [n]$ and any $\mathcal{A}' \in Y'_n$ such that $\mathcal{A}' \models p'(\bar{a}, \bar{b})$, we have
\[ P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big) \quad\text{and}\quad P(q(\bar{x}) \mid q'(\bar{x})) = P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models q(\bar{a})\}\big). \]
Hence $P(q(\bar{x}) \mid p'(\bar{x}, \bar{y})) = P(q(\bar{x}) \mid q'(\bar{x}))$. $\square$

In Lemma 4.12 we defined the notation $P(p(\bar{x}) \mid p'(\bar{x}))$ when the atomic $\sigma$-type $p$ has no more variables than the complete atomic $\sigma'$-type $p'$. From Definition 4.18 of $P(p(\bar{x}))$ it follows that $P(p(\bar{x}) \mid p'(\bar{x})) = P(p(\bar{x})) / P'(p'(\bar{x}))$. Now we extend this notation to pairs $(p(\bar{x}, \bar{y}), q(\bar{x}))$ where $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and $q(\bar{x}) = p \restriction \bar{x}$.

Definition 4.22.
Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type and let $q(\bar{x}) = p \restriction \bar{x}$. We define
\[ P(p(\bar{x}, \bar{y}) \mid q(\bar{x})) = \frac{P(p(\bar{x}, \bar{y}))}{P(q(\bar{x}))}. \]
In the same way, if $p'(\bar{x}, \bar{y})$ is a complete atomic $\sigma'$-type and $q'(\bar{x}) = p' \restriction \bar{x}$, then we define
\[ P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) = \frac{P'(p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x}))}. \]

Lemma 4.23.
Suppose that $p(\bar{x}, \bar{y})$ is a complete atomic $\sigma$-type, let $q(\bar{x}) = p \restriction \bar{x}$, $p' = p \restriction \sigma'$, $q' = q \restriction \sigma'$, and let $p_{\bar{y}}(\bar{x}, \bar{y})$ be defined as in Lemma 4.20. Then
\[ P(p(\bar{x}, \bar{y}) \mid q(\bar{x})) = P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})). \]

Proof.
Using Definitions 4.18 and 4.22 and Lemmas 4.20 and 4.21 we get
\begin{align*}
P(p(\bar{x}, \bar{y}) \mid q(\bar{x})) &= \frac{P(p(\bar{x}, \bar{y}))}{P(q(\bar{x}))} = \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid p'(\bar{x}, \bar{y}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= \frac{P'(p'(\bar{x}, \bar{y})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))}{P'(q'(\bar{x})) \cdot P(q(\bar{x}) \mid q'(\bar{x}))} \\
&= P'(p'(\bar{x}, \bar{y}) \mid q'(\bar{x})) \cdot P(p_{\bar{y}}(\bar{x}, \bar{y}) \mid p'(\bar{x}, \bar{y})). \qquad \square
\end{align*}
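The chain of equalities in the proof of Lemma 4.23 can be checked on concrete numbers. The following is a minimal sketch, not part of the paper: the function names and the four input probabilities are hypothetical values chosen only for illustration, and exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction


def conditional_via_definitions(P1_p_prime, P1_q_prime, P_py_given_p_prime, P_q_given_q_prime):
    """Compute P(p | q) by unfolding Definitions 4.18 and 4.22.

    Arguments (all hypothetical illustration values):
      P1_p_prime          -- P'(p'(x, y))
      P1_q_prime          -- P'(q'(x))
      P_py_given_p_prime  -- P(p_y(x, y) | p'(x, y))
      P_q_given_q_prime   -- P(q(x) | q'(x))
    """
    # Lemma 4.21: P(q | p') = P(q | q').
    P_q_given_p_prime = P_q_given_q_prime
    # Lemma 4.20: P(p | p') = P(p_y | p') * P(q | p').
    P_p_given_p_prime = P_py_given_p_prime * P_q_given_p_prime
    # Definition 4.18: P(p) = P'(p') * P(p | p') and P(q) = P'(q') * P(q | q').
    P_p = P1_p_prime * P_p_given_p_prime
    P_q = P1_q_prime * P_q_given_q_prime
    # Definition 4.22: P(p | q) = P(p) / P(q).
    return P_p / P_q


def conditional_via_lemma(P1_p_prime, P1_q_prime, P_py_given_p_prime):
    """Right-hand side of Lemma 4.23: P'(p' | q') * P(p_y | p')."""
    return (P1_p_prime / P1_q_prime) * P_py_given_p_prime


lhs = conditional_via_definitions(Fraction(1, 6), Fraction(1, 2), Fraction(1, 4), Fraction(2, 3))
rhs = conditional_via_lemma(Fraction(1, 6), Fraction(1, 2), Fraction(1, 4))
print(lhs, rhs)  # both equal 1/12
```

Note that the factor $P(q \mid q')$ cancels, which is why the right-hand side of the lemma does not depend on it.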
Lemma 4.24.
Suppose that $n$ is large enough that part (4) of Assumption 4.10 holds. Suppose that $p(\bar{x}, y)$ and $q(\bar{x})$ are complete atomic $\sigma$-types such that $|\bar{x}y| \le k$, $\dim_y(p) = 1$ and $q \subseteq p$. Let $\gamma = P(p(\bar{x}, y) \mid q(\bar{x}))$ and $\mathcal{A}' \in Y'_n$. Then
\[ P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \text{ is } (p, q, \gamma/(1+\varepsilon')^2)\text{-saturated and } (p, q, \gamma(1+\varepsilon')^2)\text{-unsaturated}\}\big) \]
is at least $1 - 2n^{|\bar{x}|} e^{-c_{\varepsilon'}\gamma n}$ where the constant $c_{\varepsilon'} > 0$ depends only on $\varepsilon'$.

Proof.
Suppose that $p(\bar{x}, y)$ and $q(\bar{x})$ are complete atomic $\sigma$-types such that $|\bar{x}y| \le k$, $\dim_y(p) = 1$ and $q \subseteq p$. Let $p' = p \restriction \sigma'$ and $q' = q \restriction \sigma'$. Moreover, let $p_y(\bar{x}, y)$ include $p'(\bar{x}, y)$ and all $(\sigma \setminus \sigma')$-formulas in $p(\bar{x}, y)$ which contain the variable $y$. Also, let
\[ \alpha = P'(p'(\bar{x}, y) \mid q'(\bar{x})), \quad \beta = P(p_y(\bar{x}, y) \mid p'(\bar{x}, y)) \quad\text{and}\quad \gamma = P(p(\bar{x}, y) \mid q(\bar{x})). \]
By Lemma 4.23 we have $\gamma = \alpha\beta$.

Let $\mathcal{A}' \in Y'_n$. By (4) of Assumption 4.10, $\mathcal{A}'$ is $(p', q', \alpha/(1+\varepsilon'))$-saturated and $(p', q', \alpha(1+\varepsilon'))$-unsaturated if $n$ is large enough. For every $\bar{a} \in [n]^{|\bar{x}|}$ let
\[ B'_{\bar{a}} = \{ b \in [n] : \mathcal{A}' \models p'(\bar{a}, b) \}. \]
By the mentioned (un)saturation property, if $\mathcal{A}' \models q'(\bar{a})$ then
\[ \alpha n/(1+\varepsilon') \le |B'_{\bar{a}}| \le \alpha n (1+\varepsilon'). \]
For every $\bar{a} \in [n]^{|\bar{x}|}$ and every $\mathcal{A} \in W^{\mathcal{A}'}$ let
\[ B_{\bar{a}, \mathcal{A}} = \{ b \in [n] : \mathcal{A} \models p_y(\bar{a}, b) \} \]
and note that $B_{\bar{a}, \mathcal{A}} \subseteq B'_{\bar{a}}$ for every $\bar{a}$ and every $\mathcal{A} \in W^{\mathcal{A}'}$. Let
\[ X_{\bar{a}} = \{ \mathcal{A} \in W^{\mathcal{A}'} : \text{either } \mathcal{A} \not\models q(\bar{a}) \text{ or } \gamma n/(1+\varepsilon')^2 \le |B_{\bar{a}, \mathcal{A}}| \le \gamma n (1+\varepsilon')^2 \}. \]
Observe that if $\mathcal{A} \in W^{\mathcal{A}'}$, $\mathcal{A} \models q(\bar{a})$ and $\mathcal{A} \models p_y(\bar{a}, b)$, then $\mathcal{A} \models p(\bar{a}, b)$. Hence every $\mathcal{A} \in \bigcap_{\bar{a} \in [n]^{|\bar{x}|}} X_{\bar{a}}$ is $(p, q, \gamma/(1+\varepsilon')^2)$-saturated and $(p, q, \gamma(1+\varepsilon')^2)$-unsaturated.

Fix any $\bar{a}$ such that $\mathcal{A}' \models q'(\bar{a})$ (and note that $\mathcal{A} \models q(\bar{a})$ implies $\mathcal{A}' \models q'(\bar{a})$). By Lemma 4.8, for all distinct $b, c \in B'_{\bar{a}}$, the events $E_b = \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p_y(\bar{a}, b)\}$ and $E_c = \{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \models p_y(\bar{a}, c)\}$ are independent. Moreover, by Lemma 4.12, for each $b \in B'_{\bar{a}}$, $P^{\mathcal{A}'}(E_b) = \beta$. Let $Z : W^{\mathcal{A}'} \to \mathbb{N}$ be the random variable defined by
\[ Z(\mathcal{A}) = \big| \{ b \in B'_{\bar{a}} : \mathcal{A} \models p_y(\bar{a}, b) \} \big|. \]
Let $\varepsilon = \varepsilon'/(1+\varepsilon')$ and note that $\varepsilon < \varepsilon'$ and $1 - \varepsilon = 1/(1+\varepsilon')$. By Lemma 2.5,
\[ P^{\mathcal{A}'}\big( \big| Z - \beta|B'_{\bar{a}}| \big| > \varepsilon\beta|B'_{\bar{a}}| \big) < 2\exp\big( -c_{\varepsilon}\beta|B'_{\bar{a}}| \big) \]
where $c_{\varepsilon}$ depends only on $\varepsilon$ and hence only on $\varepsilon'$. Recall that $\alpha\beta = \gamma$ and $\alpha n/(1+\varepsilon') \le |B'_{\bar{a}}| \le \alpha n(1+\varepsilon')$. From this it follows that
\[ (1+\varepsilon')^2\gamma n \ge (1+\varepsilon')\beta|B'_{\bar{a}}| \quad\text{and}\quad \gamma n/(1+\varepsilon')^2 \le \beta|B'_{\bar{a}}|/(1+\varepsilon'). \]
Therefore, if $Z > (1+\varepsilon')^2\gamma n$ or $Z < \gamma n/(1+\varepsilon')^2$, then $\big| Z - \beta|B'_{\bar{a}}| \big| > \varepsilon\beta|B'_{\bar{a}}|$. Hence, if $c_{\varepsilon'} = c_{\varepsilon}/(1+\varepsilon')$,
\[ P^{\mathcal{A}'}\big( W^{\mathcal{A}'} \setminus X_{\bar{a}} \big) < 2\exp\big( -c_{\varepsilon}\beta|B'_{\bar{a}}| \big) \le 2\exp\big( -c_{\varepsilon'}\gamma n \big). \]
Since the argument works for all $\bar{a} \in [n]^{|\bar{x}|}$ such that $\mathcal{A}' \models q'(\bar{a})$ it follows that
\[ P^{\mathcal{A}'}\Big( \bigcap_{\bar{a} \in [n]^{|\bar{x}|}} X_{\bar{a}} \Big) \ge 1 - 2n^{|\bar{x}|} e^{-c_{\varepsilon'}\gamma n} \]
and this proves the lemma.
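The concentration step used in the proof above (a sum of independent indicator variables stays within a relative distance of its mean, except with exponentially small probability) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses the textbook multiplicative Chernoff bound $2\exp(-\varepsilon^2\mu/3)$, valid for $0 < \varepsilon \le 1$, as a stand-in for the bound of Lemma 2.5, whose constant $c_\varepsilon$ is not specified here; the function names, the seed and the parameter values are hypothetical.

```python
import math
import random


def chernoff_bound(eps, mean):
    """Textbook two-sided multiplicative Chernoff bound (assumed stand-in for Lemma 2.5).

    For a sum Z of independent 0/1 variables with expectation `mean` and 0 < eps <= 1:
    P(|Z - mean| > eps * mean) <= 2 * exp(-eps**2 * mean / 3).
    """
    return 2.0 * math.exp(-(eps ** 2) * mean / 3.0)


def empirical_tail(size, beta, eps, trials, seed=0):
    """Estimate P(|Z - beta*size| > eps*beta*size) by simulation.

    Z counts how many of `size` independent events, each of probability `beta`,
    occur; this plays the role of Z counting the b in B'_a with A |= p_y(a, b).
    """
    rng = random.Random(seed)
    mean = beta * size
    bad = 0
    for _ in range(trials):
        z = sum(1 for _ in range(size) if rng.random() < beta)
        if abs(z - mean) > eps * mean:
            bad += 1
    return bad / trials


size, beta, eps = 400, 0.3, 0.25
estimate = empirical_tail(size, beta, eps, trials=2000)
bound = chernoff_bound(eps, beta * size)
print(estimate, bound)
```

In the proof, the same kind of bound is applied once for each tuple $\bar{a}$, and the factor $n^{|\bar{x}|}$ in the final estimate comes from a union bound over these tuples.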
$\square$

The next lemma generalizes the previous one to types $p(\bar{x}, \bar{y})$ where the length of $\bar{y}$ is greater than one.

Lemma 4.25.
Suppose that $n$ is large enough that part (4) of Assumption 4.10 holds. Suppose that $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ are complete atomic $\sigma$-types such that $|\bar{x}\bar{y}| \le k$, $\dim_{\bar{y}}(p) = |\bar{y}|$ and $q \subseteq p$. Let $\gamma = P(p(\bar{x}, \bar{y}) \mid q(\bar{x}))$ and $\mathcal{A}' \in Y'_n$. Then
\[ P^{\mathcal{A}'}\big(\{\mathcal{A} \in W^{\mathcal{A}'} : \mathcal{A} \text{ is } (p, q, \gamma/(1+\varepsilon')^{2|\bar{y}|})\text{-saturated and } (p, q, \gamma(1+\varepsilon')^{2|\bar{y}|})\text{-unsaturated}\}\big) \]
is at least $1 - 2^{|\bar{y}|} n^{|\bar{x}|+|\bar{y}|-1} e^{-c_{\varepsilon'}\gamma n}$ where the constant $c_{\varepsilon'} > 0$ depends only on $\varepsilon'$.

Proof.
We prove the lemma by induction on $m = |\bar{y}|$. The base case $m = 1$ is given by Lemma 4.24. Let $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ be as assumed in the lemma where $\bar{y} = (y_1, \ldots, y_{m+1})$. Let $p_m(\bar{x}, y_1, \ldots, y_m)$ be the restriction of $p$ to formulas with variables among $\bar{x}, y_1, \ldots, y_m$. Furthermore, let $\alpha = P(p_m \mid q)$, $\beta = P(p \mid p_m)$ and $\gamma = P(p \mid q)$. Observe that by Definition 4.22 we have
\[ \gamma = \frac{P(p)}{P(q)} = \frac{P(p)}{P(p_m)} \cdot \frac{P(p_m)}{P(q)} = \beta\alpha. \]
Let $\mathcal{A}' \in Y'_n$. By the induction hypothesis, the probability (with the distribution $P^{\mathcal{A}'}$) that

(a) $\mathcal{A} \in W^{\mathcal{A}'}$ is $(p_m, q, \alpha/(1+\varepsilon')^{2m})$-saturated and $(p_m, q, \alpha(1+\varepsilon')^{2m})$-unsaturated

is at least $1 - 2^m n^{|\bar{x}|+m-1} e^{-c_{\varepsilon'}\alpha n}$ where the constant $c_{\varepsilon'}$ depends only on $\varepsilon'$. By the induction hypothesis again, the probability that

(b) $\mathcal{A} \in W^{\mathcal{A}'}$ is $(p, p_m, \beta/(1+\varepsilon')^2)$-saturated and $(p, p_m, \beta(1+\varepsilon')^2)$-unsaturated

is at least $1 - 2n^{|\bar{x}|+m} e^{-c_{\varepsilon'}\beta n}$ where $c_{\varepsilon'}$ is the same constant as above (since it depends only on $\varepsilon'$). It is straightforward to check that if $\mathcal{A} \in W^{\mathcal{A}'}$ satisfies both (a) and (b) then $\mathcal{A}$ is $(p, q, \gamma/(1+\varepsilon')^{2(m+1)})$-saturated and $(p, q, \gamma(1+\varepsilon')^{2(m+1)})$-unsaturated. Since $\gamma = \alpha\beta \le \min\{\alpha, \beta\}$ it follows that the probability that $\mathcal{A} \in W^{\mathcal{A}'}$ is $(p, q, \gamma/(1+\varepsilon')^{2(m+1)})$-saturated and $(p, q, \gamma(1+\varepsilon')^{2(m+1)})$-unsaturated is at least $1 - 2^{m+1} n^{|\bar{x}|+m} e^{-c_{\varepsilon'}\gamma n}$. $\square$

Definition 4.26.
For every $n$, let $Y_n$ be the set of all $\mathcal{A} \in W^{Y'_n}$ such that whenever $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ are complete atomic $\sigma$-types with $|\bar{x}\bar{y}| \le k$, $\dim_{\bar{y}}(p) = |\bar{y}|$, $q \subseteq p$ and $\gamma = P(p \mid q)$, then $\mathcal{A}$ is $(p, q, \gamma/(1+\varepsilon')^{2|\bar{y}|})$-saturated and $(p, q, \gamma(1+\varepsilon')^{2|\bar{y}|})$-unsaturated.

The following corollary follows directly from the definition of $Y_n$ and Lemma 4.25.

Corollary 4.27.
Suppose that $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ are complete atomic $\sigma$-types such that $|\bar{x}\bar{y}| \le k$, $d = \dim_{\bar{y}}(p) > 0$, $q \subseteq p$ and $\gamma = P(p \mid q)$. For every $n$, every $\mathcal{A} \in Y_n$ is $(p, q, \gamma/(1+\varepsilon')^{2d})$-saturated and $(p, q, \gamma(1+\varepsilon')^{2d})$-unsaturated.

Lemma 4.28.
There is a constant $c > 0$ such that for all sufficiently large $n$,
\[ P_n\big(Y_n\big) \ge \big(1 - e^{-cn}\big)\big(1 - \delta'(n)\big). \]

Proof.
There are, up to changing variables, only finitely many atomic $\sigma$-types $p(\bar{x})$ such that $|\bar{x}| \le k$. It follows from Lemma 4.25 that there is a constant $c > 0$ such that for all large enough $n$ and all $\mathcal{A}' \in Y'_n$,
\[ P^{\mathcal{A}'}\big(Y_n \cap W^{\mathcal{A}'}\big) \ge 1 - e^{-cn}. \]
Note that $P_n(Y_n) = P_n\big(Y_n \mid W^{Y'_n}\big) P_n\big(W^{Y'_n}\big)$. By Lemma 4.5, $P_n(W^{Y'_n}) = P'_n(Y'_n)$ and by Lemma 4.6 we have $P_n(Y_n \mid W^{Y'_n}) = P^{Y'_n}(Y_n \cap W^{Y'_n})$. Hence $P_n(Y_n) = P^{Y'_n}(Y_n \cap W^{Y'_n}) P'_n\big(Y'_n\big)$. Then, reasoning similarly as in the proof of Lemma 4.16 (using (4.1)), we get
\begin{align*}
P^{Y'_n}\big(Y_n \cap W^{Y'_n}\big) &= \sum_{\mathcal{A}' \in Y'_n} P^{Y'_n}\big(Y_n \cap W^{\mathcal{A}'}\big) = \sum_{\mathcal{A}' \in Y'_n} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} P^{Y'_n}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in Y'_n} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} P^{\mathcal{A}'}(\mathcal{A}) = \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \sum_{\mathcal{A} \in Y_n \cap W^{\mathcal{A}'}} P^{\mathcal{A}'}(\mathcal{A}) \\
&= \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} P^{\mathcal{A}'}\big(Y_n \cap W^{\mathcal{A}'}\big) \ge \sum_{\mathcal{A}' \in Y'_n} \frac{P'_n(\mathcal{A}')}{P'_n(Y'_n)} \big(1 - e^{-cn}\big) = \big(1 - e^{-cn}\big).
\end{align*}
Using part (2) of Assumption 4.10 we now get
\[ P_n(Y_n) = P^{Y'_n}\big(Y_n \cap W^{Y'_n}\big) P'_n(Y'_n) \ge \big(1 - e^{-cn}\big)\big(1 - \delta'(n)\big). \qquad \square \]

Definition 4.29.
Let $m$ be a positive integer. A real number $\alpha$ is called $m$-critical if at least one of the following holds:

(a) There are a complete atomic $\sigma$-type $q(\bar{x})$, distinct complete atomic $\sigma$-types $p_1(\bar{x}, \bar{y}), \ldots, p_l(\bar{x}, \bar{y})$ and a number $1 \le l' \le l$ such that $|\bar{x}\bar{y}| \le m$, $q \subseteq p_i$ for all $1 \le i \le l$ and
\[ \alpha = \frac{\sum_{i=1}^{l'} P(p_i \mid q)}{\sum_{i=1}^{l} P(p_i \mid q)}. \]

(b) $\alpha = l'/l$ where $0 \le l' \le l$ are integers and $l$ is, for any choice of distinct variables $x_1, \ldots, x_m$, less than or equal to the number of pairs $(p(x_1, \ldots, x_{m'}), q(x_1, \ldots, x_d))$ where $d < m' \le m$, $p$ and $q$ are complete atomic $\sigma$-types such that $q \subseteq p$ and $\dim_{(x_d, \ldots, x_{m'})}(p) = 0$.

From the definition it follows that (for every $m \in \mathbb{N}$) there are only finitely many $m$-critical numbers. It also follows (from part (b)) that, for every $m$, $0$ and $1$ are $m$-critical.

Definition 4.30.
Let $\varphi(\bar{x}) \in CPL(\sigma)$ and let $l = |\bar{x}| + \mathrm{qr}(\varphi)$.

(i) We call $\varphi(\bar{x})$ noncritical if the following holds: If
\[ \Big( r + \|\psi(\bar{z}, \bar{y}) \mid \theta(\bar{z}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{z}, \bar{y}) \mid \theta^*(\bar{z}, \bar{y})\|_{\bar{y}} \Big) \quad\text{or}\quad \Big( \|\psi(\bar{z}, \bar{y}) \mid \theta(\bar{z}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{z}, \bar{y}) \mid \theta^*(\bar{z}, \bar{y})\|_{\bar{y}} + r \Big) \]
is a subformula of $\varphi(\bar{x})$ (where $\psi$, $\theta$, $\psi^*$ and $\theta^*$ denote formulas in $CPL(\sigma)$ and $\bar{z}$ and $\bar{y}$ may have variables in common with $\bar{x}$) then, for all $l$-critical numbers $\alpha$ and $\beta$, $r \ne \alpha - \beta$.

(ii) Let $\varepsilon > 0$. We say that $\varphi(\bar{x})$ is $\varepsilon$-noncritical if
• $\varphi(\bar{x})$ is noncritical and
• whenever $r$ appears in a subformula as in part (i) and $\alpha$ and $\beta$ are $l$-critical numbers, then the following implications hold:
If $r + \alpha > \beta$ then $r + \alpha/(1 + 2\varepsilon) > \beta(1 + 2\varepsilon)$, and
if $\alpha > \beta + r$ then $\alpha/(1 + 2\varepsilon) > \beta(1 + 2\varepsilon) + r$.

Since, for every $l \in \mathbb{N}$, there are only finitely many $l$-critical numbers it follows that for every noncritical $\varphi(\bar{x}) \in CPL(\sigma)$, if one just chooses $\varepsilon > 0$ sufficiently small, then $\varphi(\bar{x})$ is $\varepsilon$-noncritical. Definition 4.26 and Lemma 4.28 motivate the next definition.

Definition 4.31.
Let $\varepsilon > 0$ be such that $1 + \varepsilon = (1 + \varepsilon')^{2k}$.

It follows from Definition 4.31 and Corollary 4.27 that if $p(\bar{x}, \bar{y})$ and $q(\bar{x})$ are complete atomic $\sigma$-types such that $|\bar{x}\bar{y}| \le k$, $d = \dim_{\bar{y}}(p) > 0$, $q \subseteq p$, $P(q) > 0$, and $\gamma = P(p \mid q)$, then for every $n$, every $\mathcal{A} \in Y_n$ is $(p, q, \gamma/(1+\varepsilon))$-saturated and $(p, q, \gamma(1+\varepsilon))$-unsaturated. By an argument analogous to the one in Remark 4.11 (iii), it now follows that if $p(\bar{x})$ is a complete atomic $\sigma$-type such that $|\bar{x}| \le k$ and $P(p) = 0$, then for all sufficiently large $n$, $p$ is not realized in any member of $Y_n$.

In the proof of the proposition below we will sometimes abuse notation by treating an atomic type $p(\bar{x})$ as the formula obtained by taking the conjunction of all formulas in $p(\bar{x})$. So when writing, for example, '$\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, y)$' in the proof below we view $p_{i,j}(\bar{x}, y)$ in this expression as the conjunction of all formulas in the complete atomic type $p_{i,j}(\bar{x}, y)$.

Proposition 4.32. (Elimination of quantifiers)
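Suppose that $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \le k$. Then there is a quantifier-free formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and every $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall\bar{x}\,(\varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}))$.

Proof.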
Let an $\varepsilon$-noncritical $\varphi(\bar{x}) \in CPL(\sigma)$ be given with $|\bar{x}| + \mathrm{qr}(\varphi) \le k$. We will assume that $\bar{x}$ is nonempty (i.e. that $\varphi$ has free variables). In Remark 4.39 it is indicated which changes we need to make in the simpler case when $\varphi$ has no free variable. The proof proceeds by induction on quantifier-rank. Suppose that $\mathrm{qr}(\varphi) > 0$ since otherwise we can just let $\varphi^*$ be $\varphi$ and then we are done. If for all sufficiently large $n$, for all $\mathcal{A} \in Y_n$ and for all $\bar{a} \in [n]^{|\bar{x}|}$ we have $\mathcal{A} \not\models \varphi(\bar{a})$ then we can let $\varphi^*(\bar{x})$ be the formula $x_1 \ne x_1$ and then $\mathcal{A} \models \forall\bar{x}\,(\varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}))$ for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$. So from now on we assume that, for arbitrarily large $n$, there are $\mathcal{A} \in Y_n$ and $\bar{a}$ such that $\mathcal{A} \models \varphi(\bar{a})$.

Suppose that $\varphi(\bar{x})$ is $\exists y\,\psi(\bar{x}, y)$ for some $\psi(\bar{x}, y)$. Then we have $|\bar{x}y| + \mathrm{qr}(\psi) \le k$ and $\mathrm{qr}(\psi) < \mathrm{qr}(\varphi)$ so, by the induction hypothesis, we may assume that $\psi(\bar{x}, y)$ is quantifier-free. By assumption there are $n$, $\mathcal{A} \in Y_n$, $\bar{a}$ and $b$ such that $\mathcal{A} \models \psi(\bar{a}, b)$. Then there are $m \ge 1$, different complete atomic $\sigma$-types $q_i(\bar{x})$, $i = 1, \ldots, m$, and, for each $i$, $m_i \ge 1$ and different complete atomic $\sigma$-types $p_{i,j}(\bar{x}, y)$, $j = 1, \ldots, m_i$, such that $q_i \subseteq p_{i,j}$ for all $j$ and $\psi(\bar{x}, y)$ is equivalent to $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, y)$. If, for some $i$, $P(q_i(\bar{x})) = 0$, then $q_i$ is not realized in any $\mathcal{A} \in Y_n$ (for large enough $n$) and can be removed. So we may assume that $P(q_i) > 0$ for all $i$. If, for some $i$ and $j$, $P(p_{i,j} \mid q_i) = 0$ then $P(p_{i,j}) = 0$ so $p_{i,j}$ is not realized in any $\mathcal{A} \in Y_n$ for large enough $n$. So we may also assume that $P(p_{i,j} \mid q_i) > 0$ for all $i$ and $j$. If $\dim_y(p_{i,j}) = 1$ then, by the definitions of $Y_n$ and $\varepsilon$, it follows that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, if $\mathcal{A} \models q_i(\bar{a})$ then $\mathcal{A} \models \exists y\, p_{i,j}(\bar{a}, y)$.
If $\dim_y(p_{i,j}) = 0$ then, for all $n$ and all $\mathcal{A} \in W_n$, if $\mathcal{A} \models q_i(\bar{a})$ then $\mathcal{A} \models p_{i,j}(\bar{a}, b)$ for some $b \in \mathrm{rng}(\bar{a})$. It follows that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall\bar{x}\,\big( \exists y\,\psi(\bar{x}, y) \leftrightarrow \bigvee_{i=1}^{m} q_i(\bar{x}) \big)$.

Now we consider the case when $\varphi(\bar{x})$ has the form
\[ \Big( r + \|\psi(\bar{x}, \bar{y}) \mid \theta(\bar{x}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{x}, \bar{y}) \mid \theta^*(\bar{x}, \bar{y})\|_{\bar{y}} \Big) \quad\text{or} \tag{4.3} \]
\[ \Big( \|\psi(\bar{x}, \bar{y}) \mid \theta(\bar{x}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{x}, \bar{y}) \mid \theta^*(\bar{x}, \bar{y})\|_{\bar{y}} + r \Big). \tag{4.4} \]
Since the second case (4.4) is treated by straightforward variations of the arguments for the first case (4.3), we only consider the first case (4.3). Observe that $|\bar{x}\bar{y}| + \mathrm{qr}(\psi) \le k$ (because $\mathrm{qr}(\varphi) = |\bar{y}| + \max\{\mathrm{qr}(\psi), \mathrm{qr}(\theta), \mathrm{qr}(\psi^*), \mathrm{qr}(\theta^*)\}$) and similarly for $\theta$, $\psi^*$ and $\theta^*$. Since all the formulas $\psi$, $\theta$, $\psi^*$ and $\theta^*$ have smaller quantifier-rank than $\varphi$ we may, by the induction hypothesis, assume that $\psi(\bar{x}, \bar{y})$, $\theta(\bar{x}, \bar{y})$, $\psi^*(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$ are quantifier-free formulas.

If $\theta(\bar{x}, \bar{y})$ or $\theta^*(\bar{x}, \bar{y})$ is unsatisfiable, then, by the provided semantics, we have $\mathcal{A} \not\models \varphi(\bar{a})$ for every $\sigma$-structure $\mathcal{A}$ and every sequence of elements $\bar{a}$ from the domain of $\mathcal{A}$. In this case $\varphi(\bar{x})$ is equivalent to any contradictory quantifier-free formula with free variables among $\bar{x}$, for example the formula $x_1 \ne x_1$. So from now on we assume that $\theta(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$ are satisfiable.

Until further notice, assume also that $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ and $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ are satisfiable. Then there are distinct complete atomic $\sigma$-types $q_i(\bar{x})$, $p_{i,j}(\bar{x}, \bar{y})$, for $i = 1, \ldots, m$ and $j = 1, \ldots, m_i$, and distinct complete atomic $\sigma$-types $t_i(\bar{x})$, $s_{i,j}(\bar{x}, \bar{y})$, for $i = 1, \ldots, l$ and $j = 1, \ldots, l_i$, such that the following conditions hold:
• $q_i(\bar{x}) \subseteq p_{i,j}(\bar{x}, \bar{y})$ for all $i = 1, \ldots, m$ and all $j = 1, \ldots, m_i$.
• $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ is equivalent to $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$.
• $t_i(\bar{x}) \subseteq s_{i,j}(\bar{x}, \bar{y})$ for all $i = 1, \ldots, l$ and all $j = 1, \ldots, l_i$.
• $\theta(\bar{x}, \bar{y})$ is equivalent to $\bigvee_{i=1}^{l} \bigvee_{j=1}^{l_i} s_{i,j}(\bar{x}, \bar{y})$.

Since $\bigvee_{i=1}^{l} \bigvee_{j=1}^{l_i} s_{i,j}(\bar{x}, \bar{y})$ is a consequence of $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$ it follows that $m \le l$ and $m_i \le l_i$ for all $i \le m$. Moreover, for every $i \le m$ there is $i'$ such that $q_i = t_{i'}$, and for all $i \le m$ and all $j \le m_i$ there are $i', j'$ such that $p_{i,j} = s_{i',j'}$. Therefore we may assume in addition (by reordering if necessary) that
\[ q_i = t_i \text{ for all } i \le m \text{ and } p_{i,j} = s_{i,j} \text{ for all } i \le m \text{ and all } j \le m_i. \tag{4.5} \]
For the same reasons as in the previous case we may assume that all of $P(q_i)$, $P(p_{i,j})$, $P(t_i)$ and $P(s_{i,j})$ are positive for all $i$ and $j$. Next we define
\begin{align*}
d_{i,j} &= \dim_{\bar{y}}(p_{i,j}) \text{ for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, m_i, \\
e_{i,j} &= \dim_{\bar{y}}(s_{i,j}) \text{ for all } i = 1, \ldots, l \text{ and } j = 1, \ldots, l_i, \\
d_i &= \max\{d_{i,1}, \ldots, d_{i,m_i}\} \text{ for all } i = 1, \ldots, m, \\
e_i &= \max\{e_{i,1}, \ldots, e_{i,l_i}\} \text{ for all } i = 1, \ldots, l, \\
\alpha_{i,j} &= P(p_{i,j}(\bar{x}, \bar{y}) \mid q_i(\bar{x})) \text{ for all } i = 1, \ldots, m \text{ and } j = 1, \ldots, m_i, \\
\alpha_i &= \text{the sum of all } \alpha_{i,j} \text{ such that } d_{i,j} = d_i, \\
\beta_{i,j} &= P(s_{i,j}(\bar{x}, \bar{y}) \mid t_i(\bar{x})) \text{ for all } i = 1, \ldots, l \text{ and } j = 1, \ldots, l_i, \\
\beta_i &= \text{the sum of all } \beta_{i,j} \text{ such that } e_{i,j} = e_i.
\end{align*}
It follows that for all $i = 1, \ldots, m$ we have $d_i \le e_i$ and $\alpha_i \le \beta_i$.

Definition 4.33.
For all $i = 1, \ldots, l$ we define a number $\gamma_i$ as follows:
(1) If $i \le m$ and $d_i = e_i > 0$ then we define $\gamma_i = \alpha_i/\beta_i$.
(2) If $i \le m$ and $d_i = e_i = 0$ then we define $\gamma_i = m_i/l_i$.
(3) If $i \le m$ and $d_i < e_i$ then we define $\gamma_i = 0$.
(4) If $m < i \le l$ then we define $\gamma_i = 0$.

Now we can reason in exactly the same way with regard to the formulas $\psi^*(\bar{x}, \bar{y})$ and $\theta^*(\bar{x}, \bar{y})$. So there are numbers $m^*$, $l^*$, $m^*_i$ and $l^*_i$ and complete atomic $\sigma$-types $q^*_i(\bar{x})$ for $i = 1, \ldots, m^*$, $p^*_{i,j}(\bar{x}, \bar{y})$ for $i \le m^*$ and $j = 1, \ldots, m^*_i$, $t^*_i(\bar{x})$ for $i = 1, \ldots, l^*$ and $s^*_{i,j}(\bar{x}, \bar{y})$ for $i \le l^*$ and $j = 1, \ldots, l^*_i$ such that everything that has been said about $\psi$, $\theta$, $q_i$, $p_{i,j}$, $t_i$ and $s_{i,j}$ holds if these formulas and types are replaced by $\psi^*$, $\theta^*$, $q^*_i$, $p^*_{i,j}$ etcetera, and the numbers $m$, $l$, $m_i$, $l_i$ are replaced by $m^*$, $l^*$, $m^*_i$ and $l^*_i$. Moreover, we define numbers $d^*_{i,j}$, $e^*_{i,j}$, $d^*_i$, $e^*_i$, $\alpha^*_{i,j}$, $\alpha^*_i$, $\beta^*_{i,j}$, $\beta^*_i$ and $\gamma^*_i$ in the same way as above, using the types $q^*_i$, $p^*_{i,j}$, $t^*_i$ and $s^*_{i,j}$ instead of $q_i$, $p_{i,j}$, $t_i$ and $s_{i,j}$.

So far we have assumed that $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ and $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ are satisfiable. If $\psi(\bar{x}, \bar{y}) \wedge \theta(\bar{x}, \bar{y})$ is not satisfiable, then we let $m = 0$ and we view the disjunction $\bigvee_{i=1}^{m} \bigvee_{j=1}^{m_i} p_{i,j}(\bar{x}, \bar{y})$ as "empty" and hence always false. In this case we always have $i > m$ so it follows that $\gamma_i = 0$ for all $i = 1, \ldots, l$. Similar conventions apply if $\psi^*(\bar{x}, \bar{y}) \wedge \theta^*(\bar{x}, \bar{y})$ is not satisfiable. With these conventions the case when any one of the mentioned formulas is unsatisfiable is taken care of by the rest of the proof.

Lemma 4.34.
Let $i \in \{1, \ldots, m\}$.

(a) For all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[ \frac{\gamma_i}{1 + 2\varepsilon} \le \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|}. \]

(b) If $d_i = e_i$ then, for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[ \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le (1 + 2\varepsilon)\gamma_i. \]

(c) If $d_i < e_i$ then, for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$,
\[ \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le \frac{C}{n} \]
where the constant $C > 0$ depends only on the types $p_{i,j}$ and $s_{i,j}$.

(d) Parts (a), (b) and (c) hold if $m$, $m_i$, $l_i$, $\gamma_i$, $d_i$, $e_i$, $p_{i,j}$ and $s_{i,j}$ are replaced by $m^*$, $m^*_i$, $l^*_i$, $\gamma^*_i$, $d^*_i$, $e^*_i$, $p^*_{i,j}$ and $s^*_{i,j}$, respectively.

Proof.
We split the argument into cases corresponding to the first three cases of Definition 4.33. Let $\mathcal{A} \in Y_n$. First suppose that $d_i = e_i > 0$ and hence $\gamma_i = \alpha_i/\beta_i$. Since $\mathcal{A}$ is assumed to be $(p_{i,j}, q_i, (1+\varepsilon)\alpha_{i,j})$-unsaturated if $d_{i,j} > 0$ it follows that
\[ \big| p_{i,j}(\bar{a}, \mathcal{A}) \big| \le (1+\varepsilon)\alpha_{i,j} n^{d_{i,j}} \quad\text{if } d_{i,j} > 0. \]
If $d_{i,j} = 0$ then $\big| p_{i,j}(\bar{a}, \mathcal{A}) \big| = 1$ and each member of the unique tuple realizing $p_{i,j}(\bar{a}, \bar{y})$ belongs to $\bar{a}$. It follows that
\[ \Big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \Big| \le (1 + 2\varepsilon)\alpha_i n^{d_i} \quad\text{for all sufficiently large } n. \tag{4.6} \]
By similar reasoning (and since we assume $d_i = e_i$) we get
\[ \Big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \Big| \le (1 + 2\varepsilon)\beta_i n^{d_i}. \tag{4.7} \]
Since $\mathcal{A}$ is assumed to be $(p_{i,j}, q_i, \alpha_{i,j}/(1+\varepsilon))$-saturated if $d_{i,j} > 0$ and $(s_{i,j}, t_i, \beta_{i,j}/(1+\varepsilon))$-saturated if $e_{i,j} > 0$ it follows that
\[ \big| p_{i,j}(\bar{a}, \mathcal{A}) \big| \ge \alpha_{i,j} n^{d_{i,j}}/(1+\varepsilon) \text{ if } d_{i,j} > 0 \quad\text{and}\quad \big| s_{i,j}(\bar{a}, \mathcal{A}) \big| \ge \beta_{i,j} n^{e_{i,j}}/(1+\varepsilon) \text{ if } e_{i,j} > 0. \]
This (and $d_i = e_i$) implies that
\[ \Big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \Big| \ge \frac{\alpha_i n^{d_i}}{1 + \varepsilon} \quad\text{and}\quad \Big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \Big| \ge \frac{\beta_i n^{d_i}}{1 + \varepsilon}. \tag{4.8} \]
From (4.6), (4.7) and (4.8) we get
\[ \frac{\gamma_i}{1 + 2\varepsilon} = \frac{\alpha_i}{(1 + 2\varepsilon)\beta_i} \le \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le \frac{(1 + 2\varepsilon)\alpha_i}{\beta_i} = (1 + 2\varepsilon)\gamma_i. \tag{4.9} \]

Now suppose that $d_i = e_i = 0$. Then $\gamma_i = m_i/l_i$. Also, each $p_{i,j}(\bar{a}, \bar{y})$ and each $s_{i,j}(\bar{a}, \bar{y})$ has a unique realization in $\mathcal{A}$. Since we assume that $p_{i,j} \ne p_{i,j'}$ if $j \ne j'$ and $s_{i,j} \ne s_{i,j'}$ if $j \ne j'$ we get
\[ \gamma_i = \frac{m_i}{l_i} = \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|}, \]
and now the inequalities of (a) and (b) follow trivially.

Next, suppose that $d_i < e_i$. Then $\gamma_i = 0$. By similar reasoning as before,
\[ 0 < \frac{\beta_i n^{e_i}}{1 + \varepsilon} \le \Big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \Big| \quad\text{for sufficiently large } n. \]
It follows that
\[ \frac{\gamma_i}{1 + 2\varepsilon} = 0 \le \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|}. \]
Since $e_i > 0$ we can argue as we did to get (4.8), so we have
\[ \Big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \Big| \ge \frac{\beta_i n^{e_i}}{1 + \varepsilon}. \]
Depending on whether $d_i > 0$ or $d_i = 0$ we get, by arguing as in previous cases,
\[ \Big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \Big| \le (1 + 2\varepsilon)\alpha_i n^{d_i} \quad\text{or}\quad \Big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \Big| = m_i. \]
Since $d_i < e_i$ we get, in either case,
\[ \frac{\big| \bigcup_{j=1}^{m_i} p_{i,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l_i} s_{i,j}(\bar{a}, \mathcal{A}) \big|} \le \frac{C}{n} \quad\text{for sufficiently large } n \]
where $C > 0$ is a constant that depends only on the types $p_{i,j}$ and $s_{i,j}$.

The proof of part (d) is, of course, exactly the same (besides the relevant replacements of symbols). $\square$

Definition 4.35.
Let $I$ be the set of all $i \in \{1, \ldots, l\}$ such that there exists some $i' \in \{1, \ldots, l^*\}$ such that $t_i = t^*_{i'}$ and $r + \gamma_i \ge \gamma^*_{i'}$.

Remark 4.36. (The computational problem of finding $I$) The number $\alpha_{i,j}$ is obtained from numbers given by Assumptions 4.1 and 4.10 by applying a number of arithmetic operations which is linear in $|p_{i,j}|$. It follows that the number of arithmetic operations needed to compute $\alpha_i$ is linear in $\sum_{j=1}^{m_i} |p_{i,j}|$, where by an arithmetic operation I mean addition, multiplication or division. The case is similar for $\beta_i$, $\alpha^*_i$ and $\beta^*_i$. The number of comparisons of literals needed to check if $t_i = t^*_{i'}$ is $|t_i|$ if we assume that we use some uniform way of listing the literals in complete atomic $\sigma$-types. So to decide if $i \in I$ we need to perform a number of arithmetic operations, comparisons of literals and comparisons of numbers which is linear in
\[ \sum_{j=1}^{m_i} |p_{i,j}| + \sum_{j=1}^{l_i} |s_{i,j}| + \sum_{j=1}^{m^*_i} |p^*_{i,j}| + \sum_{j=1}^{l^*_i} |s^*_{i,j}|. \]
Consequently the number of arithmetic operations, comparisons of literals and comparisons of numbers that are needed to create $I$ is linear in
\[ \sum_{i=1}^{m} \sum_{j=1}^{m_i} |p_{i,j}| + \sum_{i=1}^{l} \sum_{j=1}^{l_i} |s_{i,j}| + \sum_{i=1}^{m^*} \sum_{j=1}^{m^*_i} |p^*_{i,j}| + \sum_{i=1}^{l^*} \sum_{j=1}^{l^*_i} |s^*_{i,j}|. \]

Lemmas 4.37 and 4.38 below show that $\varphi(\bar{x})$ is equivalent, in every $\mathcal{A} \in Y_n$ for all large enough $n$, to a quantifier-free formula which depends only on $\varphi(\bar{x})$ and the lifted Bayesian network $G$. As noted after Definition 4.29, 0 is a $\zeta$-critical number for every $\zeta$, so $r > 0$ (since $\varphi$ is noncritical). Observe that it follows from Definitions 4.29 and 4.33 that $\gamma_i$ and $\gamma^*_i$ are $(|\bar{x}| + \mathrm{qr}(\varphi))$-critical numbers for all $i$.

Lemma 4.37.
Suppose that $I \ne \emptyset$. Then for all sufficiently large $n$, all $\mathcal{A} \in Y_n$ and all $\bar{a} \in [n]^{|\bar{x}|}$,
\[ \mathcal{A} \models \Big( r + \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y})\|_{\bar{y}} \Big) \]
if and only if $\mathcal{A} \models \bigvee_{i \in I} t_i(\bar{a})$.

Proof.
Suppose that
\[ \mathcal{A} \models \Big( r + \|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}} \ge \|\psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y})\|_{\bar{y}} \Big). \]
Then both $\|\psi(\bar{a}, \bar{y}) \mid \theta(\bar{a}, \bar{y})\|_{\bar{y}}$ and $\|\psi^*(\bar{a}, \bar{y}) \mid \theta^*(\bar{a}, \bar{y})\|_{\bar{y}}$ are defined in $\mathcal{A}$ and
\[ r + \frac{\big| \psi(\bar{a}, \mathcal{A}) \cap \theta(\bar{a}, \mathcal{A}) \big|}{\big| \theta(\bar{a}, \mathcal{A}) \big|} \ge \frac{\big| \psi^*(\bar{a}, \mathcal{A}) \cap \theta^*(\bar{a}, \mathcal{A}) \big|}{\big| \theta^*(\bar{a}, \mathcal{A}) \big|}, \]
so
\[ r + \frac{\big| \bigcup_{\iota=1}^{m} \bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l} \bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a}, \mathcal{A}) \big|} \ge \frac{\big| \bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|}. \tag{4.10} \]
Now $\bar{a}$ realizes exactly one of $t_1(\bar{x}), \ldots, t_l(\bar{x})$ and exactly one of $t^*_1(\bar{x}), \ldots, t^*_{l^*}(\bar{x})$ so there are $1 \le i \le l$ and $1 \le i' \le l^*$ such that $\mathcal{A} \models t_i(\bar{a}) \wedge t^*_{i'}(\bar{a})$ and hence $t_i = t^*_{i'}$. If $r + \gamma_i \ge \gamma^*_{i'}$ then $i \in I$ and hence $\mathcal{A} \models \bigvee_{i \in I} t_i(\bar{a})$ so we are done. Hence it remains to prove that $r + \gamma_i \ge \gamma^*_{i'}$. We divide the argument into cases.

Case 1: Suppose that $i > m$ and $i' > m^*$. Then, by the definition of $\gamma_i$ and $\gamma^*_i$ (Definition 4.33), we have $\gamma_i = \gamma^*_{i'} = 0$ so we get $r + \gamma_i \ge \gamma^*_{i'}$.

Case 2: Suppose that $i \le m$ and $i' > m^*$. Then $\gamma^*_{i'} = 0$ and as $\gamma_i$ is always nonnegative we get $r + \gamma_i \ge \gamma^*_{i'}$.

Case 3: Suppose that $i > m$ and $i' \le m^*$. By Lemma 4.34 (a), assuming that $n$ is sufficiently large,
\[ \frac{\gamma^*_{i'}}{1 + 2\varepsilon} \le \frac{\big| \bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A}) \big|}. \]
Since $i > m$ we have $p_{\iota,j}(\bar{a}, \mathcal{A}) = \emptyset$ for every $1 \le \iota \le m$ and every $1 \le j \le m_\iota$, so
\[ \frac{\big| \bigcup_{\iota=1}^{m} \bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l} \bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a}, \mathcal{A}) \big|} = 0. \]
This together with (4.10) implies that
\[ r \ge \frac{\big| \bigcup_{\iota=1}^{m^*} \bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{\iota=1}^{l^*} \bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a}, \mathcal{A}) \big|} = \frac{\big| \bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a}, \mathcal{A}) \big|}{\big| \bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a}, \mathcal{A}) \big|} \ge \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.11} \]
If $r < \gamma^*_{i'}$ then, since $\varphi(\bar{x})$ is $\varepsilon$-noncritical, we get $r < \gamma^*_{i'}/(1 + 2\varepsilon)$ which contradicts (4.11). Hence $r \ge \gamma^*_{i'}$ and since $\gamma_i = 0$ (because $i > m$) we get $r + \gamma_i \ge \gamma^*_{i'}$.
Case 4: Suppose that $i \leq m$ and $i' \leq m^*$. Then (4.10) reduces to
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \geq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|}. \tag{4.12}
\]
Towards a contradiction, suppose that $r + \gamma_i < \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is assumed to be $\varepsilon$-noncritical we get
\[
r + (1 + 2\varepsilon)\gamma_i < \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.13}
\]
Recall, from the definition of $d_i$ and $e_i$, that $d_i \leq e_i$. We now consider two subcases and in each subcase we will derive a contradiction to (4.12).

Subcase 4(a): Suppose that $d_i = e_i$. By parts (a) and (b) of Lemma 4.34,
\[
\frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \leq (1 + 2\varepsilon)\gamma_i
\quad\text{and}\quad
\frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|} \geq \frac{\gamma^*_{i'}}{1 + 2\varepsilon}. \tag{4.14}
\]
From (4.13) and (4.14) we get
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \leq r + (1 + 2\varepsilon)\gamma_i < \frac{\gamma^*_{i'}}{1 + 2\varepsilon} \leq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|},
\]
which contradicts (4.12).

Subcase 4(b): Suppose that $d_i < e_i$.
Then Lemma 4.34 (c) gives
\[
\frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \leq \frac{C}{n}, \tag{4.15}
\]
where the constant $C > 0$ depends only on the involved types. Lemma 4.34 (a) gives
\[
\frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|} \geq \frac{\gamma^*_{i'}}{1 + 2\varepsilon}.
\]
Since $d_i < e_i$ implies that $\gamma_i = 0$, it follows from (4.13) that $r < \gamma^*_{i'}/(1 + 2\varepsilon)$. Note that the right-hand term in (4.15) tends to 0 as $n$ tends to infinity. So for all sufficiently large $n$ we have
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \leq r + \frac{C}{n} < \frac{\gamma^*_{i'}}{1 + 2\varepsilon} \leq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|}
\]
and this contradicts (4.12).

Now suppose that
$\mathcal{A} \models \bigvee_{i \in I} t_i(\bar{a})$, so $\mathcal{A} \models t_i(\bar{a})$ for some $i \in I$. By Definition 4.35 of $I$ there is $i' \in \{1, \ldots, l^*\}$ such that $t_i = t^*_{i'}$ and $r + \gamma_i \geq \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is an $\varepsilon$-noncritical formula it is, in particular, noncritical, which implies that $r + \gamma_i \neq \gamma^*_{i'}$ and hence $r + \gamma_i > \gamma^*_{i'}$. Since $\varphi(\bar{x})$ is $\varepsilon$-noncritical it follows that
\[
r + \frac{\gamma_i}{1 + 2\varepsilon} > \gamma^*_{i'}(1 + 2\varepsilon). \tag{4.16}
\]
It suffices to prove that
\[
r + \frac{\bigl|\bigcup_{\iota=1}^{m}\bigcup_{j=1}^{m_\iota} p_{\iota,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{\iota=1}^{l}\bigcup_{j=1}^{l_\iota} s_{\iota,j}(\bar{a},\mathcal{A})\bigr|} \geq \frac{\bigl|\bigcup_{\iota=1}^{m^*}\bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{\iota=1}^{l^*}\bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}. \tag{4.17}
\]
Again we divide the proof into cases.
Case 1: Suppose that $i' > m^*$. Then the term to the right of '$\geq$' in (4.17) is zero, so (4.17) holds.

Case 2: Suppose that $i > m$ and $i' \leq m^*$. Then the term immediately to the left of '$\geq$' in (4.17) is zero, so we need to prove that
\[
r \geq \frac{\bigl|\bigcup_{\iota=1}^{m^*}\bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{\iota=1}^{l^*}\bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}. \tag{4.18}
\]
From $i > m$ we get $\gamma_i = 0$, so (4.16) reduces to
\[
r > \gamma^*_{i'}(1 + 2\varepsilon). \tag{4.19}
\]
Recall that from the definition it follows that $d^*_{i'} \leq e^*_{i'}$.

Subcase 2(a): Suppose that $d^*_{i'} = e^*_{i'}$. Then using parts (d) and (b) of Lemma 4.34 we get
\[
\gamma^*_{i'}(1 + 2\varepsilon) \geq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|} = \frac{\bigl|\bigcup_{\iota=1}^{m^*}\bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{\iota=1}^{l^*}\bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|},
\]
which together with (4.19) gives (4.18).

Subcase 2(b): Suppose that $d^*_{i'} < e^*_{i'}$. Then parts (d) and (c) of Lemma 4.34 imply that
\[
\frac{\bigl|\bigcup_{\iota=1}^{m^*}\bigcup_{j=1}^{m^*_\iota} p^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{\iota=1}^{l^*}\bigcup_{j=1}^{l^*_\iota} s^*_{\iota,j}(\bar{a},\mathcal{A})\bigr|} = \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|} \leq \frac{C}{n}
\]
for some constant $C > 0$ depending only on the involved types. Since $r > 0$ it follows that (4.18) holds for all sufficiently large $n$.

Case 3: Suppose that $i \leq m$ and $i' \leq m^*$.
Now (4.17) is equivalent to
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \geq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|}. \tag{4.20}
\]
So it remains to prove (4.20). By Lemma 4.34 and (4.16) we have
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \geq r + \frac{\gamma_i}{1 + 2\varepsilon} > (1 + 2\varepsilon)\gamma^*_{i'}. \tag{4.21}
\]
If $d^*_{i'} = e^*_{i'}$, then, by Lemma 4.34,
\[
(1 + 2\varepsilon)\gamma^*_{i'} \geq \frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|},
\]
which together with (4.21) gives (4.20).

Now suppose that $d^*_{i'} < e^*_{i'}$. Then $\gamma^*_{i'} = 0$ and (4.21) reduces to
\[
r + \frac{\bigl|\bigcup_{j=1}^{m_i} p_{i,j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l_i} s_{i,j}(\bar{a},\mathcal{A})\bigr|} \geq r + \frac{\gamma_i}{1 + 2\varepsilon} > 0. \tag{4.22}
\]
Lemma 4.34 gives
\[
\frac{\bigl|\bigcup_{j=1}^{m^*_{i'}} p^*_{i',j}(\bar{a},\mathcal{A})\bigr|}{\bigl|\bigcup_{j=1}^{l^*_{i'}} s^*_{i',j}(\bar{a},\mathcal{A})\bigr|} \leq \frac{C}{n}.
\]
This together with (4.22) gives (4.20) for all sufficiently large $n$. This completes the proof of Lemma 4.37. □

Lemma 4.38.
Suppose that $I = \emptyset$. Then for all sufficiently large $n$, all $\mathcal{A} \in Y_n$ and all $\bar{a} \in [n]^{|\bar{x}|}$,
\[
\mathcal{A} \not\models \bigl( r + \| \psi(\bar{a},\bar{y}) \mid \theta(\bar{a},\bar{y}) \|_{\bar{y}} \geq \| \psi^*(\bar{a},\bar{y}) \mid \theta^*(\bar{a},\bar{y}) \|_{\bar{y}} \bigr).
\]
Hence the formula (4.3) is equivalent, in every such $\mathcal{A}$, to any contradictory quantifier-free formula.

Proof.
Suppose that $I = \emptyset$. Suppose towards a contradiction that there are arbitrarily large $n$, $\mathcal{A} \in Y_n$ and $\bar{a} \in [n]^{|\bar{x}|}$ such that
\[
\mathcal{A} \models \bigl( r + \| \psi(\bar{a},\bar{y}) \mid \theta(\bar{a},\bar{y}) \|_{\bar{y}} \geq \| \psi^*(\bar{a},\bar{y}) \mid \theta^*(\bar{a},\bar{y}) \|_{\bar{y}} \bigr).
\]
Then we argue just as we did in the beginning of the proof of Lemma 4.37 to get (4.10) and find $1 \leq i \leq l$ and $1 \leq i' \leq l^*$ such that $t_i = t^*_{i'}$. Since $I = \emptyset$ we must have $i \notin I$ and therefore $r + \gamma_i < \gamma^*_{i'}$. Now we can continue to argue exactly as in the proof of Lemma 4.37 to get a contradiction in each one of the cases 1–4 in that proof. □

Remark 4.39. (The case when $\bar{x}$ is empty) Suppose now that $\bar{x}$ is empty, so the formula (4.3) becomes
\[
\bigl( r + \| \psi(\bar{y}) \mid \theta(\bar{y}) \|_{\bar{y}} \geq \| \psi^*(\bar{y}) \mid \theta^*(\bar{y}) \|_{\bar{y}} \bigr), \tag{4.23}
\]
where we can assume that $\psi$, $\theta$, $\psi^*$ and $\theta^*$ are quantifier-free. Then there are distinct types $p_i(\bar{y})$, $i = 1, \ldots, m$, and distinct types $s_i(\bar{y})$, $i = 1, \ldots, l$. We can now define numbers $\gamma$ and $\gamma^*$ in the same way as each $\gamma_i$ (and $\gamma^*_i$) was defined above. We now get an analogue of Lemma 4.34 which gives the same kind of upper and lower bounds on $\bigl|\bigcup_{i=1}^{m} p_i(\mathcal{A})\bigr| \big/ \bigl|\bigcup_{i=1}^{l} s_i(\mathcal{A})\bigr|$ in terms of $\gamma$. If $r + \gamma \geq \gamma^*$ then, by the noncriticality of (4.23), we get $r + \gamma > \gamma^*$, and by the $\varepsilon$-noncriticality of the same formula we get $r + \gamma/(1 + 2\varepsilon) > \gamma^*(1 + 2\varepsilon)$. Now we can argue as in the "converse direction" in the proof of Lemma 4.37 and conclude that (4.23) is true in all $\mathcal{A} \in Y_n$ for all sufficiently large $n$; hence (4.23) is equivalent to $\top$ in all such $\mathcal{A}$. Now suppose that $r + \gamma < \gamma^*$ and suppose, towards a contradiction, that there are arbitrarily large $n$ and $\mathcal{A} \in Y_n$ in which (4.23) holds. Then we can argue as in the first part of the proof of Lemma 4.37 and get a contradiction.
Hence, for all sufficiently large $n$, (4.23) is false in all $\mathcal{A} \in Y_n$; consequently, (4.23) is equivalent to $\neg\top$ in all such $\mathcal{A}$. (The case when $\varphi$ has the form $\exists y\, \psi(\bar{y})$ is easier and analogous to the argument in the beginning of the proof of Proposition 4.32, so this part is left to the reader.)

Now the proof of Proposition 4.32 is completed. □

Definition 4.40.
Define a function $\delta : \mathbb{N}^+ \to \mathbb{R}^{\geq 0}$ by $\delta(n) = 5 \cdot \max\{\delta'(n), e^{-cn}\}$ where $c > 0$ is as in Lemma 4.28.

Proposition 4.41. (Completion of the induction step)
Let $Y_n \subseteq W_n$, $\varepsilon > 0$ and $\delta(n)$ be as in Definitions 4.26, 4.31 and 4.40, respectively. Then:
(1) $\lim_{n \to \infty} \delta(n) = 0$.
(2) $P_n(Y_n) \geq 1 - \delta(n)$ for all sufficiently large $n$.
(3) For every complete atomic $\sigma$-type $p(\bar{x})$ with $|\bar{x}| \leq k$ there is a number which we denote $P(p(\bar{x}))$ such that for all sufficiently large $n$ and all $\bar{a} \in [n]$ which realize the identity fragment of $p$, $\bigl| P_n\bigl(\{\mathcal{A} \in W_n : \mathcal{A} \models p(\bar{a})\}\bigr) - P(p(\bar{x})) \bigr| \leq \delta(n)$.
(4) For every complete atomic $\sigma$-type $p(\bar{x},\bar{y})$ with $|\bar{x}\bar{y}| \leq k$ and $0 < \dim_{\bar{y}}(p(\bar{x},\bar{y})) = |\bar{y}|$, if $q(\bar{x}) = p \restriction \bar{x}$ and $P(q) > 0$, then for all sufficiently large $n$, every $\mathcal{A} \in Y_n$ is $(p, q, \alpha/(1+\varepsilon))$-saturated and $(p, q, \alpha(1+\varepsilon))$-unsaturated if $\alpha = P(p(\bar{x},\bar{y})) / P(q(\bar{x}))$.
(5) For every $\varepsilon$-noncritical $\varphi(\bar{x}) \in CPL(\sigma)$ with $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$, there is a quantifier-free $\sigma$-formula $\varphi^*(\bar{x})$ such that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \forall \bar{x} \bigl( \varphi(\bar{x}) \leftrightarrow \varphi^*(\bar{x}) \bigr)$.

Proof.
Parts (1) and (2) follow from the definition of $\delta(n)$, Assumption 4.10 and Lemma 4.28. Part (3) follows from Corollary 4.19. Part (4) follows from Corollary 4.27 and the definition of $\varepsilon$. Part (5) follows from Proposition 4.32. □

Corollary 4.42.
Let $\varepsilon > 0$ be as in Definition 4.31.
(a) If $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$, then there are $c > 0$ and $0 \leq d \leq 1$ which is a sum of numbers of the form $P(p)$, where $p$ is a complete atomic $\sigma$-type, such that for every $m \in \mathbb{N}^+$ and every $\bar{a} \in [m]^{|\bar{x}|}$ such that $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$, $\bigl| P_n(\varphi(\bar{a})) - d \bigr| \leq C\delta(n)$ for all sufficiently large $n$, where the constant $C$ depends only on $\varphi$.
(b) If $\varphi \in CPL(\sigma)$ has no free variable, is $\varepsilon$-noncritical and $\mathrm{qr}(\varphi) \leq k$, then either $P_n(\varphi) \leq \delta(n)$ for all sufficiently large $n$, or $P_n(\varphi) \geq 1 - \delta(n)$ for all sufficiently large $n$.
(c) Suppose that for every $R \in \sigma \setminus \sigma'$, if $\bar{x}$ is the sequence of free variables of $\chi_{R,i}$ then $|\bar{x}| + \mathrm{qr}(\chi_{R,i}) \leq k$. Let $P^*_n$ be defined as $P_n$ except that we replace $\chi_{R,i}$ by $\chi^*_{R,i}$ in Definition 4.2, where $\chi^*_{R,i}$ is a quantifier-free formula which is equivalent to $\chi_{R,i}$ in every structure in $Y_n$ for all large enough $n$. If $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical, $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$ and $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$ and some $m$, then $\bigl| P^*_n(\varphi(\bar{a})) - P_n(\varphi(\bar{a})) \bigr| \leq \delta(n)$ for all sufficiently large $n$.

Proof. (a) Suppose that $\varphi(\bar{x}) \in CPL(\sigma)$ is $\varepsilon$-noncritical and $|\bar{x}| + \mathrm{qr}(\varphi) \leq k$. By part (5) of Proposition 4.41, $\varphi(\bar{x})$ is equivalent, in every $\mathcal{A} \in Y_n$ (for large enough $n$), to a quantifier-free formula $\varphi^*(\bar{x})$. Then $\varphi^*(\bar{x})$ is equivalent to a disjunction of complete atomic $\sigma$-types $\bigvee_{i=1}^{l} p_i(\bar{x})$. Suppose that $\mathcal{A} \models \varphi(\bar{a})$ for some $\mathcal{A} \in W_m$ and some $m$. Let $I$ be the set of indices $i$ such that $\mathcal{A} \models p_i(\bar{a})$ for some $\mathcal{A} \in W_n$ and some $n$. By assumption, $I \neq \emptyset$. Let $d = \sum_{i \in I} P(p_i)$.
By part (3) of Proposition 4.41, we have $\bigl| P_n(\varphi^*(\bar{a})) - d \bigr| \leq |I|\,\delta(n)$ for all sufficiently large $n$, and now (a) follows from part (2) of Proposition 4.41.

(b) Suppose that $\varphi \in CPL(\sigma)$ has no free variable, is $\varepsilon$-noncritical and $\mathrm{qr}(\varphi) \leq k$. By Proposition 4.41 (5), there is a quantifier-free sentence $\varphi^*$ such that for all sufficiently large $n$ and all $\mathcal{A} \in Y_n$, $\mathcal{A} \models \varphi \leftrightarrow \varphi^*$. Then $\varphi^*$ must be equivalent to $\bot$ or $\top$. The conclusion of part (b) now follows from parts (1) and (2) of Proposition 4.41.

(c) Since $\chi_{R,i}$ is equivalent to $\chi^*_{R,i}$ in every
$\mathcal{A} \in Y_n$ it follows from the definitions of $P_n$ and $P^*_n$ that if $\mathcal{A} \in Y_n$ then $P^*_n(\mathcal{A}) = P_n(\mathcal{A})$. It follows that if $X_n \subseteq Y_n$ then $P^*_n(X_n) = P_n(X_n)$ and in particular $P^*_n(Y_n) = P_n(Y_n)$. Since $P^*_n(W_n \setminus Y_n) = 1 - P^*_n(Y_n)$, and similarly for $P_n$, it follows that $P^*_n(W_n \setminus Y_n) = P_n(W_n \setminus Y_n)$. From part (2) of Proposition 4.41 we get $P^*_n(W_n \setminus Y_n) = P_n(W_n \setminus Y_n) \leq \delta(n)$. Let $X_n = \{\mathcal{A} \in W_n : \mathcal{A} \models \varphi(\bar{a})\}$. Then
\[
P^*_n(X_n) \leq P^*_n(X_n \mid Y_n)\, P^*_n(Y_n) + \delta(n) = P^*_n(X_n \cap Y_n) + \delta(n) = P_n(X_n \cap Y_n) + \delta(n) \leq P_n(X_n) + \delta(n),
\]
and by similar reasoning $P_n(X_n) \leq P^*_n(X_n) + \delta(n)$. □

Concluding remarks
The results of this article concern one particular formal logic and one type of lifted graphical model. Also, given these two things, choices have been made, for example regarding exactly how to define a probability distribution on the set of structures with a common finite domain. From the point of view of machine learning and artificial intelligence, as well as mathematical curiosity, one could ask a number of questions, of which I suggest a few below.

In finite model theory, theoretical computer science and linguistics a number of extensions of first-order logic have been considered [20]. For example, a generic way of extending first-order logic is by adding one or more so-called generalized quantifiers [15, 17]. In machine learning, data mining and artificial intelligence a number of different (lifted) graphical models have been considered, including the popular
Markov networks [7, 18]. For which combinations of formal logical language and lifted graphical model do we get "almost sure elimination of quantifiers" and/or "logical limit laws"? Do we get more expressive formalisms by using aggregation functions than if we use aggregation rules, or vice versa? How do different combinations of formal language and graphical model relate to each other? In what sense is a combination (formal language 1, graphical model 1) "better" than a combination (formal language 2, graphical model 2)? What are reasonable candidates for the relation "A is better/stronger than B"? Some thoughts in this direction appear in the last part of [5].

One can consider conditional probabilities which are not constant, but depend on the size of the set of elements (or tuples) satisfying the condition in question. As a special case we have probabilities that depend on the size of the whole domain, as in previous work on logical zero-one laws in random graphs [24, 25].

What if the probability of a tuple $\bar{a}$ satisfying a relation is dependent on whether another tuple $\bar{b}$ satisfies the same relation (as in [19, 21] for example)?

A situation that seems natural in the context of artificial intelligence is to have an underlying fixed structure and on top of it relations that are "governed" by some probabilistic graphical model. The underlying fixed structure could be represented by a $\tau$-structure $\mathcal{A}$ for some signature $\tau$. For another signature $\sigma$ (disjoint from $\tau$) we could consider the set of expansions of $\mathcal{A}$ to $(\tau \cup \sigma)$-structures where the probabilities of these expansions are governed by some probabilistic model and the underlying structure $\mathcal{A}$. To formalize this using the set-up of this article, one can modify $W^{\emptyset}_n$ in Definition 3.10 to contain exactly one $\tau$-structure with domain $[n]$, and $W_n$ will be the set of all $(\tau \cup \sigma)$-structures that expand the unique structure in $W^{\emptyset}_n$.
The definition of the probability distribution $P_n$ on $W_n$ can now depend not only on the lifted Bayesian network $G$ but also on the unique structure in $W^{\emptyset}_n$. It seems obvious that, in order to get similar results as in this article, one needs to assume some sort of uniformity regarding the unique structure in $W^{\emptyset}_n$ for cofinitely many $n$.

References

[1] N. Alon, J. H. Spencer, The Probabilistic Method, Second Edition, John Wiley & Sons (2000).
[2] F. Bacchus, A. J. Grove, J. Y. Halpern, D. Koller, From statistical knowledge bases to degrees of belief, Artificial Intelligence, Vol. 87 (1996) 75–143.
[3] C. Borgelt, R. Kruse, Graphical Models: Methods for Data Analysis and Mining, John Wiley & Sons (2002).
[4] H. Chernoff, A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, Vol. 23 (1952) 493–509.
[5] L. De Raedt, P. Frasconi, K. Kersting, S. Muggleton (editors), Probabilistic Inductive Logic Programming: Theory and Applications, Lecture Notes in Artificial Intelligence 4911, Springer-Verlag, Berlin Heidelberg (2008).
[6] L. De Raedt, K. Kersting, S. Natarajan, D. Poole, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation, Synthesis Lectures on Artificial Intelligence and Machine Learning
Communications of the ACM, Vol. 62 (2019) 74–83.
[8] R. Fagin, Probabilities on finite models, The Journal of Symbolic Logic, Vol. 41 (1976) 50–58.
[9] L. Getoor, B. Taskar (editors), Introduction to Statistical Relational Learning, The MIT Press (2007).
[10] Y. V. Glebskii, D. I. Kogan, M. I. Liogonkii, V. A. Talanov, Volume and fraction of satisfiability of formulas of the lower predicate calculus, Kibernetyka, Vol. 2 (1969) 17–27.
[11] J. Y. Halpern, An analysis of first-order logics of probability, Artificial Intelligence, Vol. 46 (1990) 311–350.
[12] C. D. Hill, On 0,1-laws and asymptotics of definable sets in geometric Fraïssé classes, Fundamenta Mathematicae, Vol. 239 (2017) 201–219.
[13] M. Jaeger, Convergence results for relational Bayesian networks, Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science (LICS 98) (1998).
[14] M. Jaeger, Reasoning about infinite random structures with relational Bayesian networks, Proceedings of the Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR 98) (1998).
[15] R. Kaila, On probabilistic elimination of generalized quantifiers, Random Structures and Algorithms, Vol. 19 (2001) 1–36.
[16] H. J. Keisler, W. B. Lotfallah, Almost everywhere elimination of probability quantifiers, The Journal of Symbolic Logic, Vol. 74 (2009) 1121–1142.
[17] E. Keenan, D. Westerstahl, Generalized quantifiers in linguistics and logic, in J. van Benthem, A. ter Meulen (editors), Handbook of Logic and Language, Second Edition, 859–910, Elsevier (2011).
[18] A. Kimmig, L. Mihalkova, L. Getoor, Lifted graphical models: a survey, Machine Learning, Vol. 99 (2015) 1–45.
[19] Ph. G. Kolaitis, H. J. Prömel, B. L. Rothschild, $K_{l+1}$-free graphs: asymptotic structure and a 0-1 law, Transactions of The American Mathematical Society, Vol. 303 (1987) 637–671.
[20] L. Libkin, Elements of Finite Model Theory, Springer-Verlag, Berlin Heidelberg New York (2004).
[21] J. F. Lynch, Convergence law for random graphs with specified degree sequence, ACM Transactions on Computational Logic, Vol. 6 (2005) 727–748.
[22] D. Mubayi, C. Terry, Discrete metric spaces: structure, enumeration, and 0-1 laws, The Journal of Symbolic Logic, Vol. 84 (2019) 1293–1324.
[23] J. Pearl, Causality: Models, Reasoning, and Inference, Second Edition, Cambridge University Press (2009).
[24] S. Shelah, J. Spencer, Zero-one laws for sparse random graphs, Journal of the American Mathematical Society, Vol. 1 (1988) 97–115.
[25] J. Spencer, The Strange Logic of Random Graphs, Springer-Verlag, Berlin Heidelberg New York (2001).
Vera Koponen, Department of Mathematics, Uppsala University, Box 480, 75106 Uppsala, Sweden.
E-mail address ::