An asymptotic analysis of probabilistic logic programming with implications for expressing projective families of distributions
Felix Weitkämper
Institut für Informatik, Ludwig-Maximilians-Universität München, Munich, Germany, [email protected]
Abstract.
Over the last years, there has been increasing research on the scaling behaviour of statistical relational representations with the size of the domain, and on the connections between domain size dependence and lifted inference. In particular, the asymptotic behaviour of statistical relational representations has come under scrutiny, and projectivity was isolated as the strongest form of domain size independence. In this contribution we show that every probabilistic logic program under the distribution semantics is asymptotically equivalent to a probabilistic logic program consisting only of range-restricted clauses over probabilistic facts. To facilitate the application of classical results from finite model theory, we introduce the abstract distribution semantics, defined as an arbitrary logical theory over probabilistic facts, to bridge the gap to the distribution semantics underlying probabilistic logic programming. In this representation, range-restricted logic programs correspond to quantifier-free theories, making asymptotic quantifier elimination results available for use. We conclude that every probabilistic logic program inducing a projective family of distributions is in fact captured by this class, and we infer interesting consequences for the expressivity of probabilistic logic programs as well as for the asymptotic behaviour of probabilistic rules.
Statistical relational artificial intelligence has emerged over the last 25 years as a means to specify statistical models for relational data. Since then, many different frameworks have been developed under this heading, which can broadly be classified into those that extend logic programming to incorporate probabilistic information (probabilistic logic programming under the distribution semantics) and those that specify an abstract template for probabilistic graphical models (sometimes known as knowledge-based model construction).

They both share the distinction between a general model (a template or a probabilistic logic program with variables) and a specific domain used to ground the model. Ideally, the model would be specified abstractly and independently of a specific domain, even though a specific domain may well have been involved in learning the model from data.

However, a significant hurdle is the generally hard to predict or undesirable behaviour of the model when applied to domains of different sizes. This scaling problem has received much attention in the past years (see [18], for instance, and the references in [13]), and recently Jaeger and Schulte [12,13] have identified projectivity as a strong form of good scaling behaviour: in a projective model, the probability of a given property holding for a given object in the domain is completely independent of the domain size. However, the examples in [18] show that projectivity cannot be hoped for in general statistical relational models, and [12] identify very restrictive fragments of common statistical relational frameworks as projective.

The question remains, however, in what way those fragments are not merely sufficient, but necessary criteria for a statistical relational model to be projective.
This has implications for which projective distributions can be expressed by current statistical relational frameworks at all, since the limited fragments from [12] are clearly insufficient to express the whole variety of projective distributions characterised in [13].

We will show in this contribution that in the case of probabilistic logic programming under the distribution semantics (and those other paradigms that can be reduced to it), the restrictive fragment identified in [12] (which corresponds to range-restricted probabilistic logic programs) does indeed capture all projective programs, in the sense that every projective probabilistic logic program is equivalent to a range-restricted probabilistic logic program. Our method will show that, moreover, every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program.

This will be an application of an asymptotic quantifier elimination result for probabilistic logic programming derived from classical finite model theory, namely from the study of the asymptotic theory of first-order and least fixed point logic in the 1980s (particularly in the study of 0-1 laws, applied in the particular form of [1]).

This application is also methodologically interesting as it opens another way in which classical mathematical logic can contribute to studying cutting-edge problems in learning and reasoning. That the theory developed around 0-1 laws would be a natural candidate for such investigations may come as no surprise, as it is highly developed and is itself in the spirit of "finite probabilistic model theory" (cf. Section 7 of [5]), and one might hope for more cross-fertilisation between the two fields in the future.
We will introduce projective families of distributions in accordance with [12,13], where one can find a much more detailed exposition of the terms and their background. As we are interested in statistical relational representations as a means of abstracting away from a given ground model, we will refer to families of distributions with varying domain sizes.
Definition 1. A family of distributions for a relational signature S is a sequence (Q^(n))_{n∈ℕ} of probability distributions on the set Ω_n of all S-structures with domain {1, . . . , n} ⊆ ℕ.

We will carry on to define exchangeability and projectivity; however, unlike [12,13], which have included exchangeability in projectivity "for convenience" ([12], p. 3), we consider projectivity separately. This allows us to have constants with a fixed interpretation in our probabilistic logic programs, as long as n is large enough for all interpreted constants to be included in {1, . . . , n}:

Definition 2.
A family of distributions is called exchangeable if it is invariant under S-isomorphism. It is called projective if for all m < n ∈ ℕ and all ω ∈ Ω_m the following holds:

Q^(m)({ω}) = Q^(n)({ω′ ∈ Ω_n | ω is the substructure of ω′ with domain {1, . . . , m}})

In the remainder of this paper, we will investigate the interplay between the asymptotic behaviour of logical theories as they have been studied in finite model theory and the families of distributions that are induced by them. We therefore introduce a notion of asymptotic equivalence of families of distributions.

Definition 3.
Two families of distributions (Q^(n)) and (Q′^(n)) are asymptotically equivalent if

lim_{n→∞} sup_{A⊆Ω_n} |Q^(n)(A) − Q′^(n)(A)| = 0

Remark.
In measure-theoretic terms, the families of distributions (Q^(n)) and (Q′^(n)) are asymptotically equivalent if and only if the limit of the total variation distance between them is 0. We will later also refer to logical theories as being asymptotically equivalent if they induce asymptotically equivalent families of distributions.

Proposition 4.
Two projective families of distributions are asymptotically equivalent if and only if they are equal.

Proof.
We will proceed by contradiction. So assume not. Then there is an m such that Q^(m) and Q′^(m) are not equal. Let ω be a world of size m which does not have the same probability in Q^(m) and Q′^(m). Let

a := |Q^(m)({ω}) − Q′^(m)({ω})| > 0.

For any n ≥ m, consider the subset A_n := {ω′ ∈ Ω_n | ω′↓[m] = ω}. Since both families are projective,

|Q^(n)(A_n) − Q′^(n)(A_n)| = |Q^(m)({ω}) − Q′^(m)({ω})| = a.

Therefore, (Q^(n)) and (Q′^(n)) are not asymptotically equivalent.

Since we have to expand the vocabulary to represent theories in a distribution semantics, we will note here that asymptotic equivalence is preserved under reduct. First we have to clarify how we build reducts of distributions in the first place:

Definition 5.
Let Q^(n) be a distribution over a signature S. Then its reduct Q^(n)|_T to a subsignature T ⊆ S is defined such that for any world ω ∈ Ω^T_n,

Q^(n)|_T(ω) := Q^(n)({ω′ ∈ Ω^S_n | ω′|_T = ω}).

Remark. Q^(n)|_T is the pushforward measure of Q^(n) with respect to the reduct projection from Ω^S_n to Ω^T_n.

We can now formulate preservation of asymptotic equivalence under reducts:

Proposition 6.
The reducts of asymptotically equivalent families of distributions are themselves asymptotically equivalent.

Proof.
Let (Q^(n)) and (Q′^(n)) be asymptotically equivalent families of distributions over S. Then for any T ⊆ S and any A ⊆ Ω^T_n,

|Q^(n)|_T(A) − Q′^(n)|_T(A)| = |Q^(n)({ω ∈ Ω^S_n | ω|_T ∈ A}) − Q′^(n)({ω ∈ Ω^S_n | ω|_T ∈ A})|.

Therefore,

lim_{n→∞} sup_{A⊆Ω^T_n} |Q^(n)|_T(A) − Q′^(n)|_T(A)| ≤ lim_{n→∞} sup_{A⊆Ω^S_n} |Q^(n)(A) − Q′^(n)(A)| = 0

In the following section, we will introduce the tools from finite model theory that will help us to understand the asymptotic behaviour of probabilistic logic programs. We will first lay out our abstract framework and introduce least fixpoint logic, which will prove an adequate representation for (probabilistic) logic programs. We will then give the necessary background from finite model theory and state the main classical results on the asymptotic behaviour of least fixpoint logic.

In Section 3, we will display the relationship between the theory developed in Section 2 and probabilistic logic programming, which will enable us to prove that every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program and that therefore, by Proposition 4, every projective logic program is actually equivalent to a range-restricted logic program. We will conclude that section by discussing how this framework relates to the setting of [5] and what that means for some formalisms of knowledge-based model construction.

In Section 4, we highlight some implications of our findings for the asymptotic behaviour of probabilistic rules and for the limited expressivity of probabilistic logic programming for general projective distributions, and we give a brief first discussion of questions of complexity.

Finally, we conclude the paper with some ideas for further research.
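As a computational illustration of Definitions 3 and 5 and of the inequality used in the proof of Proposition 6 (the encoding of worlds as sets of ground atoms and all names below are our own, purely illustrative choices), one can check on a toy signature that passing to a reduct never increases total variation distance:

```python
from itertools import product

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set,
    computed as half the L1 distance (equivalently sup_A |p(A) - q(A)|)."""
    worlds = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in worlds)

def reduct(dist, keep):
    """Pushforward of a distribution on S-worlds to the subsignature `keep`.
    Worlds are frozensets of ground atoms like ('R', 1); the reduct world
    keeps only atoms whose relation symbol lies in `keep`."""
    out = {}
    for world, prob in dist.items():
        red = frozenset(atom for atom in world if atom[0] in keep)
        out[red] = out.get(red, 0.0) + prob
    return out

# Toy example: signature {R, S} over the one-element domain {1}; four worlds.
atoms = [('R', 1), ('S', 1)]
worlds = [frozenset(a for a, bit in zip(atoms, bits) if bit)
          for bits in product([0, 1], repeat=2)]
p = dict(zip(worlds, [0.1, 0.2, 0.3, 0.4]))
q = dict(zip(worlds, [0.25, 0.25, 0.25, 0.25]))

# Reducts can only merge worlds, so total variation cannot increase:
assert tv_distance(reduct(p, {'R'}), reduct(q, {'R'})) <= tv_distance(p, q)
```

The assertion is exactly the inequality from the proof of Proposition 6, instantiated for a single pair of distributions rather than a whole family.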
Our tools will come from finite model theory, and we require a bridge to link the study of the asymptotic properties of logical theories with the statistical relational formalisms. Cozman and Mauá [5] define what they call relational Bayesian network specifications, which combine random and independent root predicates with non-root predicates that are defined ultimately from root predicates using first-order definitions. Since we aim to cover both relational Bayesian networks and logic programming based methods, we will slightly generalise their approach and call the resulting formalism the abstract distribution semantics.
In particular, we will generalise away from first-order logic (FOL) to a general logical language:
Definition 7.
Let R be a vocabulary. Then a logical language L(R) consists of a collection of formulas ϕ, each of which defines a function which takes an R-structure M and a tuple from M^n for some n ∈ ℕ (called the arity of ϕ) and returns either "false" or "true". We write M |= ϕ(~a) whenever "true" is returned for M and ~a.

The archetype of a logical language is the first-order predicate calculus, but the concept as defined here is sufficiently general to accommodate many other choices.

Definition 8.
Let S be a relational vocabulary (possibly with constants), R ⊆ S, and let L(R) be a logical language over R. Then an abstract L-distribution over R (with vocabulary S) consists of the following data:
– For every R ∈ R, a number q_R ∈ ℚ ∩ [0, 1].
– For every R ∈ S\R, an L(R)-formula φ_R of the same arity as R.

In the following we will assume that all vocabularies are finite. As we will see in Subsection 3.2 below, an abstract FOL distribution has the same expressivity as the corresponding relational Bayesian network specification from [5]. However, in our analysis of methods based on logic programming, we will also look at other choices such as least fixed point logic, which has been shown to be an appropriate logical framework to formalise Datalog queries.

The semantics of an abstract distribution is only defined relative to a domain D, which we will also assume to be finite. The formal definition is as follows:

Definition 9. Let L(R) be a logical language over R and let D be a finite set. Let T be an abstract L-distribution over R. Let Ω_D be the set of all S-structures with domain D. Then the probability distribution on Ω_D induced by T, written Q^(D)_T, is defined as follows. For all ω ∈ Ω_D, if there are a tuple ~a from D and an R ∈ S\R such that R(~a) and φ_R(~a) do not have the same truth value in ω, then

Q^(D)_T({ω}) := 0.

Otherwise,

Q^(D)_T({ω}) := ∏_{R∈R} q_R^{|{~a | ω |= R(~a)}|} × ∏_{R∈R} (1 − q_R)^{|{~a | ω |= ¬R(~a)}|}.

In other words, all the relations in R are independent with probability q_R, and the relations in S\R are defined deterministically by the L(R)-formulas φ_R. We will use the following notational shorthand:

Notation. Q^(D)_T(ϕ) := Q^(D)_T({ω ∈ Ω_D | ω |= ϕ})

Since we are only considering finite domains, we can assume without loss of generality that every domain is given by an initial segment of ℕ. We will use the notation Q^(n)_T for Q^({1,...,n})_T.

We will now proceed briefly to discuss fixed point logics.
For a detailed treatment, see Chapter 8 of [7], whose presentation we follow here.
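As a computational illustration of Definition 9 (the encoding, names, and toy signature below are our own illustrative choices, not part of the formal development), the following sketch enumerates all worlds over a two-element domain for an abstract distribution with one probabilistic unary relation R (q_R = 0.3) and one defined relation S with φ_S(x) := R(x):

```python
from itertools import product

def induced_distribution(domain, q, defined):
    """Q^(D)_T of Definition 9, restricted to unary relations (a simplifying
    assumption for this sketch).  `q` maps each probabilistic relation name to
    its probability; `defined` maps each defined relation name to a function
    (r_world, a) -> bool evaluating phi_R(a) on the probabilistic part of a
    world.  Worlds are frozensets of (relation name, extension) pairs."""
    prob_rels = sorted(q)
    dist = {}
    # Enumerate all interpretations of the probabilistic relations.
    for choice in product(*(range(2 ** len(domain)) for _ in prob_rels)):
        r_world = {
            rel: frozenset(a for i, a in enumerate(domain) if (bits >> i) & 1)
            for rel, bits in zip(prob_rels, choice)
        }
        # Independent product over the probabilistic ground atoms.
        p = 1.0
        for rel in prob_rels:
            k = len(r_world[rel])
            p *= q[rel] ** k * (1 - q[rel]) ** (len(domain) - k)
        # Defined relations are determined by their formulas.
        world = dict(r_world)
        for rel, phi in defined.items():
            world[rel] = frozenset(a for a in domain if phi(r_world, a))
        key = frozenset(world.items())
        dist[key] = dist.get(key, 0.0) + p
    return dist

# phi_S(x) := R(x), so S deterministically copies R.
dist = induced_distribution([1, 2], {'R': 0.3},
                            {'S': lambda w, a: a in w['R']})
assert abs(sum(dist.values()) - 1.0) < 1e-9

# Q^(D)_T(S(1)) equals q_R, since S is defined to agree with R.
p_s1 = sum(p for w, p in dist.items()
           if any(rel == 'S' and 1 in ext for rel, ext in w))
assert abs(p_s1 - 0.3) < 1e-9
```

Worlds violating the deterministic definitions simply never arise in the enumeration, which corresponds to their probability being set to 0 in Definition 9.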
Definition 10.
A formula ϕ is called positive in a variable x if x is in the scope of an even number of negation symbols in ϕ.

A formula in least fixed point logic (LFP) over a vocabulary R is defined inductively as follows:
1. Any atomic second-order formula is an LFP formula.
2. If ϕ is an LFP formula, then so is ¬ϕ.
3. If ϕ and ψ are LFP formulas, then so is ϕ ∨ ψ.
4. If ϕ is an LFP formula, then so is ∃x ϕ for a first-order variable x.
5. If ϕ is an LFP formula, then so is [LFP_{~x,X} ϕ]~t, where ϕ is positive in the second-order variable X and the lengths of the string of first-order variables ~x and the string of terms ~t coincide with the arity of X.

Fixpoint semantics have been used extensively in (logic) programming theory (see [8] for a survey), and we will exploit this when relating the model theory of LFP to probabilistic logic programming below. We will first associate an operator with each LFP formula ϕ:

Definition 11.
Let ϕ(~x, ~u, X, ~Y) be an LFP formula, with the length of ~x equal to the arity k of X, and let ω be an R-structure with domain D. Let ~b and ~S be interpretations of ~u and ~Y respectively. Then we define the operator F_ϕ : P(D^k) → P(D^k) as follows:

F_ϕ(R) := {~a ∈ D^k | ω |= ϕ(~a, ~b, R, ~S)}.

Since we have restricted Rule 5 in Definition 10 to positive formulas, F_ϕ is monotone for all ϕ (i.e. R ⊆ R′ implies F_ϕ(R) ⊆ F_ϕ(R′) for all R, R′ ⊆ D^k). Therefore we have:

Fact 12.
For every LFP formula ϕ(~x, ~u, X, ~Y), every R-structure on a domain D, and every interpretation of variables as in Definition 11, there is a relation R ⊆ D^k such that R = F_ϕ(R) and such that for all R′ with R′ = F_ϕ(R′) we have R ⊆ R′.

Definition 13.
We call the R from Fact 12 the least fixpoint of ϕ(~x, ~u, X, ~Y).

Now we are ready to define the semantics of least fixpoint logic:

Definition 14.
By induction on the definition of an LFP formula, we will define when an LFP formula ϕ(~X, ~x) is said to hold in an R-structure ω for a tuple ~a from the domain of ω and relations ~A of the correct arities:
– The first-order connectives and quantifiers ¬, ∨ and ∃, as well as ∧ and ∀ defined from them in the usual way, are given the usual semantics.
– An atomic second-order formula X(~x, ~c) holds if and only if (~a, ~c^ω) ∈ A, where A is the interpretation of X.
– [LFP_{~x,X} ϕ]~t holds if and only if the interpretation of ~t is in the least fixed point of F_{ϕ(~x,X)}.

Before we relate them to other frameworks, we will recall the asymptotic quantifier elimination results known for first-order logic and for least fixpoint logic. Our arguments will hinge on the asymptotic reduction of LFP to FOL in [1] and on the asymptotic quantifier elimination in FOL.

Notation.
Although we have allowed second-order variables in the inductive definitions above, we will assume from now on, unless mentioned otherwise, that LFP formulas do not have free second-order variables.

The asymptotic theory of relational first-order logic is much studied and well understood. It can be summarised as follows (Chapter 4 of [7] is a good general reference):
Definition 15.
Let R be a relational vocabulary. Then the first-order theory RANDOM(R) is given by all axioms of the following form, called extension axioms over R:

∀v_1 … v_r ( ⋀_{1≤i<j≤r} v_i ≠ v_j → ∃v_{r+1} ( ⋀_{i≤r} v_{r+1} ≠ v_i ∧ χ(v_1, …, v_{r+1}) ) ),

where χ is a complete quantifier-free description of how v_{r+1} relates to v_1, …, v_r, i.e. a consistent conjunction containing, for every R-atom in the variables v_1, …, v_{r+1} that mentions v_{r+1}, either that atom or its negation.

Fact 16. RANDOM(R) eliminates quantifiers, i.e. for each formula ϕ(~x) there is a quantifier-free formula ϕ′(~x) such that RANDOM(R) ⊢ ∀~x (ϕ(~x) ↔ ϕ′(~x)).

It is sometimes helpful to characterise this quantifier-free formula somewhat more explicitly:

Proposition 17. Let ϕ(~x) be a formula of first-order logic. Then:
1. ϕ′(~x) as in Fact 16 can be chosen such that only those relation symbols occur in ϕ′ that occur in ϕ.
2. If every atomic subformula of ϕ contains at least one free variable not in ~x, and no relation symbol occurs with different variables in different literals, then either RANDOM(R) ⊢ ∀~x ϕ(~x) or RANDOM(R) ⊢ ∀~x ¬ϕ(~x).

Proof. The first claim follows from the fact that RANDOM(T) = RANDOM(R)|_T for any T ⊆ R. To show the second claim, consider the vocabulary R̄ containing R_{~x}(~y) for every atomic subformula R(~x, ~y) of ϕ, and let ϕ̄ be the R̄-formula obtained from ϕ by replacing every occurrence of R(~x, ~y) with R_{~x}(~y). Let M be a model of RANDOM(R) and let ~a ∈ M. Then define an R̄-structure on M by setting R_{~x}(~y) :⇔ R(~a, ~y). One can verify that M satisfies the extension axioms in RANDOM(R̄). Since RANDOM(R̄) is complete, RANDOM(R̄) ⊢ ϕ̄ or RANDOM(R̄) ⊢ ¬ϕ̄. Therefore, either ϕ(~a) or ¬ϕ(~a) holds uniformly for all ~a ∈ M. Therefore, either RANDOM(R) ⊢ ∀~x ϕ(~x) or RANDOM(R) ⊢ ∀~x ¬ϕ(~x).

The importance of RANDOM(R) comes from its role as the asymptotic limit of the class of all R-structures. In fact, it axiomatises the limit theory of R-structures even when the individual probabilities of relational atoms are given by q_R rather than 1/2:

Fact 18. lim_{n→∞} Q^(n)_T(ϕ) = 1 for all abstract distributions T over R and all extension axioms ϕ over R.
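To see Fact 18 concretely, consider the vocabulary R = {R} with R unary and q_R = 0.3 (an arbitrary illustrative choice of our own), and the two r = 0 extension axioms ∃v R(v) and ∃v ¬R(v). Their joint probability under Q^(n)_T admits a simple closed form and tends to 1 as the domain grows, as the following sketch checks:

```python
def prob_extension_axiom(n, q):
    """Probability that a structure on {1,...,n}, with each atom R(i) holding
    independently with probability q, satisfies both r = 0 extension axioms
    'exists v R(v)' and 'exists v not R(v)'."""
    # P(no v with R(v)) = (1-q)^n and P(no v with not R(v)) = q^n;
    # for n >= 1 the two failure events are disjoint.
    return 1 - (1 - q) ** n - q ** n

# The probability increases towards 1 with the domain size.
probs = [prob_extension_axiom(n, 0.3) for n in (1, 5, 10, 50)]
assert all(a < b for a, b in zip(probs, probs[1:]))
assert probs[-1] > 1 - 1e-7
```

For n = 1 the probability is 0 (a one-element structure cannot satisfy both axioms), and it then climbs rapidly towards 1, illustrating why finite conjunctions of extension axioms hold asymptotically almost surely.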
The main theorem of [1] shows that RANDOM(R) not only eliminates classical quantifiers, but also least fixed point quantifiers:

Fact 19. Let ϕ(~x) be an LFP formula over R. Then there is a finite subset G of RANDOM(R) and a quantifier-free formula ϕ′(~x) such that G ⊢ ∀~x (ϕ(~x) ↔ ϕ′(~x)).

Putting this together, we can derive the following:

Theorem 20. Let T be an abstract LFP distribution over a vocabulary R. Then T is asymptotically equivalent to a quantifier-free FOL distribution over R.

Proof. By Fact 19 and the finiteness of the vocabulary S, there is a finite set G of extension axioms over R such that there are quantifier-free R-formulas φ′_R for every R ∈ S\R with G ⊢ ∀~x (φ_R(~x) ↔ φ′_R(~x)). By Fact 18, lim_{n→∞} Q^(n)_T({ω ∈ Ω_n | ω|_R |= G}) = 1 for any finite subset G ⊆ RANDOM(R), and thus lim_{n→∞} Q^(n)_T({ω ∈ Ω_n | ω |= φ_R ↔ φ′_R for all R ∈ S\R}) = 1. Let (Q′^(n)) be the family of distributions induced by the quantifier-free FOL distribution over R in which every φ_R is replaced by φ′_R. By construction, Q^(n)(ω) = Q′^(n)(ω) for every world ω with ω |= φ_R ↔ φ′_R for all R ∈ S\R. Therefore, sup_{A⊆Ω_n} |Q^(n)(A) − Q′^(n)(A)| is bounded above by 1 − Q^(n)({ω ∈ Ω_n | ω |= φ_R ↔ φ′_R for all R ∈ S\R}), which tends to 0.

In this section we will see that the distribution semantics at the heart of probabilistic logic programming can be seen as a special case of abstract distributions. We can then apply the concepts developed above to infer a description of their asymptotic behaviour as well as a complete syntactic characterisation of projectivity. Lastly, we will briefly sketch how to transfer the results of this section to other formalisms such as the relational Bayesian network specifications from [5] and a subclass of the relational Bayesian network specifications from [11].
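Before relating the two formalisms formally, it may help to see the distribution semantics in miniature. The following sketch (program, names, and probabilities are our own illustrative choices) evaluates a tiny probabilistic logic program with probabilistic facts 0.6 :: e(1, 2) and 0.7 :: e(2, 3) and the rules r(X, Y) ← e(X, Y) and r(X, Y) ← e(X, Z), r(Z, Y) by enumerating total choices of the probabilistic facts and closing each choice under the rules:

```python
from itertools import product

# Probabilistic facts: ground atom -> probability.
facts = {('e', 1, 2): 0.6, ('e', 2, 3): 0.7}

def reachable(edges):
    """Deterministic part: transitive closure of the chosen edges,
    computed by iterating the rules to a fixed point."""
    r = set(edges)
    while True:
        new = {(x, w) for (x, y) in r for (z, w) in r if y == z} - r
        if not new:
            return r
        r |= new

def query_prob(goal):
    """Success probability of r(goal) under the distribution semantics:
    sum the probabilities of all total choices whose closure derives it."""
    total = 0.0
    atoms = sorted(facts)
    for bits in product([0, 1], repeat=len(atoms)):
        p = 1.0
        chosen = set()
        for atom, bit in zip(atoms, bits):
            p *= facts[atom] if bit else 1 - facts[atom]
            if bit:
                chosen.add(atom[1:])
        if goal in reachable(chosen):
            total += p
    return total

assert abs(query_prob((1, 3)) - 0.6 * 0.7) < 1e-9  # r(1,3) needs both edges
```

This brute-force enumeration is exponential in the number of probabilistic facts and is meant only to mirror the definition; actual systems such as Problog use knowledge compilation instead.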
In relating probabilistic logic programming to the abstract distribution semantics that we have introduced above, we will employ the simplification in [20] and consider a probabilistic logic program to be a stratified Datalog program over probabilistic facts. This distribution semantics covers several different, equally expressive formalisms; see [20,19] for an overview. For an introduction to the syntax and semantics of stratified Datalog programs in line with this paper, see Chapter 9 of [7].

We will use the notation (Π, P)~t for an intensional symbol P of a stratified logic program Π to mean that "the program Π proves P~t".

Definition 21. A probabilistic logic program consists of probabilistic facts and deterministic rules, where the deterministic part is a stratified Datalog program. We will consider it in our framework of abstract distribution semantics as follows:
– R is given by relation symbols R′ for every probabilistic fact p_R :: R(~x), with q_{R′} := p_R. Their arity is just the arity of R.
– S is given by the vocabulary of the probabilistic logic program and additionally the R′ in R.
– Let Π be the stratified Datalog program obtained by prefixing the program {R(~x) ← R′(~x) | R′ ∈ R} to the deterministic rules of the probabilistic logic program. Then φ_P for a P ∈ S\R is given by (Π, P)~t.

The distribution semantics for probabilistic programming is related to the LFP distribution semantics introduced above through the following fact, cf. Theorem 9.1.1 of [7]:

Fact 22. For every stratifiable Datalog formula (Π, P)~t as above, there exists an LFP formula ϕ(~t) over the extensional vocabulary R of Π such that for every R-structure A and every interpretation of variables on A, A |= ϕ(~t) if and only if A |= (Π, P)~t.

Remark.
In fact, it suffices to consider formulas in the so-called bounded fixpoint logic, whose expressiveness lies between first-order logic and least fixed point logic; see [7] for details.

This translation allows us to apply the asymptotic quantifier elimination results from Section 2 to probabilistic logic programming. In order to obtain a characterisation within probabilistic logic programming, however, we need to translate quantifier-free first-order formulas back to stratifiable Datalog. In fact, they can be mapped to a subset of stratified Datalog that is well known from logic programming:

Definition 23. A Datalog program, Datalog formula or probabilistic logic program is called range-restricted if every variable occurring in the body of a clause also occurs in the head of that clause.

This property corresponds exactly to the fragment of probabilistic logic programs shown to be projective by Jaeger and Schulte [12] (Proposition 4.3 there):

Proposition 24. Every range-restricted probabilistic logic program is projective.

The proof of Theorem 9.1.1 of [7] gives the following:

Fact 25. Every quantifier-free first-order formula is equivalent to a range-restricted stratified Datalog formula.

Now we have all the ingredients to formulate the main result of this subsection.

Theorem 26. Every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program.

Proof. Let S\R be the vocabulary of the probabilistic logic program Θ and let Π be its underlying Datalog program. Then for every relation R ∈ S\R, R(~t) is given by the Datalog formula (Π, R)~t over any given R-structure. By Fact 22, (Π, R)~t is equivalent to an LFP formula φ_R over R. Let T be the abstract LFP distribution over R in which, for every R ∈ R, q_R is taken from Θ, and for every R ∈ S\R, this φ_R is used.
Then T and Θ induce equivalent families of distributions. By Theorem 20, T is asymptotically equivalent to a quantifier-free abstract distribution, which in turn is equivalent to a range-restricted Datalog probabilistic logic program by Fact 25. Therefore Θ itself is asymptotically equivalent to a range-restricted probabilistic logic program.

Corollary 27. A probabilistic logic program is projective if and only if it is equivalent to a range-restricted probabilistic logic program.

Proof. By Theorem 26 every (projective) probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program (which is itself projective by Proposition 24). By Proposition 4, they are therefore actually equivalent.

While probabilistic logic programs are just one of several formalisms of statistical relational AI, we can relate our framework to those based on knowledge-based model construction. This relation is well known in the literature (see e.g. [20,19]), and so we will not go into details. First, however, we will clarify how Cozman and Mauá's [5] relational Bayesian network specifications correspond to abstract FOL distributions. Recall from there that such a relational Bayesian network specification differs from an abstract FOL distribution in that the φ_R are allowed to mention any symbols from S rather than just symbols from R. However, it is then stipulated that the dependency graph induced by the φ_R must be acyclic. In that case, every relation R has a well-defined rank n ∈ ℕ and only refers to relations of lower rank. Relations in R have rank 0. Then we can iteratively unfold the relations in φ_R, where R is of rank n + 1, by replacing any occurrence of an R′ with 0 < rank(R′) ≤ n by its definition in terms of relations of lower rank. Thus, any φ_R can be equivalently expressed using only symbols from R.

As the name suggests, relational Bayesian network specifications are well suited for expressing Bayesian networks.
While the translation is most straightforward for ground Bayesian networks, one can also express relational Bayesian networks (sensu [11]), as long as their lifted dependency graph is acyclic and they only use the noisy-or combination function. This latter constraint is due to the correspondence of the noisy-or combination function to existential quantification over independent probabilistic facts; see [19] for a discussion. As probabilistic dependencies of higher rank need to be encoded by new probabilistic facts (much like the interpretation of probabilistic rules in Problog), we generally need to expand the language. Overall, we obtain:

Proposition 28. Let T be a relational Bayesian network on vocabulary S with an acyclic dependency graph which only uses the noisy-or combination function. Then there is an S′ ⊇ S and an abstract existential FOL distribution T′ over an R ⊆ S′ with vocabulary S′ such that the reduct of T′ to S is equivalent to T.

Since asymptotic equivalence is preserved under reduct (Proposition 6), we can apply Theorem 20 to such relational Bayesian networks and conclude that they are asymptotically equivalent to (reducts of) quantifier-free FOL distributions. In order to complete the characterisation, we note that such quantifier-free distributions correspond to Bayesian networks without combination functions. Therefore, we obtain:

Proposition 29. Let T be a relational Bayesian network on vocabulary S with an acyclic dependency graph which only uses the noisy-or combination function. Then T is asymptotically equivalent to a relational Bayesian network with an acyclic dependency graph which does not use any combination function.

Since the relational Bayesian networks without combination functions are exactly those seen to be projective in [12], the characterisation of projective distributions also carries over to this setting:

Proposition 30.
A relational Bayesian network with an acyclic dependency graph using only the noisy-or combination function is projective if and only if it is equivalent to a relational Bayesian network which does not use any combination function.

Note also that while probabilistic logic programs (which correspond to abstract (bounded) LFP distributions) are generally more expressive than relational Bayesian networks with acyclic dependency graphs and the noisy-or combination function (which can be characterised by abstract FOL distributions), this gain in expressiveness disappears when considering only the projective fragment of each.

The results have immediate consequences for the expressiveness of probabilistic logic programming and the other formalisms described above. We discuss two particularly striking ones here.

Asymptotic loss of information. Very insightful is the case of a probabilistic rule, i.e. a clausal formula annotated with a probability. Because of its intuitive appeal, this is a widely used syntactic element of probabilistic logic programming languages such as Problog, and its semantics is defined by introducing a new probabilistic fact to model the uncertainty of the rule. More precisely,

p :: R(~x) :− Q_1(~x_1, ~y_1), . . . , Q_n(~x_n, ~y_n)

(where ~x are the variables appearing in R and ~x_i ⊆ ~x) is interpreted as

p :: I(~x, ~y); R(~x) :− Q_1(~x_1, ~y_1), . . . , Q_n(~x_n, ~y_n), I(~x, ~y)

(where ~y := ⋃_i ~y_i). It is now easy to see from Proposition 17 that in the asymptotic quantifier-free representation of this probabilistic rule, I will no longer occur, since it originally occurred implicitly quantified in the body of the clause. However, I was the only connection between the probability annotation of the rule and its semantics!
Therefore, the asymptotic probability of R(~x) is independent of the probability assigned to any non-range-restricted rule with R(~x) as its head. A similar argument holds in noisy-or Bayesian networks whenever the noisy-or is invoked.

Expressing projective families of distributions. Our results also show how few of the projective families of distributions can be expressed in those formalisms. This confirms the suspicion voiced in [13] that, despite the ostensible similarities between languages such as Independent Choice Logic, which are based on the distribution semantics, and the array representation of [13], a direct application of techniques from probabilistic logic programming to general projective families of distributions might prove challenging. We show that already in the very limited signature consisting of a single unary relation symbol R, there is no probabilistic logic program that induces the distribution that is uniform on isomorphism classes of structures:

Definition 31. Let S := {R} consist of one unary predicate, and let m∗ be the family of distributions on S-structures defined by m∗({ω}) := 1/((|D| + 1) · N_ω) for a world ω ∈ Ω_D, where N_ω := |{ω′ ∈ Ω_D | ω ≅ ω′}|.

This gives each isomorphism type of structures equal weight, and then within each isomorphism type every world is given equal weight too. m∗ is an important probability measure for two reasons: it plays a special role in finite model theory, since the so-called unlabelled 0-1 laws are introduced with respect to this measure; furthermore, it was introduced explicitly by Carnap [3,4] as a candidate measure for formalising inductive reasoning, as part of the so-called continuum of inductive methods (see also [17] for a modern exposition). It is easily seen to be exchangeable; it is also projective, and in fact an elementary calculation shows that for any domain D and any {a_1, . . . , a_{n+1}} ⊆ D,

m∗(R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I}) = (|I| + 1)/(n + 2)   (4.1)

for any I ⊆ {1, . . . , n} (see any of the sources above for a derivation).

Proposition 32. Let S′ be a finite vocabulary extending S from Definition 31. Then there is no probabilistic logic program with vocabulary S′ such that the reduct of the induced family of distributions to S is equal to m∗.

Proof. For the sake of simplicity, we will assume that S′ has no constant symbols. Since m∗ is projective, it would have to be induced by a range-restricted probabilistic logic program. In particular, in any grounding, any probabilistic fact appearing both in the body of a clause with head R(a_{n+1}) and in the body of a clause with head R(a_i) for an i < n + 1 must be nullary. Since there are only finitely many nullary predicates in S′, there are only finitely many possible configurations of those nullary predicates. For every such configuration ϕ, let q_ϕ be the conditional probability of R(x) given ϕ (this is well-defined since there are no constants in the language). We observe from Equation 4.1 that, for variable n, the infimum of m∗(R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I}) is 0, even if we assume that there is at least one i with R(a_i). In that situation, the conditional probability of any ϕ with q_ϕ = 0 given {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} is 0. However, since there are only finitely many configurations of nullary predicates ϕ, the infimum c of the nonzero q_ϕ is greater than 0. Since the nullary predicate symbols are the only facts appearing in the bodies of clauses with heads R(a_{n+1}) and clauses with heads R(a_i) for an i < n + 1, R(a_{n+1}) is conditionally independent of {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} given those nullary predicates.
Thus, a standard calculation reveals that the conditional probability of R(a_{n+1}) given {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} is a weighted mean of the nonzero q_ϕ and therefore bounded below by c > 0, in contradiction to 0 being the infimum of m*( R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} ).

Remark. In fact, the proposition easily generalises to all non-extreme probability functions on the continuum of inductive methods. Furthermore, allowing a finite number of constant symbols does not affect the argument of the proof.

4.2 Complexity results

A natural question is the complexity of determining an asymptotically equivalent range-restricted program for any given probabilistic logic program. Since this operation takes a non-ground probabilistic logic program as input and computes another probabilistic logic program, the notion of data complexity does not make sense in this context. Instead, program complexity is the appropriate measure.

In our context, the input program could be measured in different ways. Since our analysis is based on the setting of abstract distributions, we will consider as our input abstract distributions obtained from (stratified) probabilistic logic programs. We will furthermore fix our signatures R and S. Since the transformation acts on each φ_R in turn and independently, it suffices to consider the individual φ_R as input. It is natural to ask about complexity in the length of φ_R.

In fact, one can extract upper and lower bounds from [1] (building on [9]), whose asymptotic results form the core of our present work. The task of determining whether the probability of a first-order sentence converges to 0 or 1 with increasing domain size, which is a special case of our transformation, is PSPACE-complete (Theorem 1.4 of [1]). Therefore the program transformation is certainly PSPACE-hard.
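The convergence behaviour underlying this decision task can be illustrated on a minimal sketch (not part of the formal development above; the unary relation R and the probability 0.3 are chosen purely for illustration). For n independent probabilistic facts R(a_1), ..., R(a_n), each true with probability p, the sentence ∃x R(x) has probability 1 − (1 − p)^n, which converges to 1 as the domain grows, while ∀x R(x) has probability p^n, which converges to 0:

```python
from itertools import product

def prob_exists(n, p):
    """Brute-force P(∃x R(x)) over all 2^n worlds generated by n
    independent probabilistic facts R(a_1), ..., R(a_n)."""
    total = 0.0
    for world in product([True, False], repeat=n):
        weight = 1.0
        for fact in world:
            weight *= p if fact else 1 - p
        if any(world):  # the world satisfies ∃x R(x)
            total += weight
    return total

# Matches the closed form 1 - (1 - p)^n, which tends to 1 as n grows,
# while P(∀x R(x)) = p^n tends to 0.
for n in (1, 4, 8):
    assert abs(prob_exists(n, 0.3) - (1 - 0.7 ** n)) < 1e-12
```

Deciding the limit for an arbitrary first-order sentence, rather than computing the probability for a fixed n as above, is the PSPACE-complete problem of [1].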
On the other hand, asymptotic elimination of quantifiers in least fixed point logic is EXPTIME-complete (Theorems 4.1, 4.3 of [1]), so the program transformation is certainly in EXPTIME.

To be more precise, we note that for abstract FO distributions, which correspond to acyclic probabilistic logic programs, the transformation can be performed in PSPACE. Let R be of arity n. Then enumerate the (finitely many) quantifier-free n-types (ϕ_i) in R. Now for any φ_R of arity n we can check successively, in polynomial space in the length of φ_R, whether the probability of ϕ_i → φ_R converges to 0 or 1. Then φ_R is asymptotically equivalent to the disjunction of those quantifier-free n-types for which 1 is returned.

In the general case of least fixed point logic, Blass et al. [1] show that the problem of finding an asymptotically equivalent first-order sentence is EXPTIME-complete. However, to represent stratified Datalog, only the fragment known as bounded or stratified least fixed point logic is required (see Sections 8.7 and 9.1 of [7]). Therefore, the complexity class of the program transformation of stratified probabilistic logic programs corresponds to the complexity of the asymptotic theory of bounded fixed point logic, which to the best of our knowledge is still open.

The analysis presented here suggests several strands of further research. While some widely used directed frameworks can be subsumed under the probabilistic logic programming paradigm, as discussed in Subsection 3.2, undirected models such as Markov logic networks (MLNs) seem to require a different approach. Indeed, the projective fragment of MLNs isolated by [12] is particularly restrictive, since it only allows formulas in which every literal has the same variables (the σ-determinate formulas of [6]; cf. also the parametric classes of finite model theory, for instance in Section 4.2 of [7]).
It might therefore be expected that, if an analogous result to Theorem 26 holds for MLNs, they could express even fewer projective families of distributions than probabilistic logic programs.

Beyond the FOL or LFP expressions used in current probabilistic logic programming, another direction is to explore languages with more expressive power. Candidates are, for instance, the logic with probability quantifiers from [14] or the conditional probability logic from [16]. Appropriate asymptotic quantifier elimination results have been shown in both settings [16,15], but their usefulness as representation languages has yet to be evaluated.

Furthermore, as mentioned in [19], a different logical framework (such as that described in [10]) or an incorporation of second-order elements (such as described in [2]) could enable the expression of more varied combination functions. Therefore, investigating the asymptotic theory of such extended logics could have direct consequences for the study of the expressiveness of a broader class of knowledge-based model construction frameworks.

Finally, the failure of the classical paradigms under investigation to express general projective families of distributions suggests that one must look beyond current methods and statistical relational frameworks to address the challenge, issued by [13], of learning and inference for general projective families of distributions.

References

1. Blass, A., Gurevich, Y., Kozen, D.: A zero-one law for logic with a fixed-point operator. Inf. Control. (1-3), 70–90 (1985). https://doi.org/10.1016/S0019-9958(85)80027-9
2. Bry, F.: In praise of impredicativity: A contribution to the formalization of meta-programming. Theory Pract. Log. Program. (1), 99–146 (2020)
3. Carnap, R.: Logical Foundations of Probability. University of Chicago Press (1950)
4. Carnap, R.: The Continuum of Inductive Methods. University of Chicago Press (1952)
5. Cozman, F.G., Mauá, D.D.: The finite model theory of Bayesian network specifications: Descriptive complexity and zero/one laws. Int. J. Approx. Reason., 107–126 (2019). https://doi.org/10.1016/j.ijar.2019.04.003
6. Domingos, P.M., Singla, P.: Markov logic in infinite domains. In: L.D. Raedt, T.G. Dietterich, L. Getoor, K. Kersting, S. Muggleton (eds.) Probabilistic, Logical and Relational Learning - A Further Synthesis, 15.04.-20.04.2007, Dagstuhl Seminar Proceedings, vol. 07161. Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany (2007). URL http://drops.dagstuhl.de/opus/volltexte/2008/1381
7. Ebbinghaus, H., Flum, J.: Finite Model Theory, Second Edition. Springer Monographs in Mathematics. Springer (2006)
8. Fitting, M.: Fixpoint semantics for logic programming: a survey. Theor. Comput. Sci. (1-2), 25–51 (2002). https://doi.org/10.1016/S0304-3975(00)00330-3
9. Grandjean, E.: Complexity of the first-order theory of almost all finite structures. Inf. Control. (2/3), 180–204 (1983). https://doi.org/10.1016/S0019-9958(83)80043-6
10. Hommersom, A., Lucas, P.J.F.: Generalising the interaction rules in probabilistic logic. In: T. Walsh (ed.) IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pp. 912–917. IJCAI/AAAI (2011). https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-158
11. Jaeger, M.: Relational Bayesian networks. In: D. Geiger, P.P. Shenoy (eds.) UAI '97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Brown University, Providence, Rhode Island, USA, August 1-3, 1997, pp. 266–273. Morgan Kaufmann (1997). URL https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=320&proceeding_id=13
12. Jaeger, M., Schulte, O.: Inference, learning, and population size: Projectivity for SRL models. CoRR abs/1807.00564 (2018)
13. Jaeger, M., Schulte, O.: A complete characterization of projectivity for statistical relational models. In: C. Bessiere (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pp. 4283–4290. ijcai.org (2020). https://doi.org/10.24963/ijcai.2020/591
14. Keisler, H.J.: Probability quantifiers. In: Model-Theoretic Logics, Perspect. Math. Logic, pp. 509–556. Springer, New York (1985)
15. Keisler, H.J., Lotfallah, W.B.: Almost everywhere elimination of probability quantifiers. J. Symb. Log. (4), 1121–1142 (2009). https://doi.org/10.2178/jsl/1254748683
16. Koponen, V.: Conditional probability logic, lifted Bayesian networks, and almost sure quantifier elimination. Theor. Comput. Sci., 1–27 (2020). https://doi.org/10.1016/j.tcs.2020.08.006
17. Paris, J., Vencovská, A.: Pure Inductive Logic. Perspectives in Logic. Association for Symbolic Logic, Ithaca, NY; Cambridge University Press, Cambridge (2015). https://doi.org/10.1017/CBO9781107326194
18. Poole, D., Buchman, D., Kazemi, S.M., Kersting, K., Natarajan, S.: Population size extrapolation in relational probabilistic modelling. In: U. Straccia, A. Calì (eds.) Scalable Uncertainty Management - 8th International Conference, SUM 2014, Oxford, UK, September 15-17, 2014. Proceedings, Lecture Notes in Computer Science, vol. 8720, pp. 292–305. Springer (2014). https://doi.org/10.1007/978-3-319-11508-5_25
19. Raedt, L.D., Kimmig, A.: Probabilistic (logic) programming concepts. Mach. Learn. 100