An asymptotic analysis of probabilistic logic programming with implications for expressing projective families of distributions
Felix Weitkämper
Institut für Informatik, Ludwig-Maximilians-Universität München, Munich, Germany, [email protected]
Abstract.
Over the last years, there has been increasing research on the scaling behaviour of statistical relational representations with the size of the domain, and on the connections between domain size dependence and lifted inference. In particular, the asymptotic behaviour of statistical relational representations has come under scrutiny, and projectivity was isolated as the strongest form of domain size independence. In this contribution we show that every probabilistic logic program under the distribution semantics is asymptotically equivalent to a probabilistic logic program consisting only of range-restricted clauses over probabilistic facts. To facilitate the application of classical results from finite model theory, we introduce the abstract distribution semantics, defined as an arbitrary logical theory over probabilistic facts, to bridge the gap to the distribution semantics underlying probabilistic logic programming. In this representation, range-restricted logic programs correspond to quantifier-free theories, making asymptotic quantifier elimination results available for use. We conclude that every probabilistic logic program inducing a projective family of distributions is in fact captured by this class, and we infer interesting consequences for the expressivity of probabilistic logic programs as well as for the asymptotic behaviour of probabilistic rules.
Statistical relational artificial intelligence has emerged over the last 25 years as a means to specify statistical models for relational data. Since then, many different frameworks have been developed under this heading, which can broadly be classified into those that extend logic programming to incorporate probabilistic information (probabilistic logic programming under the distribution semantics) and those that specify an abstract template for probabilistic graphical models (sometimes known as knowledge-based model construction).

They both share the distinction between a general model (a template or a probabilistic logic program with variables) and a specific domain used to ground the model. Ideally, the model would be specified abstractly and independently of a specific domain, even though a specific domain may well have been involved in learning the model from data.

However, a significant hurdle is the generally hard to predict or undesirable behaviour of the model when applied to domains of different sizes. This scaling problem has received much attention in the past years (see [18], for instance, and the references in [13]), and recently Jaeger and Schulte [12,13] have identified projectivity as a strong form of good scaling behaviour: in a projective model, the probability of a given property holding for a given object in the domain is completely independent of the domain size. However, the examples in [18] show that projectivity cannot be hoped for in general statistical relational models, and [12] identify very restrictive fragments of common statistical relational frameworks as projective.

The question remains, however, in what way those fragments are not merely sufficient, but necessary criteria for a statistical relational model to be projective.
This has implications for which projective distributions can be expressed by current statistical relational frameworks at all, since the limited fragments from [12] are clearly insufficient to express the whole variety of projective distributions characterised in [13].

We will show in this contribution that in the case of probabilistic logic programming under the distribution semantics (and those other paradigms that can be reduced to it), the restrictive fragment identified in [12] (which corresponds to range-restricted probabilistic logic programs) does indeed capture all projective programs, in the sense that every projective probabilistic logic program is equivalent to a range-restricted probabilistic logic program. Our method will show that, moreover, every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program.

This will be an application of an asymptotic quantifier elimination result for probabilistic logic programming derived from classical finite model theory, namely from the study of the asymptotic theory of first-order and least fixed point logic in the 1980s (particularly in the study of 0-1 laws, applied in the particular form of [1]).

This application is also methodologically interesting as it opens another way in which classical mathematical logic can contribute to studying cutting-edge problems in learning and reasoning. That the theory developed around 0-1 laws would be a natural candidate for such investigations may come as no surprise, as it is highly developed and is itself in the spirit of "finite probabilistic model theory" (cf. Section 7 of [5]), and one might hope for more cross-fertilisation between the two fields in the future.
We will introduce projective families of distributions in accordance with [12,13], where one can find a much more detailed exposition of the terms and their background. As we are interested in statistical relational representations as a means of abstracting away from a given ground model, we will refer to families of distributions with varying domain sizes.
Definition 1. A family of distributions for a relational signature S is a sequence (Q^(n))_{n∈ℕ} of probability distributions on the set Ω_n of all S-structures with domain {1, . . . , n} ⊆ ℕ.

We will carry on to define exchangeability and projectivity; however, unlike [12,13], which have included exchangeability in projectivity "for convenience" ([12], p. 3), we consider projectivity separately. This allows us to have constants with a fixed interpretation in our probabilistic logic programs, as long as n is large enough for all interpreted constants to be included in {1, . . . , n}:

Definition 2.
A family of distributions is called exchangeable if it is invariant under S-isomorphism. It is called projective if for all m < n ∈ ℕ and all ω ∈ Ω_m the following holds:

Q^(m)({ω}) = Q^(n)({ω′ ∈ Ω_n | ω is the substructure of ω′ with domain {1, . . . , m}})

In the remainder of this paper, we will investigate the interplay between the asymptotic behaviour of logical theories as they have been studied in finite model theory and the families of distributions that are induced by them. We therefore introduce a notion of asymptotic equivalence of families of distributions.

Definition 3.
Two families of distributions (Q^(n)) and (Q′^(n)) are asymptotically equivalent if

lim_{n→∞} sup_{A⊆Ω_n} |Q^(n)(A) − Q′^(n)(A)| = 0

Remark.
In measure-theoretic terms, the families of distributions (Q^(n)) and (Q′^(n)) are asymptotically equivalent if and only if the limit of the total variation distance between them is 0. We will later also refer to logical theories as being asymptotically equivalent if they induce asymptotically equivalent families of distributions.

Proposition 4.
Two projective families of distributions are asymptotically equivalent if and only if they are equal.

Proof.
We will proceed by contradiction. So assume not. Then there is an m such that Q^(m) and Q′^(m) are not equal. Let ω be a world of size m which does not have the same probability in Q^(m) and Q′^(m). Let

a := |Q^(m)({ω}) − Q′^(m)({ω})| > 0.

For any n ≥ m, consider the subset A_n := {ω′ ∈ Ω_n | ω′↓[m] = ω}. Since both families are projective,

|Q^(n)(A_n) − Q′^(n)(A_n)| = |Q^(m)({ω}) − Q′^(m)({ω})| = a.

Therefore, (Q^(n)) and (Q′^(n)) are not asymptotically equivalent.

Since we have to expand the vocabulary to represent theories in a distribution semantics, we will note here that asymptotic equivalence is preserved under reduct. First we have to clarify how we build reducts of distributions in the first place:

Definition 5.
Let Q^(n) be a distribution over a signature S. Then its reduct Q^(n)|_T to a subsignature T ⊆ S is defined such that for any world ω ∈ Ω^T_n,

Q^(n)|_T(ω) := Q^(n)({ω′ ∈ Ω^S_n | ω′|_T = ω}).

Remark. Q^(n)|_T is the pushforward measure of Q^(n) with respect to the reduct projection from Ω^S_n to Ω^T_n.

We can now formulate preservation of asymptotic equivalence under reducts:

Proposition 6.
The reducts of asymptotically equivalent families of distributions are themselves asymptotically equivalent.

Proof.
Let (Q^(n)) and (Q′^(n)) be asymptotically equivalent families of distributions over S. Then for any T ⊆ S and any A ⊆ Ω^T_n,

|Q^(n)|_T(A) − Q′^(n)|_T(A)| = |Q^(n)({ω ∈ Ω^S_n | ω|_T ∈ A}) − Q′^(n)({ω ∈ Ω^S_n | ω|_T ∈ A})|.

Therefore,

lim_{n→∞} sup_{A⊆Ω^T_n} |Q^(n)|_T(A) − Q′^(n)|_T(A)| ≤ lim_{n→∞} sup_{A⊆Ω^S_n} |Q^(n)(A) − Q′^(n)(A)| = 0

In the following section, we will introduce the tools from finite model theory that will help us to understand the asymptotic behaviour of probabilistic logic programs. We will first lay out our abstract framework and introduce least fixpoint logic, which will prove an adequate representation for (probabilistic) logic programs. We will then give the necessary background from finite model theory and state the main classical results on the asymptotic behaviour of least fixpoint logic.

In Section 3, we will display the relationship between the theory developed in Section 2 and probabilistic logic programming, which will enable us to prove that every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program and that therefore, by Proposition 4, every projective logic program is actually equivalent to a range-restricted logic program. We will conclude that section by discussing how this framework relates to the setting of [5] and what that means for some formalisms of knowledge-based model construction.

In Section 4, we highlight some implications of our findings for the asymptotic behaviour of probabilistic rules and for the limited expressivity of probabilistic logic programming for general projective distributions, and we give a brief first discussion of questions of complexity.

Finally, we conclude the paper with some ideas for further research.
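As a computational illustration of Definitions 3 and 5 and of the inequality used in the proof of Proposition 6 (the encoding of worlds as sets of ground atoms and all names below are our own, purely illustrative choices), one can check on a toy signature that passing to a reduct never increases total variation distance:

```python
from itertools import product

def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set,
    computed as half the L1 distance (equivalently sup_A |p(A) - q(A)|)."""
    worlds = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in worlds)

def reduct(dist, keep):
    """Pushforward of a distribution on S-worlds to the subsignature `keep`.
    Worlds are frozensets of ground atoms like ('R', 1); the reduct world
    keeps only atoms whose relation symbol lies in `keep`."""
    out = {}
    for world, prob in dist.items():
        red = frozenset(atom for atom in world if atom[0] in keep)
        out[red] = out.get(red, 0.0) + prob
    return out

# Toy example: signature {R, S} over the one-element domain {1}; four worlds.
atoms = [('R', 1), ('S', 1)]
worlds = [frozenset(a for a, bit in zip(atoms, bits) if bit)
          for bits in product([0, 1], repeat=2)]
p = dict(zip(worlds, [0.1, 0.2, 0.3, 0.4]))
q = dict(zip(worlds, [0.25, 0.25, 0.25, 0.25]))

# Reducts can only merge worlds, so total variation cannot increase:
assert tv_distance(reduct(p, {'R'}), reduct(q, {'R'})) <= tv_distance(p, q)
```

The assertion is exactly the inequality from the proof of Proposition 6, instantiated for a single pair of distributions rather than a whole family.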
Our tools will come from finite model theory, and we require a bridge to link the study of the asymptotic properties of logical theories with the statistical relational formalisms. Cozman and Mauá [5] define what they call relational Bayesian network specifications, which combine random and independent root predicates with non-root predicates that are defined ultimately from root predicates using first-order definitions. Since we aim to cover both relational Bayesian networks and logic programming based methods, we will slightly generalise their approach and call the resulting formalism the abstract distribution semantics.
In particular, we will generalise away from first-order logic (FOL) to a general logical language:
Definition 7.
Let R be a vocabulary. Then a logical language L(R) consists of a collection of formulas ϕ, each of which defines a function which takes an R-structure M and a tuple from M^n for some n ∈ ℕ (called the arity of ϕ) and returns either "false" or "true". We write M |= ϕ(~a) whenever "true" is returned for M and ~a.

The archetype of a logical language is the first-order predicate calculus, but the concept as defined here is sufficiently general to accommodate many other choices.

Definition 8.
Let S be a relational vocabulary (possibly with constants), R ⊆ S, and let L(R) be a logical language over R. Then an abstract L-distribution over R (with vocabulary S) consists of the following data:
– For every R ∈ R, a number q_R ∈ ℚ ∩ [0, 1].
– For every R ∈ S\R, an L(R)-formula φ_R of the same arity as R.

In the following we will assume that all vocabularies are finite. As we will see in Subsection 3.2 below, an abstract FOL distribution has the same expressivity as the corresponding relational Bayesian network specification from [5]. However, in our analysis of methods based on logic programming, we will also look at other choices such as least fixed point logic, which has been shown to be an appropriate logical framework to formalise Datalog queries.

The semantics of an abstract distribution is only defined relative to a domain D, which we will also assume to be finite. The formal definition is as follows:

Definition 9. Let L(R) be a logical language over R and let D be a finite set. Let T be an abstract L-distribution over R. Let Ω_D be the set of all S-structures with domain D. Then the probability distribution on Ω_D induced by T, written Q^(D)_T, is defined as follows. For all ω ∈ Ω_D, if there are a tuple ~a from D and an R ∈ S\R such that R(~a) and φ_R(~a) do not have the same truth value in ω, then

Q^(D)_T({ω}) := 0.

Otherwise,

Q^(D)_T({ω}) := ∏_{R∈R} q_R^{|{~a | ω |= R(~a)}|} × ∏_{R∈R} (1 − q_R)^{|{~a | ω |= ¬R(~a)}|}.

In other words, all the relations in R are independent with probability q_R, and the relations in S\R are defined deterministically by the L(R)-formulas φ_R. We will use the following notational shorthand:

Notation. Q^(D)_T(ϕ) := Q^(D)_T({ω ∈ Ω_D | ω |= ϕ})

Since we are only considering finite domains, we can assume without loss of generality that every domain is given by an initial segment of ℕ. We will use the notation Q^(n)_T for Q^({1,...,n})_T.

We will now proceed briefly to discuss fixed point logics.
For a detailed treatment, see Chapter 8 of [7], whose presentation we follow here.
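As a computational illustration of Definition 9 (the encoding, names, and toy signature below are our own illustrative choices, not part of the formal development), the following sketch enumerates all worlds over a two-element domain for an abstract distribution with one probabilistic unary relation R (q_R = 0.3) and one defined relation S with φ_S(x) := R(x):

```python
from itertools import product

def induced_distribution(domain, q, defined):
    """Q^(D)_T of Definition 9, restricted to unary relations (a simplifying
    assumption for this sketch).  `q` maps each probabilistic relation name to
    its probability; `defined` maps each defined relation name to a function
    (r_world, a) -> bool evaluating phi_R(a) on the probabilistic part of a
    world.  Worlds are frozensets of (relation name, extension) pairs."""
    prob_rels = sorted(q)
    dist = {}
    # Enumerate all interpretations of the probabilistic relations.
    for choice in product(*(range(2 ** len(domain)) for _ in prob_rels)):
        r_world = {
            rel: frozenset(a for i, a in enumerate(domain) if (bits >> i) & 1)
            for rel, bits in zip(prob_rels, choice)
        }
        # Independent product over the probabilistic ground atoms.
        p = 1.0
        for rel in prob_rels:
            k = len(r_world[rel])
            p *= q[rel] ** k * (1 - q[rel]) ** (len(domain) - k)
        # Defined relations are determined by their formulas.
        world = dict(r_world)
        for rel, phi in defined.items():
            world[rel] = frozenset(a for a in domain if phi(r_world, a))
        key = frozenset(world.items())
        dist[key] = dist.get(key, 0.0) + p
    return dist

# phi_S(x) := R(x), so S deterministically copies R.
dist = induced_distribution([1, 2], {'R': 0.3},
                            {'S': lambda w, a: a in w['R']})
assert abs(sum(dist.values()) - 1.0) < 1e-9

# Q^(D)_T(S(1)) equals q_R, since S is defined to agree with R.
p_s1 = sum(p for w, p in dist.items()
           if any(rel == 'S' and 1 in ext for rel, ext in w))
assert abs(p_s1 - 0.3) < 1e-9
```

Worlds violating the deterministic definitions simply never arise in the enumeration, which corresponds to their probability being set to 0 in Definition 9.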
Definition 10.
A formula ϕ is called positive in a variable x if x is in the scope of an even number of negation symbols in ϕ.

A formula in least fixed point logic (LFP) over a vocabulary R is defined inductively as follows:
1. Any atomic second-order formula is an LFP formula.
2. If ϕ is an LFP formula, then so is ¬ϕ.
3. If ϕ and ψ are LFP formulas, then so is ϕ ∨ ψ.
4. If ϕ is an LFP formula, then so is ∃x ϕ for a first-order variable x.
5. If ϕ is an LFP formula, then so is [LFP_{~x,X} ϕ]~t, where ϕ is positive in the second-order variable X and the lengths of the string of first-order variables ~x and the string of terms ~t coincide with the arity of X.

Fixpoint semantics have been used extensively in (logic) programming theory (see [8] for a survey), and we will exploit this when relating the model theory of LFP to probabilistic logic programming below. We will first associate an operator with each LFP formula ϕ:

Definition 11.
Let ϕ(~x, ~u, X, ~Y) be an LFP formula, with the length of ~x equal to the arity k of X, and let ω be an R-structure with domain D. Let ~b and ~S be interpretations of ~u and ~Y respectively. Then we define the operator F_ϕ : P(D^k) → P(D^k) as follows:

F_ϕ(R) := {~a ∈ D^k | ω |= ϕ(~a, ~b, R, ~S)}.

Since we have restricted Rule 5 in Definition 10 to positive formulas, F_ϕ is monotone for all ϕ (i.e. R ⊆ R′ implies F_ϕ(R) ⊆ F_ϕ(R′) for all R, R′ ⊆ D^k). Therefore we have:

Fact 12.
For every LFP formula ϕ(~x, ~u, X, ~Y), every R-structure on a domain D, and every interpretation of variables as in Definition 11, there is a relation R ⊆ D^k such that R = F_ϕ(R) and such that for all R′ with R′ = F_ϕ(R′) we have R ⊆ R′.

Definition 13.
We call the R from Fact 12 the least fixpoint of ϕ(~x, ~u, X, ~Y).

Now we are ready to define the semantics of least fixpoint logic:

Definition 14.
By induction on the definition of an LFP formula, we will define when an LFP formula ϕ(~X, ~x) is said to hold in an R-structure ω for a tuple ~a from the domain of ω and relations ~A of the correct arities:
– The first-order connectives and quantifiers ¬, ∨ and ∃, as well as ∧ and ∀ defined from them in the usual way, are given the usual semantics.
– An atomic second-order formula X(~x, ~c) holds if and only if (~a, ~c^ω) ∈ A, where A is the interpretation of X.
– [LFP_{~x,X} ϕ]~t holds if and only if the interpretation of ~t is in the least fixed point of F_{ϕ(~x,X)}.

Before we relate them to other frameworks, we will recall the asymptotic quantifier elimination results known for first-order logic and for least fixpoint logic. Our arguments will hinge on the asymptotic reduction of LFP to FOL in [1] and on the asymptotic quantifier elimination in FOL.

Notation.
Although we have allowed second-order variables in the inductive definitions above, we will assume from now on, unless mentioned otherwise, that LFP formulas do not have free second-order variables.

The asymptotic theory of relational first-order logic is much studied and well understood. It can be summarised as follows (Chapter 4 of [7] is a good general reference):
Definition 15.
Let R be a relational vocabulary. Then the first-order theory RANDOM(R) is given by all axioms of the following form, called extension axioms over R:

∀v_1 … v_r ( ⋀_{1≤i<j≤r} v_i ≠ v_j → ∃v_{r+1} ( ⋀_{i≤r} v_{r+1} ≠ v_i ∧ χ(v_1, …, v_{r+1}) ) ),

where χ is a complete quantifier-free description of how v_{r+1} relates to v_1, …, v_r, i.e. a consistent conjunction containing, for every R-atom in the variables v_1, …, v_{r+1} that mentions v_{r+1}, either that atom or its negation.

Fact 16. RANDOM(R) eliminates quantifiers, i.e. for each formula ϕ(~x) there is a quantifier-free formula ϕ′(~x) such that RANDOM(R) ⊢ ∀~x (ϕ(~x) ↔ ϕ′(~x)).

It is sometimes helpful to characterise this quantifier-free formula somewhat more explicitly:

Proposition 17. Let ϕ(~x) be a formula of first-order logic. Then:
1. ϕ′(~x) as in Fact 16 can be chosen such that only those relation symbols occur in ϕ′ that occur in ϕ.
2. If every atomic subformula of ϕ contains at least one free variable not in ~x, and no relation symbol occurs with different variables in different literals, then either RANDOM(R) ⊢ ∀~x ϕ(~x) or RANDOM(R) ⊢ ∀~x ¬ϕ(~x).

Proof. The first claim follows from the fact that RANDOM(T) = RANDOM(R)|_T for any T ⊆ R. To show the second claim, consider the vocabulary R̄ containing R_{~x}(~y) for every atomic subformula R(~x, ~y) of ϕ, and let ϕ̄ be the R̄-formula obtained from ϕ by replacing every occurrence of R(~x, ~y) with R_{~x}(~y). Let M be a model of RANDOM(R) and let ~a ∈ M. Then define an R̄-structure on M by setting R_{~x}(~y) :⇔ R(~a, ~y). One can verify that M satisfies the extension axioms in RANDOM(R̄). Since RANDOM(R̄) is complete, RANDOM(R̄) ⊢ ϕ̄ or RANDOM(R̄) ⊢ ¬ϕ̄. Therefore, either ϕ(~a) or ¬ϕ(~a) holds uniformly for all ~a ∈ M. Therefore, either RANDOM(R) ⊢ ∀~x ϕ(~x) or RANDOM(R) ⊢ ∀~x ¬ϕ(~x).

The importance of RANDOM(R) comes from its role as the asymptotic limit of the class of all R-structures. In fact, it axiomatises the limit theory of R-structures even when the individual probabilities of relational atoms are given by q_R rather than 1/2:

Fact 18. lim_{n→∞} Q^(n)_T(ϕ) = 1 for all abstract distributions T over R and all extension axioms ϕ over R.
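To see Fact 18 concretely, consider the vocabulary R = {R} with R unary and q_R = 0.3 (an arbitrary illustrative choice of our own), and the two r = 0 extension axioms ∃v R(v) and ∃v ¬R(v). Their joint probability under Q^(n)_T admits a simple closed form and tends to 1 as the domain grows, as the following sketch checks:

```python
def prob_extension_axiom(n, q):
    """Probability that a structure on {1,...,n}, with each atom R(i) holding
    independently with probability q, satisfies both r = 0 extension axioms
    'exists v R(v)' and 'exists v not R(v)'."""
    # P(no v with R(v)) = (1-q)^n and P(no v with not R(v)) = q^n;
    # for n >= 1 the two failure events are disjoint.
    return 1 - (1 - q) ** n - q ** n

# The probability increases towards 1 with the domain size.
probs = [prob_extension_axiom(n, 0.3) for n in (1, 5, 10, 50)]
assert all(a < b for a, b in zip(probs, probs[1:]))
assert probs[-1] > 1 - 1e-7
```

For n = 1 the probability is 0 (a one-element structure cannot satisfy both axioms), and it then climbs rapidly towards 1, illustrating why finite conjunctions of extension axioms hold asymptotically almost surely.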
The main theorem of [1] shows that RANDOM(R) not only eliminates classical quantifiers, but also least fixed point quantifiers:

Fact 19. Let ϕ(~x) be an LFP formula over R. Then there is a finite subset G of RANDOM(R) and a quantifier-free formula ϕ′(~x) such that G ⊢ ∀~x (ϕ(~x) ↔ ϕ′(~x)).

Putting this together, we can derive the following:

Theorem 20. Let T be an abstract LFP distribution over a vocabulary R. Then T is asymptotically equivalent to a quantifier-free FOL distribution over R.

Proof. By Fact 19 and the finiteness of the vocabulary S, there is a finite set G of extension axioms over R such that there are quantifier-free R-formulas φ′_R for every R ∈ S\R with G ⊢ ∀~x (φ_R(~x) ↔ φ′_R(~x)). By Fact 18, lim_{n→∞} Q^(n)_T({ω ∈ Ω_n | ω|_R |= G}) = 1 for any finite subset G ⊆ RANDOM(R), and thus lim_{n→∞} Q^(n)_T({ω ∈ Ω_n | ω |= φ_R ↔ φ′_R for all R ∈ S\R}) = 1. Let (Q′^(n)) be the family of distributions induced by the quantifier-free FOL distribution over R in which every φ_R is replaced by φ′_R. By construction, Q^(n)(ω) = Q′^(n)(ω) for every world ω with ω |= φ_R ↔ φ′_R for all R ∈ S\R. Therefore, sup_{A⊆Ω_n} |Q^(n)(A) − Q′^(n)(A)| is bounded above by 1 − Q^(n)({ω ∈ Ω_n | ω |= φ_R ↔ φ′_R for all R ∈ S\R}), which tends to 0.

In this section we will see that the distribution semantics at the heart of probabilistic logic programming can be seen as a special case of abstract distributions. We can then apply the concepts developed above to infer a description of their asymptotic behaviour as well as a complete syntactic characterisation of projectivity. Lastly, we will briefly sketch how to transfer the results of this section to other formalisms such as the relational Bayesian network specifications from [5] and a subclass of the relational Bayesian network specifications from [11].
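Before relating the two formalisms formally, it may help to see the distribution semantics in miniature. The following sketch (program, names, and probabilities are our own illustrative choices) evaluates a tiny probabilistic logic program with probabilistic facts 0.6 :: e(1, 2) and 0.7 :: e(2, 3) and the rules r(X, Y) ← e(X, Y) and r(X, Y) ← e(X, Z), r(Z, Y) by enumerating total choices of the probabilistic facts and closing each choice under the rules:

```python
from itertools import product

# Probabilistic facts: ground atom -> probability.
facts = {('e', 1, 2): 0.6, ('e', 2, 3): 0.7}

def reachable(edges):
    """Deterministic part: transitive closure of the chosen edges,
    computed by iterating the rules to a fixed point."""
    r = set(edges)
    while True:
        new = {(x, w) for (x, y) in r for (z, w) in r if y == z} - r
        if not new:
            return r
        r |= new

def query_prob(goal):
    """Success probability of r(goal) under the distribution semantics:
    sum the probabilities of all total choices whose closure derives it."""
    total = 0.0
    atoms = sorted(facts)
    for bits in product([0, 1], repeat=len(atoms)):
        p = 1.0
        chosen = set()
        for atom, bit in zip(atoms, bits):
            p *= facts[atom] if bit else 1 - facts[atom]
            if bit:
                chosen.add(atom[1:])
        if goal in reachable(chosen):
            total += p
    return total

assert abs(query_prob((1, 3)) - 0.6 * 0.7) < 1e-9  # r(1,3) needs both edges
```

This brute-force enumeration is exponential in the number of probabilistic facts and is meant only to mirror the definition; actual systems such as Problog use knowledge compilation instead.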
In relating probabilistic logic programming to the abstract distribution semantics that we have introduced above, we will employ the simplification in [20] and consider a probabilistic logic program to be a stratified Datalog program over probabilistic facts. This distribution semantics covers several different, equally expressive formalisms; see [20,19] for an overview. For an introduction to the syntax and semantics of stratified Datalog programs in line with this paper, see Chapter 9 of [7].

We will use the notation (Π, P)~t for an intensional symbol P of a stratified logic program Π to mean that "the program Π proves P~t".

Definition 21. A probabilistic logic program consists of probabilistic facts and deterministic rules, where the deterministic part is a stratified Datalog program. We will consider it in our framework of abstract distribution semantics as follows:
– R is given by relation symbols R′ for every probabilistic fact p_R :: R(~x), with q_{R′} := p_R. Their arity is just the arity of R.
– S is given by the vocabulary of the probabilistic logic program and additionally the R′ in R.
– Let Π be the stratified Datalog program obtained by prefixing the program {R(~x) ← R′(~x) | R′ ∈ R} to the deterministic rules of the probabilistic logic program. Then φ_P for a P ∈ S\R is given by (Π, P)~t.

The distribution semantics for probabilistic programming is related to the LFP distribution semantics introduced above through the following fact, cf. Theorem 9.1.1 of [7]:

Fact 22. For every stratifiable Datalog formula (Π, P)~t as above, there exists an LFP formula ϕ(~t) over the extensional vocabulary R of Π such that for every R-structure A and every interpretation of variables on A, A |= ϕ(~t) if and only if A |= (Π, P)~t.

Remark.
In fact, it suffices to consider formulas in the so-called bounded fixpoint logic, whose expressiveness lies between first-order logic and least fixed point logic; see [7] for details.

This translation allows us to apply the asymptotic quantifier elimination results from Section 2 to probabilistic logic programming. In order to obtain a characterisation within probabilistic logic programming, however, we need to translate quantifier-free first-order formulas back to stratifiable Datalog. In fact, they can be mapped to a subset of stratified Datalog that is well known from logic programming:

Definition 23. A Datalog program, Datalog formula or probabilistic logic program is called range-restricted if every variable occurring in the body of a clause also occurs in the head of that clause.

This property corresponds exactly to the fragment of probabilistic logic programs shown to be projective by Jaeger and Schulte [12] (Proposition 4.3 there):

Proposition 24. Every range-restricted probabilistic logic program is projective.

The proof of Theorem 9.1.1 of [7] gives the following:

Fact 25. Every quantifier-free first-order formula is equivalent to a range-restricted stratified Datalog formula.

Now we have all the ingredients to formulate the main result of this subsection.

Theorem 26. Every probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program.

Proof. Let S\R be the vocabulary of the probabilistic logic program Θ and let Π be its underlying Datalog program. Then for every relation R ∈ S\R, R(~t) is given by the Datalog formula (Π, R)~t over any given R-structure. By Fact 22, (Π, R)~t is equivalent to an LFP formula φ_R over R. Let T be the abstract LFP distribution over R in which, for every R ∈ R, q_R is taken from Θ, and for every R ∈ S\R, this φ_R is used.
Then T and Θ induce equivalent families of distributions. By Theorem 20, T is asymptotically equivalent to a quantifier-free abstract distribution, which in turn is equivalent to a range-restricted Datalog probabilistic logic program by Fact 25. Therefore Θ itself is asymptotically equivalent to a range-restricted probabilistic logic program.

Corollary 27. A probabilistic logic program is projective if and only if it is equivalent to a range-restricted probabilistic logic program.

Proof. By Theorem 26 every (projective) probabilistic logic program is asymptotically equivalent to a range-restricted probabilistic logic program (which is itself projective by Proposition 24). By Proposition 4, they are therefore actually equivalent.

While probabilistic logic programs are just one of several formalisms of statistical relational AI, we can relate our framework to those based on knowledge-based model construction. This relation is well known in the literature (see e.g. [20,19]), and so we will not go into details. First, however, we will clarify how Cozman and Mauá's [5] relational Bayesian network specifications correspond to abstract FOL distributions. Recall from there that such a relational Bayesian network specification differs from an abstract FOL distribution in that the φ_R are allowed to mention any symbols from S rather than just symbols from R. However, it is then stipulated that the dependency graph induced by the φ_R must be acyclic. In that case, every relation R has a well-defined rank n ∈ ℕ and only refers to relations of lower rank. Relations in R have rank 0. Then we can iteratively unfold the relations in φ_R, where R is of rank n + 1, by replacing any occurrence of an R′ with 0 < rank(R′) ≤ n by its definition in terms of relations of lower rank. Thus, any φ_R can be equivalently expressed using only symbols from R.

As the name suggests, relational Bayesian network specifications are well suited for expressing Bayesian networks.
While the translation is most straightforward for ground Bayesian networks, one can also express relational Bayesian networks (sensu [11]), as long as their lifted dependency graph is acyclic and they only use the noisy-or combination function. This latter constraint is due to the correspondence of the noisy-or combination function to existential quantification over independent probabilistic facts; see [19] for a discussion. As probabilistic dependencies of higher rank need to be encoded by new probabilistic facts (much like the interpretation of probabilistic rules in Problog), we generally need to expand the language. Overall, we obtain:

Proposition 28. Let T be a relational Bayesian network on vocabulary S with an acyclic dependency graph which only uses the noisy-or combination function. Then there is an S′ ⊇ S and an abstract existential FOL distribution T′ over an R ⊆ S′ with vocabulary S′ such that the reduct of T′ to S is equivalent to T.

Since asymptotic equivalence is preserved under reduct (Proposition 6), we can apply Theorem 20 to such relational Bayesian networks and conclude that they are asymptotically equivalent to (reducts of) quantifier-free FOL distributions. In order to complete the characterisation, we note that such quantifier-free distributions correspond to Bayesian networks without combination functions. Therefore, we obtain:

Proposition 29. Let T be a relational Bayesian network on vocabulary S with an acyclic dependency graph which only uses the noisy-or combination function. Then T is asymptotically equivalent to a relational Bayesian network with an acyclic dependency graph which does not use any combination function.

Since the relational Bayesian networks without combination functions are exactly those seen to be projective in [12], the characterisation of projective distributions also carries over to this setting:

Proposition 30.
A relational Bayesian network with an acyclic dependency graph using only the noisy-or combination function is projective if and only if it is equivalent to a relational Bayesian network which does not use any combination function.

Note also that while probabilistic logic programs (which correspond to abstract (bounded) LFP distributions) are generally more expressive than relational Bayesian networks with acyclic dependency graphs and the noisy-or combination function (which can be characterised by abstract FOL distributions), this gain in expressiveness disappears when considering only the projective fragment of each.

The results have immediate consequences for the expressiveness of probabilistic logic programming and the other formalisms described above. We discuss two particularly striking ones here.

Asymptotic loss of information. Very insightful is the case of a probabilistic rule, i.e. a clausal formula annotated with a probability. Because of its intuitive appeal, this is a widely used syntactic element of probabilistic logic programming languages such as Problog, and its semantics is defined by introducing a new probabilistic fact to model the uncertainty of the rule. More precisely,

p :: R(~x) :− Q_1(~x_1, ~y_1), . . . , Q_n(~x_n, ~y_n)

(where ~x are the variables appearing in R and ~x_i ⊆ ~x) is interpreted as

p :: I(~x, ~y); R(~x) :− Q_1(~x_1, ~y_1), . . . , Q_n(~x_n, ~y_n), I(~x, ~y)

(where ~y := ⋃_i ~y_i). It is now easy to see from Proposition 17 that in the asymptotic quantifier-free representation of this probabilistic rule, I will no longer occur, since it originally occurred implicitly quantified in the body of the clause. However, I was the only connection between the probability annotation of the rule and its semantics!
Therefore, the asymptotic probability of R(~x) is independent of the probability assigned to any non-range-restricted rule with R(~x) as its head. A similar argument holds in noisy-or Bayesian networks whenever the noisy-or is invoked.

Expressing projective families of distributions. Our results also show how few of the projective families of distributions can be expressed in those formalisms. This confirms the suspicion voiced in [13] that, despite the ostensible similarities between languages such as Independent Choice Logic, which are based on the distribution semantics, and the array representation of [13], a direct application of techniques from probabilistic logic programming to general projective families of distributions might prove challenging. We show that already in the very limited signature consisting of a single unary relation symbol R, there is no probabilistic logic program that induces the distribution that is uniform on isomorphism classes of structures:

Definition 31. Let S := {R} consist of one unary predicate, and let m∗ be the family of distributions on S-structures defined by m∗({ω}) := 1/((|D| + 1) · N_ω) for a world ω ∈ Ω_D, where N_ω := |{ω′ ∈ Ω_D | ω ≅ ω′}|.

This gives each isomorphism type of structures equal weight, and then within each isomorphism type every world is given equal weight too. m∗ is an important probability measure for two reasons: it plays a special role in finite model theory, since the so-called unlabelled 0-1 laws are introduced with respect to this measure; furthermore, it was introduced explicitly by Carnap [3,4] as a candidate measure for formalising inductive reasoning, as part of the so-called continuum of inductive methods (see also [17] for a modern exposition). It is easily seen to be exchangeable; it is also projective, and in fact an elementary calculation shows that for any domain D and any {a_1, . . . , a_{n+1}} ⊆ D,

m∗(R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I}) = (|I| + 1)/(n + 2)   (4.1)

for any I ⊆ {1, . . . , n} (see any of the sources above for a derivation).

Proposition 32. Let S′ be a finite vocabulary extending S from Definition 31. Then there is no probabilistic logic program with vocabulary S′ such that the reduct of the induced family of distributions to S is equal to m∗.

Proof. For the sake of simplicity, we will assume that S′ has no constant symbols. Since m∗ is projective, it would have to be induced by a range-restricted probabilistic logic program. In particular, in any grounding, any probabilistic fact appearing both in the body of a clause with head R(a_{n+1}) and in the body of a clause with head R(a_i) for an i < n + 1 must be nullary. Since there are only finitely many nullary predicates in S′, there are only finitely many possible configurations of those nullary predicates. For every such configuration ϕ, let q_ϕ be the conditional probability of R(x) given ϕ (this is well-defined since there are no constants in the language). We observe from Equation 4.1 that, for variable n, the infimum of m∗(R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I}) is 0, even if we assume that there is at least one i with R(a_i). In that situation, the conditional probability of any ϕ with q_ϕ = 0 given {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} is 0. However, since there are only finitely many configurations of nullary predicates ϕ, the infimum c of the nonzero q_ϕ is greater than 0. Since the nullary predicate symbols are the only facts appearing in the bodies of clauses with heads R(a_{n+1}) and clauses with heads R(a_i) for an i < n + 1, R(a_{n+1}) is conditionally independent of {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} given those nullary predicates.
Thus, a standard calculation reveals that the conditional probability of R(a_{n+1}) given {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} is a weighted mean of the nonzero q_ϕ and therefore bounded below by c > 0, in contradiction to 0 being the infimum of m*( R(a_{n+1}) | {R(a_i)}_{i∈I} ∪ {¬R(a_i)}_{i∈{1,...,n}\I} ).

Remark. In fact, the proposition easily generalises to all non-extreme probability functions on the continuum of inductive methods. Furthermore, allowing a finite number of constant symbols does not affect the argument of the proof.

4.2 Complexity results

A natural question is the complexity of determining an asymptotically equivalent range-restricted program for any given probabilistic logic program. Since this operation takes a non-ground probabilistic logic program as input and computes another probabilistic logic program, the notion of data complexity does not make sense in this context. Instead, program complexity is the appropriate measure.

In our context, the input program could be measured in different ways. Since our analysis is based on the setting of abstract distributions, we will consider as our input abstract distributions obtained from (stratified) probabilistic logic programs. We will furthermore fix our signatures R and S. Since the transformation acts on each φ_R in turn and independently, it suffices to consider the individual φ_R as input. It is natural to ask about complexity in the length of φ_R.

In fact, one can extract upper and lower bounds from [1] (building on [9]), whose asymptotic results form the core of our present work. The task of determining whether the probability of a first-order sentence converges to 0 or 1 with increasing domain size, which is a special case of our transformation, is PSPACE-complete (Theorem 1.4 of [1]). Therefore the program transformation is certainly PSPACE-hard.
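The convergence behaviour underlying this decision task can be illustrated on a minimal sketch (not part of the formal development above; the unary relation R and the probability 0.3 are chosen purely for illustration). For n independent probabilistic facts R(a_1), ..., R(a_n), each true with probability p, the sentence ∃x R(x) has probability 1 − (1 − p)^n, which converges to 1 as the domain grows, while ∀x R(x) has probability p^n, which converges to 0:

```python
from itertools import product

def prob_exists(n, p):
    """Brute-force P(∃x R(x)) over all 2^n worlds generated by n
    independent probabilistic facts R(a_1), ..., R(a_n)."""
    total = 0.0
    for world in product([True, False], repeat=n):
        weight = 1.0
        for fact in world:
            weight *= p if fact else 1 - p
        if any(world):  # the world satisfies ∃x R(x)
            total += weight
    return total

# Matches the closed form 1 - (1 - p)^n, which tends to 1 as n grows,
# while P(∀x R(x)) = p^n tends to 0.
for n in (1, 4, 8):
    assert abs(prob_exists(n, 0.3) - (1 - 0.7 ** n)) < 1e-12
```

Deciding the limit for an arbitrary first-order sentence, rather than computing the probability for a fixed n as above, is the PSPACE-complete problem of [1].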
On the other hand, asymptotic elimination of quantifiers in least fixed point logic is EXPTIME-complete (Theorems 4.1, 4.3 of [1]), so the program transformation is certainly in EXPTIME.

To be more precise, we note that for abstract FO distributions, which correspond to acyclic probabilistic logic programs, the transformation can be performed in PSPACE. Let R be of arity n. Then enumerate the (finitely many) quantifier-free n-types (ϕ_i) in R. Now for any φ_R of arity n we can check successively, in polynomial space in the length of φ_R, whether the probability of ϕ_i → φ_R converges to 0 or 1. Then φ_R is asymptotically equivalent to the disjunction of those quantifier-free n-types for which 1 is returned.

In the general case of least fixed point logic, Blass et al. [1] show that the problem of finding an asymptotically equivalent first-order sentence is EXPTIME-complete. However, to represent stratified Datalog, only the fragment known as bounded or stratified least fixed point logic is required (see Sections 8.7 and 9.1 of [7]). Therefore, the complexity class of the program transformation of stratified probabilistic logic programs corresponds to the complexity of the asymptotic theory of bounded fixed point logic, which to the best of our knowledge is still open.

The analysis presented here suggests several strands of further research. While some widely used directed frameworks can be subsumed under the probabilistic logic programming paradigm, as discussed in Subsection 3.2, undirected models such as Markov logic networks (MLNs) seem to require a different approach. Indeed, the projective fragment of MLNs isolated by [12] is particularly restrictive, since it only allows formulas in which every literal has the same variables (the σ-determinate formulas of [6]; cf. also the parametric classes of finite model theory, for instance in Section 4.2 of [7]).
It might therefore be expected that, if an analogous result to Theorem 26 holds for MLNs, they could express even fewer projective families of distributions than probabilistic logic programs.

Beyond the FOL or LFP expressions used in current probabilistic logic programming, another direction is to explore languages with more expressive power. Candidates are, for instance, the logic with probability quantifiers from [14] or the conditional probability logic from [16]. Appropriate asymptotic quantifier elimination results have been shown in both settings [16,15], but their usefulness as representation languages has yet to be evaluated.

Furthermore, as mentioned in [19], a different logical framework (such as that described in [10]) or an incorporation of second-order elements (such as described in [2]) could enable the expression of more varied combination functions. Therefore, investigating the asymptotic theory of such extended logics could have direct consequences for the study of the expressiveness of a broader class of knowledge-based model construction frameworks.

Finally, the failure of the classical paradigms under investigation to express general projective families of distributions suggests that one must look beyond current methods and statistical relational frameworks to address the challenge, issued by [13], of learning and inference for general projective families of distributions.

References

1. Blass, A., Gurevich, Y., Kozen, D.: A zero-one law for logic with a fixed-point operator. Inf. Control. (1-3), 70–90 (1985). https://doi.org/10.1016/S0019-9958(85)80027-9
2. Bry, F.: In praise of impredicativity: A contribution to the formalization of meta-programming. Theory Pract. Log. Program. (1), 99–146 (2020)
3. Carnap, R.: Logical Foundations of Probability. University of Chicago Press (1950)
4. Carnap, R.: The Continuum of Inductive Methods. University of Chicago Press (1952)
5. Cozman, F.G., Mauá, D.D.: The finite model theory of Bayesian network specifications: Descriptive complexity and zero/one laws. Int. J. Approx. Reason., 107–126 (2019). https://doi.org/10.1016/j.ijar.2019.04.003
6. Domingos, P.M., Singla, P.: Markov logic in infinite domains. In: L.D. Raedt, T.G. Dietterich, L. Getoor, K. Kersting, S. Muggleton (eds.) Probabilistic, Logical and Relational Learning - A Further Synthesis, 15.04.-20.04.2007, Dagstuhl Seminar Proceedings, vol. 07161. Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss Dagstuhl, Germany (2007). URL http://drops.dagstuhl.de/opus/volltexte/2008/1381
7. Ebbinghaus, H., Flum, J.: Finite Model Theory, Second Edition. Springer Monographs in Mathematics. Springer (2006)
8. Fitting, M.: Fixpoint semantics for logic programming: a survey. Theor. Comput. Sci. (1-2), 25–51 (2002). https://doi.org/10.1016/S0304-3975(00)00330-3
9. Grandjean, E.: Complexity of the first-order theory of almost all finite structures. Inf. Control. (2/3), 180–204 (1983). https://doi.org/10.1016/S0019-9958(83)80043-6
10. Hommersom, A., Lucas, P.J.F.: Generalising the interaction rules in probabilistic logic. In: T. Walsh (ed.) IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pp. 912–917. IJCAI/AAAI (2011). https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-158
11. Jaeger, M.: Relational Bayesian networks. In: D. Geiger, P.P. Shenoy (eds.) UAI '97: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Brown University, Providence, Rhode Island, USA, August 1-3, 1997, pp. 266–273. Morgan Kaufmann (1997). URL https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=320&proceeding_id=13
12. Jaeger, M., Schulte, O.: Inference, learning, and population size: Projectivity for SRL models. CoRR abs/1807.00564 (2018)
13. Jaeger, M., Schulte, O.: A complete characterization of projectivity for statistical relational models. In: C. Bessiere (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pp. 4283–4290. ijcai.org (2020). https://doi.org/10.24963/ijcai.2020/591
14. Keisler, H.J.: Probability quantifiers. In: Model-Theoretic Logics, Perspect. Math. Logic, pp. 509–556. Springer, New York (1985)
15. Keisler, H.J., Lotfallah, W.B.: Almost everywhere elimination of probability quantifiers. J. Symb. Log. (4), 1121–1142 (2009). https://doi.org/10.2178/jsl/1254748683
16. Koponen, V.: Conditional probability logic, lifted Bayesian networks, and almost sure quantifier elimination. Theor. Comput. Sci., 1–27 (2020). https://doi.org/10.1016/j.tcs.2020.08.006
17. Paris, J., Vencovská, A.: Pure Inductive Logic. Perspectives in Logic. Association for Symbolic Logic, Ithaca, NY; Cambridge University Press, Cambridge (2015). https://doi.org/10.1017/CBO9781107326194
18. Poole, D., Buchman, D., Kazemi, S.M., Kersting, K., Natarajan, S.: Population size extrapolation in relational probabilistic modelling. In: U. Straccia, A. Calì (eds.) Scalable Uncertainty Management - 8th International Conference, SUM 2014, Oxford, UK, September 15-17, 2014. Proceedings, Lecture Notes in Computer Science, vol. 8720, pp. 292–305. Springer (2014). https://doi.org/10.1007/978-3-319-11508-5_25
19. Raedt, L.D., Kimmig, A.: Probabilistic (logic) programming concepts. Mach. Learn. 100