Markov properties for mixed graphs

Abstract

In this paper, we unify the Markov theory of a variety of different types of graphs used in graphical Markov models by introducing the class of loopless mixed graphs, and show that all independence models induced by $m$-separation on such graphs are compositional graphoids. We focus in particular on the subclass of ribbonless graphs, which as special cases include undirected graphs, bidirected graphs, and directed acyclic graphs, as well as ancestral graphs and summary graphs. We define maximality of such graphs as well as a pairwise and a global Markov property. We prove that the global and pairwise Markov properties of a maximal ribbonless graph are equivalent for any independence model that is a compositional graphoid.


Bernoulli 20(2), 2014, 676–696. DOI: 10.3150/12-BEJ502

KAYVAN SADEGHI and STEFFEN LAURITZEN

Department of Statistics, Baker Hall, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: [email protected]

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom. E-mail: steff[email protected]

Keywords: composition property; global Markov property; graphoid; independence model; $m$-separation; maximality; pairwise Markov property

1. Introduction

Graphical Markov models have become widely used in recent years. The models use graphs to represent conditional independence relations for systems of random variables, with nodes of the graph corresponding to random variables and edges representing dependencies. Several classes of graphs with various independence interpretations have been described in the literature. These range from undirected graphs with simple separation for derivation of independencies [19] to various forms of mixed graphs [18, 24, 30], including chain graphs with several different separation criteria [2, 5, 8, 10, 17].

In spite of the differences among these graphs, their structural similarities motivate an attempt to unify them. For this purpose, we introduce the class of loopless mixed graphs and let them entail independence models using the same separation criterion, $m$-separation. This unification covers many graphical independence models in the literature, with some independence models for chain graphs forming a notable exception; see Section 4 for further details. We show that any independence model generated by $m$-separation in a loopless mixed graph is a compositional graphoid. This ensures that certain intuitive methods of reasoning are indeed valid for such graphs, as they in some sense behave as ordinary undirected graphs.

A common motivation for defining MC-graphs [18], summary graphs [30], and ancestral graphs [24] is to represent independence relations implied by marginalisation over and conditioning on sets of variables satisfying the Markov property of a directed acyclic graph (DAG). The focus of our study is on a subclass of loopless mixed graphs which we shall term ribbonless. The class of ribbonless graphs is sufficiently rich to serve the same purpose: these graphs are obtained by a simple modification of MC graphs derived from a DAG after marginalisation and conditioning; and it contains summary graphs and ancestral graphs as special cases.

For ribbonless graphs, we define global and pairwise Markov properties, the latter being associated with interpreting missing edges in the graph as representing conditional independencies. We prove as our main result that a compositional graphoid independence model over a maximal ribbonless graph satisfies the global Markov property if and only if it satisfies the pairwise Markov property. This ensures that the independence models represented by such graphs are generated by their missing edges, which again supports the direct visual intuition.

The concepts of pairwise and global Markov properties for undirected graphs were introduced in [13] in the context of random fields and shown to be equivalent for positive densities. Alternative proofs were later given independently by several authors, for example [3, 12]; see also [4]. An abstract variant of this theorem was proven in [21] for independence models satisfying graphoid axioms, as these are satisfied by probabilistic distributions with positive densities; see also [29] and [11]. Independence models for undirected graphs were discussed comprehensively in Chapter 3 of [19].

A global Markov property that uses the $m$-separation criterion and a pairwise Markov property were defined in [24] for maximal ancestral graphs without considering conditions under which they are equivalent. We use a generalisation of these Markov properties for maximal ribbonless graphs, which contains maximal ancestral graphs as a subclass, and prove their equivalence for compositional graphoids. This has been mentioned as a conjecture in [14].

In the next section, we introduce the basic concepts of graph theory, general and probabilistic independence models, and compositional graphoids.

In Section 3, we introduce the class of loopless mixed graphs and additional graph theoretical definitions special to mixed graphs. We also associate the $m$-separation criterion to this class, and prove for any loopless mixed graph that the independence model induced by $m$-separation is a compositional graphoid.

This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 2014, Vol. 20, No. 2, 676–696. This reprint differs from the original in pagination and typographic detail.

2. Basic definitions and concepts

In this section, we introduce basic definitions and notation for independence models, graphs, and compositional graphoids.

A graph $G$ is a triple consisting of a node set or vertex set $V$, an edge set $E$, and a relation that with each edge associates two nodes (not necessarily distinct), called its endpoints. When nodes $i$ and $j$ are the endpoints of an edge, they are adjacent and we write $i \sim j$. We say the edge is between its two endpoints. We usually refer to a graph as an ordered pair $G = (V, E)$. Graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are called equal if $(V_1, E_1) = (V_2, E_2)$. In this case we write $G_1 = G_2$.

Notice that our graphs are labeled, that is, every node is considered as a different object. Hence, for example, the graph $i - j - k$ is not equal to $j - i - k$.

A loop is an edge with the same endpoints. Multiple edges are edges with the same pair of endpoints. A simple graph has neither loops nor multiple edges.

A subgraph of a graph $G_1$ is a graph $G_2$ such that $V(G_2) \subseteq V(G_1)$ and $E(G_2) \subseteq E(G_1)$ and the assignment of endpoints to edges in $G_2$ is the same as in $G_1$. An induced subgraph by nodes $A \subseteq V$ is a subgraph that contains all and only nodes in $A$ and all edges between two nodes in $A$. A subgraph induced by edges $B \subseteq E$ is a subgraph that contains all and only edges in $B$ and all nodes that are endpoints of edges in $B$.

A walk is a list $\langle v_0, e_1, v_1, \dots, e_k, v_k \rangle$ of nodes and edges such that for $1 \le i \le k$, the edge $e_i$ has endpoints $v_{i-1}$ and $v_i$. A path is a walk with no repeated node or edge. If the graph is simple then the path can be uniquely determined by an ordered sequence of nodes. Throughout this paper, we use node sequences to describe paths even in graphs with multiple edges, as it usually is apparent from the context which of multiple edges belong to the path. We say a path is between the first and the last nodes of the list in $G$. We call the first and the last nodes endpoints of the path and all other nodes inner nodes.

If $\pi_1 = \langle i = i_0, i_1, \dots, i_n, h \rangle$ and $\pi_2 = \langle h, j_m, j_{m-1}, \dots, j_0 = j \rangle$ are paths, their combination $\pi = \pi_1 \circ \pi_2$ is the path $\pi = \langle i, \dots, i_{p-1}, k, j_{q-1}, \dots, j \rangle$, where $k = i_p = j_q$ is the first node of $\pi_1$ which is on both paths. If $k = h$ then $\pi$ is simply the concatenation of the two paths. In general, the concatenation of two paths will be a walk and not a path, as the paths may intersect in more than one point.

A subpath of a path $\pi$ is a path that can be considered a subgraph of $\pi$ with the ordering associated with $\pi$. A cycle in a graph $G$ is a simple subgraph whose nodes can be placed around a circle so that two nodes are adjacent if they appear consecutively along the circle.

An independence model $\mathcal{J}$ over a set $V$ is a set of triples $\langle X, Y \mid Z \rangle$ (called independence statements), where $X$, $Y$, and $Z$ are disjoint subsets of $V$ and $Z$ can be empty, with $\langle \emptyset, Y \mid Z \rangle$ and $\langle X, \emptyset \mid Z \rangle$ always being included in $\mathcal{J}$. The independence statement $\langle X, Y \mid Z \rangle$ is interpreted as "$X$ is independent of $Y$ given $Z$".

An independence model $\mathcal{J}$ over a set $V$ is a semi-graphoid if for disjoint subsets $A$, $B$, $C$, and $D$ of $V$, it satisfies the four following properties:

1. $\langle A, B \mid C \rangle \in \mathcal{J}$ if and only if $\langle B, A \mid C \rangle \in \mathcal{J}$ (symmetry);
2. if $\langle A, B \cup D \mid C \rangle \in \mathcal{J}$ then $\langle A, B \mid C \rangle \in \mathcal{J}$ and $\langle A, D \mid C \rangle \in \mathcal{J}$ (decomposition);
3. if $\langle A, B \cup D \mid C \rangle \in \mathcal{J}$ then $\langle A, B \mid C \cup D \rangle \in \mathcal{J}$ and $\langle A, D \mid C \cup B \rangle \in \mathcal{J}$ (weak union);
4. $\langle A, B \mid C \cup D \rangle \in \mathcal{J}$ and $\langle A, D \mid C \rangle \in \mathcal{J}$ if and only if $\langle A, B \cup D \mid C \rangle \in \mathcal{J}$ (contraction).

A semi-graphoid for which the reverse implication of the weak union property holds is said to be a graphoid, that is

5. if $\langle A, B \mid C \cup D \rangle \in \mathcal{J}$ and $\langle A, D \mid C \cup B \rangle \in \mathcal{J}$ then $\langle A, B \cup D \mid C \rangle \in \mathcal{J}$ (intersection).

Furthermore, a graphoid or semi-graphoid for which the reverse implication of the decomposition property holds is said to be compositional, that is

6. if $\langle A, B \mid C \rangle \in \mathcal{J}$ and $\langle A, D \mid C \rangle \in \mathcal{J}$ then $\langle A, B \cup D \mid C \rangle \in \mathcal{J}$ (composition).

Notice that simple separation in an undirected graph will trivially satisfy all of these properties, and hence compositional graphoids are direct generalisations of independence models given by separation in undirected graphs.

The most common independence models are induced by probability distributions. Consider a set $V$ and a collection of random variables $(X_\alpha)_{\alpha \in V}$ with state spaces $\mathcal{X}_\alpha$, $\alpha \in V$, and joint distribution $P$. We let $X_A = (X_v)_{v \in A}$ etc. for each subset $A$ of $V$. For disjoint subsets $A$, $B$, and $C$ of $V$ we use the short notation $A \perp\!\!\!\perp B \mid C$ to denote that $X_A$ is conditionally independent of $X_B$ given $X_C$ [7, 19], that is, that for any measurable $\Omega \subseteq \mathcal{X}_A$ and $P$-almost all $x_B$ and $x_C$,

$$P(X_A \in \Omega \mid X_B = x_B, X_C = x_C) = P(X_A \in \Omega \mid X_C = x_C).$$

We then define an independence model $\mathcal{J}(P)$ by letting

$$\langle A, B \mid C \rangle \in \mathcal{J}(P) \text{ if and only if } A \perp\!\!\!\perp B \mid C \text{ w.r.t. } P.$$

We say that an independence model $\mathcal{J}$ is probabilistic if there is a distribution $P$ such that $\mathcal{J} = \mathcal{J}(P)$. We then also say that $P$ is faithful to $\mathcal{J}$.

Probabilistic independence models are always semi-graphoids [21], whereas the converse is not necessarily true; see [29]. If $P$ has strictly positive density, the induced independence model is also a graphoid; see, for example, Proposition 3.1 in [19]. If the distribution $P$ is a regular multivariate Gaussian distribution, $\mathcal{J}(P)$ is a compositional graphoid. This follows from the fact that for such a distribution

$$A \perp\!\!\!\perp B \mid C \iff k^{\alpha\beta}_{A \cup B \cup C} = 0 \text{ for all } \alpha \in A, \beta \in B,$$

where $k^{\alpha\beta}_{A \cup B \cup C}$ is the $\alpha\beta$ entry in the concentration matrix of the distribution of $X_{A \cup B \cup C}$, and hence setwise conditional independence is directly determined by nodewise conditional independence.

Probabilistic independence models with positive densities are not in general compositional graphoids; this only holds for special types of multivariate distributions such as the Gaussian mentioned above and, say, the symmetric binary distributions used in [32].
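The remark that simple separation in an undirected graph satisfies all six properties can be checked by brute force on a small graph. The following is a minimal sketch under stated assumptions: the adjacency-dictionary encoding, the function names `subsets`, `separated` and `check_axioms`, and the four-node path graph are all illustrative choices that do not come from the paper.

```python
from itertools import combinations

def subsets(items):
    """All subsets of `items`, returned as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def separated(adj, A, B, C):
    """True if no path from A to B avoids C in the undirected graph `adj`."""
    seen, stack = set(A), list(A)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in C and w not in seen:
                seen.add(w)
                stack.append(w)
    return not (seen & set(B))

def check_axioms(adj):
    """Brute-force check of the six properties over all disjoint A, B, C, D."""
    V = frozenset(adj)
    J = lambda A, B, C: separated(adj, A, B, C)
    ok = {name: True for name in ("symmetry", "decomposition", "weak union",
                                  "contraction", "intersection", "composition")}
    for A in subsets(V):
        for B in subsets(V - A):
            for D in subsets(V - A - B):
                for C in subsets(V - A - B - D):
                    joint = J(A, B | D, C)
                    ok["symmetry"] &= J(A, B, C) == J(B, A, C)
                    ok["decomposition"] &= (not joint) or (J(A, B, C) and J(A, D, C))
                    ok["weak union"] &= (not joint) or (J(A, B, C | D) and J(A, D, C | B))
                    ok["contraction"] &= (J(A, B, C | D) and J(A, D, C)) == joint
                    ok["intersection"] &= (not (J(A, B, C | D) and J(A, D, C | B))) or joint
                    ok["composition"] &= (not (J(A, B, C) and J(A, D, C))) or joint
    return ok

# Hypothetical example graph: the undirected path a - b - c - d.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(check_axioms(path_graph))
```

The check enumerates every assignment of the nodes to $A$, $B$, $C$, $D$ or to none of them, so it is only practical for a handful of nodes; it illustrates the claim on one graph and is not a proof.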

3. Independence models for mixed graphs

A mixed graph is a graph containing three types of edges denoted by arrows, arcs (bi-directed edges), and lines (full lines). Notice that we allow multiple edges of the same type. A loopless mixed graph (LMG) is a mixed graph that does not contain any loops (a loop may be a line, arrow, or arc). For an arrow $j \to i$, we say that the arrow is from $j$ to $i$. We also call $j$ a parent of $i$ and $i$ a child of $j$, and we use the notation $\mathrm{pa}(i)$ for the set of all parents of $i$ in the graph. In the cases of $i \to j$ or $i \leftrightarrow j$, we say that there is an arrowhead at $j$ or pointing to $j$.

A path $\langle i = i_0, i_1, \dots, i_n = j \rangle$ is direction-preserving from $i$ to $j$ if all $i_k i_{k+1}$ edges are arrows pointing from $i_k$ to $i_{k+1}$. If there is a direction-preserving path from $j$ to $i$ then $j$ is an ancestor of $i$ and $i$ is a descendant of $j$. We denote the set of ancestors of $i$ by $\mathrm{an}(i)$. Notice that we do not include $i$ in its set of ancestors or descendants.

A tripath is a path with three nodes. Note that [26] used the term V-configuration for such a path. However, here we follow [16] and most texts by letting a V-configuration be a tripath with non-adjacent endpoints.

In a mixed graph the inner node of the three tripaths $i \to t \leftarrow j$, $i \leftrightarrow t \leftarrow j$, and $i \leftrightarrow t \leftrightarrow j$ is a collider (or a collider node) and the inner node of any other tripath is a non-collider (or a non-collider node) on the tripath or more generally on any path of which the tripath is a subpath. We shall also say that the tripath itself with inner collider or non-collider node is a collider or non-collider. We may speak of a collider or non-collider without mentioning the relevant tripath or path when this is apparent from the context. Notice that a node may be a collider on one tripath and a non-collider on another.

Two paths $\pi_1$ and $\pi_2$ (including tripaths or edges) between $i$ and $j$ are called endpoint-identical if there is an arrowhead pointing to $i$ in $\pi_1$ if and only if there is an arrowhead pointing to $i$ in $\pi_2$, and similarly for $j$. For example, the paths $i \to j$, $i - k \leftrightarrow j$, and $i \to k \leftarrow l \leftrightarrow j$ are all endpoint-identical as they have an arrowhead pointing to $j$ but no arrowhead pointing to $i$ on the paths.

The anterior graph of a loopless mixed graph $G$, denoted by $G^*$, is the graph obtained from $G$ by recursively removing arrowheads pointing to nodes that are the endpoints of a line, that is, by obtaining $- \circ$ and $\leftarrow \circ$ from $\to \circ$ and $\leftrightarrow \circ$ respectively. Hence, it holds that $G = G^*$ if and only if there are no arrowheads pointing to lines in $G$. Notice also that since removing an arrowhead pointing to a line does not affect other arrowheads pointing to lines, it does not matter which arrowhead is removed first; therefore, the order of removing arrowheads pointing to lines does not affect the final graph obtained.

A path $\langle i = i_0, i_1, \dots, i_n = j \rangle$ from $i$ to $j$ ($i \neq j$) in $G^*$ is an anterior path if it has the form $i - i_1 - \cdots - i_m \to i_{m+1} \to \cdots \to j$. Notice that this path may only contain lines or arrows. We shall say that $i$ is anterior of $j$ in $G$ if there is an anterior path from $i$ to $j$ in $G^*$. Notice that although the anterior path is defined in $G^*$ we may from time to time refer to an anterior path in $G$ as the path corresponding to the anterior path in $G^*$.

We use the notation $\mathrm{ant}(i)$ for the set of all anteriors of $i$. Notice that, since ancestral graphs have no arrowheads pointing to lines, we have $G = G^*$ for an ancestral graph. Thus, our definition of anterior extends the notion of anterior used in [24] for ancestral graphs with the minor difference that we do not include a node in its anterior set. However, it is different from and inconsistent with the definition of anteriors in [10] and [1].

For example, in the graph $G$ in Figure 1(a), $\mathrm{ant}(i) = \{l, h, j, p\}$ and $\mathrm{ant}(p) = \{l, h, j\}$. This can be seen by looking at the anterior paths $\langle p, j, h, l, i \rangle$ from $p$ to $i$ and $\langle l, h, j, p \rangle$ from $l$ to $p$ (as well as from $p$ to $l$) in Figure 1(b).

Figure 1. (a) A mixed graph $G$. (b) The anterior graph $G^*$ of $G$.

We first show that transitivity holds for anteriors.

Lemma 1. For any loopless mixed graph it holds that if $i \in \mathrm{ant}(j)$ and $j \in \mathrm{ant}(k)$ then $i \in \mathrm{ant}(k)$.

Proof. If $i \in \mathrm{ant}(j)$ and $j \in \mathrm{ant}(k)$, $G^*$ has anterior paths $\pi_1$ from $i$ to $j$ and $\pi_2$ from $j$ to $k$. As no arrowhead meets a line in $G^*$, their combination $\pi_1 \circ \pi_2$ is an anterior path from $i$ to $k$ in $G^*$. □

Here we also introduce a lemma that is used in several proofs of this paper.

Lemma 2. Let $G$ be a loopless mixed graph. If $i \in \mathrm{ant}(j) \setminus \mathrm{an}(j)$, then either $i$ or a descendant of $i$ is the endpoint of a line in $G$.

Proof. The proof uses induction on the number of arrowheads removed from $G$ to obtain $G^*$. For the base, if $G = G^*$ it follows immediately from the definition of an anterior path that $i$ must be the endpoint of a line or we would have $i \in \mathrm{an}(j)$.

Next, suppose that $G^*$ is obtained from $G$ by removing $n + 1$ arrowheads and let $\tilde{G}$ be obtained from $G$ by removing a single arrowhead pointing to a line from $G$. Then $G^*$ is also the anterior graph of $\tilde{G}$, but with only $n$ arrowheads needing removal. Thus, if $i \in \mathrm{ant}(j)$ in $G$, it is also anterior to $j$ in $\tilde{G}$. Consider now two cases:

Case I. Assume $i$ is an ancestor of $j$ in $\tilde{G}$. Since $i$ is not an ancestor of $j$ in $G$, $\tilde{G}$ must have been obtained by turning an arc into an arrow. Say this arrowhead points to $h$. Then $h$ is an endpoint of a line and it is a descendant of $i$ in $G$.

Case II. If $i$ is not an ancestor of $j$ in $\tilde{G}$, the inductive hypothesis yields that $i$ is either adjacent to a line $ih$ in $\tilde{G}$ or has a descendant $h$ in $\tilde{G}$ which is the endpoint of a line in $\tilde{G}$. Let $h$ be the node adjacent to a line in $\tilde{G}$. If the arrowhead removed is not on the direction-preserving path $\pi$ from $i$ to $h$ the conclusion obviously follows. Else, there must be a node $k$ on $\pi$ which is adjacent to a line in $G$ and can be used instead of $h$. □

The $m$-separation criterion

Here we define a separation criterion for LMGs. We use this criterion to induce independencies on LMGs and their subclasses defined in Section 4.

We first define an $m$-connecting path: Let $C$ be a subset of the node set of an LMG. A path is $m$-connecting given $C$ if all its collider nodes are in $C \cup \mathrm{an}(C)$ and all its non-collider nodes are outside $C$. For two disjoint subsets of the node set $A$ and $B$, we say that $C$ $m$-separates $A$ and $B$ if there is no $m$-connecting path between $A$ and $B$ given $C$. In this case, we use the notation $A \perp_m B \mid C$. Notice that the $m$-separation criterion induces an independence model $\mathcal{J}_m(G)$ on $G$ by $A \perp_m B \mid C \iff \langle A, B \mid C \rangle \in \mathcal{J}_m(G)$.

We note that $m$-separation is unaffected if we replace multiple edges of the same type with a single edge of that type. The $m$-separation criterion for LMGs is the same as the separation criterion defined in [24]. It is an extension of the $d$-separation criterion introduced in [21]. Clearly, $m$-separation is also an extension of simple separation in an undirected graph, as then all edges are lines.

For example, in graph $G$ in Figure 2 it holds that $h \in \mathrm{an}(l)$ and, thus, $\langle i, h, j \rangle$ is an $m$-connecting path given $l$. Therefore, $\langle i, j \mid l \rangle \notin \mathcal{J}_m(G)$. We now have the following theorem. A similar result for the induced independence model for MC graphs was given in Proposition 2.10 of [18].

Figure 2. A loopless mixed graph $G$ for which $\langle i, j \mid l \rangle \notin \mathcal{J}_m(G)$.

Theorem 1. For any loopless mixed graph $G$, the independence model $\mathcal{J}_m(G)$ is a compositional graphoid.

Proof. For $G = (N, F)$ and disjoint subsets $A$, $B$, $C$, and $D$ of $N$, we prove that $\perp_m$ satisfies the six compositional graphoid axioms:

(1) Symmetry: If $A \perp_m B \mid C$, then $B \perp_m A \mid C$: If there is no $m$-connecting path between $A$ and $B$ given $C$, then there is no $m$-connecting path between $B$ and $A$ given $C$.

(2) Decomposition: If $A \perp_m (B \cup D) \mid C$, then $A \perp_m D \mid C$: If there is no $m$-connecting path between $A$ and $B \cup D$ given $C$, then there is no $m$-connecting path between $A$ and $D \subseteq (B \cup D)$ given $C$.

(3) Weak union: If $A \perp_m (B \cup D) \mid C$ then $A \perp_m B \mid (C \cup D)$: From (2) we know that $A \perp_m D \mid C$ and $A \perp_m B \mid C$. Suppose, for contradiction, that there exist $m$-connecting paths between $A$ and $B$ given $C \cup D$. Consider a shortest path of this type and call it $\pi$. If there is no inner collider node on $\pi$, then there is an $m$-connecting path between $A$ and $B$ given $C$, a contradiction. On $\pi$ all collider nodes are in $(C \cup D) \cup \mathrm{an}(C \cup D)$. If all collider nodes are in $C \cup \mathrm{an}(C)$, then there is an $m$-connecting path between $A$ and $B$ given $C$, again a contradiction. Hence, consider the closest collider node $i \in (D \cup \mathrm{an}(D)) \setminus (C \cup \mathrm{an}(C))$ to $A$ on $\pi$. Now since the nodes between $A$ and $i$ are not in $B \cup D$, there is an $m$-connecting path between $A$ and $i$ given $C$. If $i \in D$, then this is obviously a contradiction. Otherwise there is a node $k \in D$ for which $i \in \mathrm{an}(k)$, and thus an $m$-connecting path between $A$ and $k$ given $C$, a contradiction again. Therefore, there is no $m$-connecting path between $A$ and $B$ given $C \cup D$.

(4) Contraction: If $A \perp_m B \mid C$ and $A \perp_m D \mid (B \cup C)$, then $A \perp_m (B \cup D) \mid C$: Suppose, for contradiction, that there exists an $m$-connecting path between $A$ and $B \cup D$ given $C$. Consider a shortest path of this type and call it $\pi$. The path $\pi$ is either between $A$ and $B$ or between $A$ and $D$. The path $\pi$ being between $A$ and $B$ contradicts $A \perp_m B \mid C$. Therefore, $\pi$ is between $A$ and $D$. In addition, since all inner collider nodes on $\pi$ are in $C \cup \mathrm{an}(C)$ and because $A \perp_m D \mid (B \cup C)$, an inner non-collider node should be in $B$. This contradicts the fact that $\pi$ is a shortest $m$-connecting path between $A$ and $B \cup D$ given $C$.

(5) Intersection: If $A \perp_m B \mid (C \cup D)$ and $A \perp_m D \mid (C \cup B)$, then $A \perp_m (B \cup D) \mid C$: Suppose, for contradiction, that there exists an $m$-connecting path between $A$ and $B \cup D$ given $C$. Consider a shortest path of this type and call it $\pi$. The path $\pi$ is either between $A$ and $B$ or between $A$ and $D$. Because of symmetry between $B$ and $D$ in the formulation it is enough to suppose that $\pi$ is between $A$ and $B$. Since all inner collider nodes on $\pi$ are in $C \cup \mathrm{an}(C)$ and because $A \perp_m B \mid (C \cup D)$, an inner non-collider node should be in $D$. This contradicts the fact that $\pi$ is a shortest $m$-connecting path between $A$ and $B \cup D$ given $C$.

(6) Composition: If $A \perp_m B \mid C$ and $A \perp_m D \mid C$, then $A \perp_m (B \cup D) \mid C$: Suppose, for contradiction, that there exist $m$-connecting paths between $A$ and $B \cup D$ given $C$. Consider a path of this type and call it $\pi$. Path $\pi$ is either between $A$ and $B$ or between $A$ and $D$. Because of symmetry between $B$ and $D$ in the formula it is enough to suppose that $\pi$ is between $A$ and $B$. But this contradicts $A \perp_m B \mid C$. □

Theorem 1 implies that we can focus on establishing conditional independence for pairs of nodes, formulated in the corollary below.

Corollary 1. For a loopless mixed graph $G$ and disjoint subsets of its node set $A$, $B$, and $C$, it holds that $A \perp_m B \mid C$ if and only if $i \perp_m j \mid C$ for all nodes $i \in A$ and $j \in B$.

Proof. The result follows from the fact that $\perp_m$ satisfies the decomposition and the composition properties. □
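The $m$-separation criterion of this section can be tested on small graphs by enumerating paths directly. The sketch below is a plain brute-force search, not an efficient algorithm. It assumes a hypothetical encoding in which an edge is a tuple `(u, v, kind)` with `kind` in `"line"`, `"arrow"` (read as $u \to v$) or `"arc"`; the function names and the example graph (a collider $i \to h \leftarrow j$ with $h \to l$, chosen to mimic the situation described for Figure 2, whose exact edges are not recoverable here) are illustrative assumptions.

```python
def has_head(edge, x):
    """True if `edge` has an arrowhead at its endpoint `x`."""
    u, v, kind = edge
    return kind == "arc" or (kind == "arrow" and v == x)

def ancestors(edges, targets):
    """an(targets): nodes with a direction-preserving path to some target."""
    found, stack = set(), list(targets)
    while stack:
        x = stack.pop()
        for u, v, kind in edges:
            if kind == "arrow" and v == x and u not in found:
                found.add(u)
                stack.append(u)
    return found

def m_connected(edges, a, b, C):
    """True if some path between a and b is m-connecting given C."""
    C = set(C)
    collider_ok = C | ancestors(edges, C)

    def extend(node, last_edge, visited):
        if node == b:
            return True
        for edge in edges:
            if node not in edge[:2]:
                continue
            nxt = edge[1] if edge[0] == node else edge[0]
            if nxt in visited:
                continue
            if last_edge is not None:  # `node` is an inner node of the path
                collider = has_head(last_edge, node) and has_head(edge, node)
                if collider and node not in collider_ok:
                    continue
                if not collider and node in C:
                    continue
            if extend(nxt, edge, visited | {nxt}):
                return True
        return False

    return extend(a, None, {a})

def m_separated(edges, A, B, C):
    """A is m-separated from B given C if no pair (a, b) is m-connected."""
    return not any(m_connected(edges, a, b, C) for a in A for b in B)

# Hypothetical example: i -> h <- j and h -> l.
G = [("i", "h", "arrow"), ("j", "h", "arrow"), ("h", "l", "arrow")]
print(m_separated(G, {"i"}, {"j"}, set()))   # collider h is blocked
print(m_separated(G, {"i"}, {"j"}, {"l"}))   # h is an ancestor of l
```

Because `m_separated` loops over pairs of nodes, the setwise statement is evaluated pair by pair, in the spirit of Corollary 1. Multiple edges between the same two nodes are handled by iterating over edges instead of neighbours.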

4. Subclasses of loopless mixed graphs

LMGs and their associated independence models induced by $m$-separation unify a variety of previously discussed graphical independence models.

Important exceptions include certain independence models for chain graphs. Chain graphs themselves are LMGs, but at least four different Markov properties for chain graphs have been discussed in the literature. Drton [8] has classified them into (i) the LWF or block concentration Markov property, (ii) the AMP or concentration regression Markov property, (iii) a Markov property that is dual to the AMP Markov property, and (iv) the multivariate regression Markov property. When the chain components consist entirely of arcs, the multivariate regression property is identical to the one induced by $m$-separation. However, the independence model induced by $m$-separation in a chain graph is typically different from any of the other chain graph interpretations; see also [22, 25] and [20].

The class of MC graphs, defined in [18], contains line loops and uses a different separation criterion for inducing an independence model. However, a small modification of any MC graph that is derived from a DAG after marginalisation and conditioning yields a so-called ribbonless graph, which is loopless and induces the same independence model as the MC graph, but by $m$-separation [27]. Any ribbonless graph can be generated from a DAG by marginalisation and conditioning, and ribbonless graphs are stable under these operations [26]. The remaining part of this paper deals with such graphs. We first give a formal definition of a ribbon.

A ribbon is a collider tripath $\langle h, i, j \rangle$ such that both of the following two conditions hold:

1. there is no endpoint-identical edge between $h$ and $j$, that is, there is no $hj$-arc in the case of $h \leftrightarrow i \leftrightarrow j$; there is no $hj$-line in the case of $h \to i \leftarrow j$; and there is no arrow from $h$ to $j$ in the case of $h \to i \leftrightarrow j$;
2. $i$ or a descendant of $i$ is the endpoint of a line or is on a direction-preserving cycle.

If $i$ or a descendant of $i$ is the endpoint of a line, then we say the ribbon is straight, and if they are on a direction-preserving cycle we say the ribbon is cyclic. A ribbonless graph (RG) is an LMG that has no ribbons as induced subgraphs. Figure 3 illustrates a straight ribbon $\langle h, i, j \rangle$ and the simplest cyclic ribbon.

Figure 3. (a) A straight ribbon $\langle h, i, j \rangle$ with $\mathrm{ne}(i) \neq \emptyset$. (b) The simplest cyclic ribbon $\langle h, i, j \rangle$.

Figure 4(a) illustrates a graph containing a straight ribbon $\langle h, i, j \rangle$ and Figure 4(b) illustrates a ribbonless graph. Notice that $\langle h, i, j \rangle$ is not a ribbon here since there is a line between $h$ and $j$ and this is an endpoint-identical edge. We proceed to establish that ribbonless graphs yield identical independence models to their anterior graphs and need the following lemma.

Figure 4. (a) A graph that is not ribbonless. (b) A ribbonless graph.

Lemma 3. Let $G$ be a ribbonless graph. If there is a collider tripath $\langle i, j, k \rangle$ in $G$ that is non-collider in $G^*$, then $G$ has an $ik$-edge that is endpoint-identical to $\langle i, j, k \rangle$.

Proof. Suppose that $\langle G = G_0, G_1, \dots, G_n = G^* \rangle$ is a sequence of graphs, where each graph has been generated by removing one arrowhead pointing to a full line from the previous graph starting from $G$. Consider the graph $G_{p+1}$ where $\langle i, j, k \rangle$ turns into a non-collider tripath. We prove by reverse induction that, for each $0 \le q \le p$, $\langle i, j, k \rangle$ is a straight ribbon unless there is an endpoint-identical $ik$-edge to $\langle i, j, k \rangle$.

In $G_p$, the node $j$ is obviously the endpoint of a line and the result holds. Thus, we assume that the result holds for $G_q$. In $G_{q-1}$, it is easy to observe that if the line that makes the ribbon is an arrow pointing to another line or if an arrow on the direction-preserving cycle pointing to a line is an arc then $j$ or a descendant of $j$ is still the endpoint of a line. Therefore, the result holds in $G_{q-1}$. Therefore, by reverse induction, this result holds in $G_0$, and since $G_0 = G$ is ribbonless, in $G$ there is an endpoint-identical $ik$-edge to $\langle i, j, k \rangle$. □

For the graph $G$ in Figure 3(a), the anterior graph $G^*$ is the graph where all edges become undirected. Clearly there is no endpoint-identical edge $hj$ and the conclusion of Lemma 3 does not hold. This illustrates the role of a graph being ribbonless.

Proposition 1. For a ribbonless graph $G$, it holds that $\mathcal{J}_m(G) = \mathcal{J}_m(G^*)$, that is, $G$ and $G^*$ are Markov equivalent.

Proof. It is enough to prove that there is an $m$-connecting path between $i$ and $j$ given $C$ in $G$ if and only if there is an $m$-connecting path between $i$ and $j$ given $C$ in $G^*$.

Suppose that there is an $m$-connecting path between $i$ and $j$ given $C$ in $G$. All non-colliders on the path in $G$ are preserved in $G^*$. In addition, by Lemma 3, a collider tripath $\langle i, j, k \rangle$ becomes non-collider if there is an endpoint-identical $ik$-edge to $\langle i, j, k \rangle$. In this case, the $ik$-edge can be used instead of $\langle i, j, k \rangle$ to establish an $m$-connecting path in $G^*$.

Conversely, suppose that there is an $m$-connecting path between $i$ and $j$ given $C$ in $G^*$. Collider tripaths are collider tripaths in $G$, and if a non-collider tripath $\langle i, j, k \rangle$ has been collider in $G$ then, by Lemma 3, one can again use the $ik$-edge instead of $\langle i, j, k \rangle$. Thus the only thing that remains to be proven is that a direction-preserving path pointing to a member of $C$ in $G$ remains direction-preserving in $G^*$.

In this case, by the same argument as in Lemma 3, if for the collider tripath $\langle i, j, k \rangle$, where $j \in \mathrm{an}(C)$, the arrowhead of an arrow on the direction-preserving path in $G$ is taken away then $\langle i, j, k \rangle$ is a ribbon unless there is an endpoint-identical $ik$-edge to $\langle i, j, k \rangle$. Hence, we can use the $ik$-edge instead of $\langle i, j, k \rangle$ to establish an $m$-connecting path. □

Thus, the absence of ribbons ensures that the Markov property is unchanged by forming the anterior graph $G^*$. Again, as the anterior graph $G^*$ of the graph $G$ in Figure 3(a) is the graph with all edges becoming undirected, we have $h \perp_m j$ in $G$ but not $h \perp_m j$ in $G^*$, illustrating that absence of ribbons is essential for the Markov equivalence of $G$ and $G^*$.

Independence models induced by $m$-separation in a ribbonless graph can be induced by marginalisation over and conditioning on a DAG-independence model [26]. This implies that independence models corresponding to RGs are probabilistic, that is, any RG has a faithful probability distribution.
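The definition of a ribbon and the anterior-graph construction from Section 3 can both be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' procedure: edges are encoded as hypothetical tuples `(u, v, kind)` with `kind` in `"line"`, `"arrow"` ($u \to v$) or `"arc"`; all function names are invented for the example; the search scans the whole graph for ribbons; and a node is treated as lying on a direction-preserving cycle when it is its own descendant. The example graph $h \to i \leftarrow j$ with a line $i - k$ is an assumed stand-in for a straight ribbon of the kind described for Figure 3(a).

```python
from itertools import combinations

def has_head(edge, x):
    """True if `edge` has an arrowhead at its endpoint `x`."""
    u, v, kind = edge
    return kind == "arc" or (kind == "arrow" and v == x)

def other(edge, x):
    """The endpoint of `edge` that is not `x`."""
    return edge[1] if edge[0] == x else edge[0]

def line_endpoints(edges):
    return {x for u, v, kind in edges if kind == "line" for x in (u, v)}

def descendants(edges, x):
    """Nodes reachable from x along arrows (x itself only if on a cycle)."""
    found, stack = set(), [x]
    while stack:
        y = stack.pop()
        for u, v, kind in edges:
            if kind == "arrow" and u == y and v not in found:
                found.add(v)
                stack.append(v)
    return found

def anterior_graph(edges):
    """Recursively remove arrowheads pointing to endpoints of lines."""
    edges, changed = list(edges), True
    while changed:
        changed = False
        ends = line_endpoints(edges)
        for n, (u, v, kind) in enumerate(edges):
            if kind == "arrow" and v in ends:
                edges[n], changed = (u, v, "line"), True
            elif kind == "arc" and v in ends:
                edges[n], changed = (v, u, "arrow"), True  # head at v removed
            elif kind == "arc" and u in ends:
                edges[n], changed = (u, v, "arrow"), True  # head at u removed
    return edges

def ribbons(edges):
    """Collider tripaths (h, i, j) satisfying both ribbon conditions."""
    nodes = {x for e in edges for x in e[:2]}
    ends, found = line_endpoints(edges), set()
    for i in nodes:
        family = {i} | descendants(edges, i)
        if not any(x in ends or x in descendants(edges, x) for x in family):
            continue  # condition 2 fails for every tripath through i
        into_i = [e for e in edges if i in e[:2] and has_head(e, i)]
        for e1, e2 in combinations(into_i, 2):
            h, j = other(e1, i), other(e2, i)
            if h == j:
                continue
            identical = any(set(e[:2]) == {h, j}
                            and has_head(e, h) == has_head(e1, h)
                            and has_head(e, j) == has_head(e2, j)
                            for e in edges)
            if not identical:  # condition 1: no endpoint-identical hj-edge
                found.add((min(h, j), i, max(h, j)))
    return found

# Assumed straight ribbon: h -> i <- j together with the line i - k.
fig3a = [("h", "i", "arrow"), ("j", "i", "arrow"), ("i", "k", "line")]
print(ribbons(fig3a))
print(anterior_graph(fig3a))
```

On this example every edge of the anterior graph is a line, matching the remark that all edges become undirected, and adding a line between $h$ and $j$ supplies the endpoint-identical edge that removes the ribbon.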

Figure 5. The hierarchy of subclasses of LMGs.

Other subclasses of LMGs that use $m$-separation and have been discussed in the literature are summary graphs [30], ancestral graphs [24], acyclic directed mixed graphs [23, 28], undirected or concentration graphs [6, 19], bidirected or covariance graphs [5, 9, 15, 31], and the class of directed acyclic graphs [11, 16, 21]. In papers on summary graphs and regression chain graphs, dashed undirected edges (without arrowheads) have often been used in place of bi-directed edges. Using the latter as we have done here makes the idea of a collider more immediate, so $m$-separation can be used directly and the relation between the various types of graphs becomes transparent.

The use of some of the above graphs is motivated by representing independence models obtained by marginalisation over and conditioning on subsets of the node set of a DAG. For those graphs, arcs indicate marginalisation and lines indicate conditioning.

The diagram in Figure 5 illustrates the hierarchy of subclasses of LMGs and their associated independence models generated by $m$-separation. For example, it can be seen from the diagram that bidirected graphs are also ancestral graphs, since they form a subclass of multivariate regression chain graphs, which again form a subclass of ancestral graphs. Notice that the associated classes of independence models are all distinct except for ancestral, summary, and ribbonless graphs, which are alternative representations of the same class of independence models.

5. Maximal ribbonless graphs

Among the independence models over the node set V of a graph G , those that are ofinterest to us conform with G , meaning that i ∼ j in G implies h i, j | C i / ∈ J for any C ⊆ V \ { i, j } . Henceforth, we assume that independence models J conform with G ,unless otherwise stated.For example, the independence model J = {h i, l | j i , h i, k | ∅ i} conforms with the graph G in Figure 6, whereas J = {h i, l | j i , h i, j | ∅ i} does not conform with G because of theindependence statement h i, j | ∅ i .A ribbonless graph G is called maximal if by adding any edge to G , the independencemodel induced by m -separation changes. Note that in [30] a graph that is maximal iscalled an independence graph .The independence models on RGs induced by m -separation conform with the graphs;hence for maximal graphs, adding an edge to the graph makes the independence modelsmaller. Therefore, we have the lemma below. Lemma 4.

A graph $G = (V, E)$ is maximal if and only if for every pair of non-adjacent nodes $i$ and $j$ of $V$, there exists a subset $C$ of $V \setminus \{i, j\}$ such that $i \perp_m j \mid C$.

Proof.

The result follows directly from the definition of maximality. $\square$

RGs are not maximal in general. To see this, consider the RG in Figure 7. There is no $C$ such that $i \perp_m j \mid C$. This is because if $k \in C$, the path $i \rightarrow k \leftrightarrow j$ is $m$-connecting given $C$, and if $k \notin C$, the path $i \rightarrow k \rightarrow j$ is $m$-connecting given $C$.

To characterise maximal RGs, we need the following notion: A path $\langle j, q_1, q_2, \dots, q_p, i \rangle$ is a primitive inducing path between $i$ and $j$ if and only if for every $n$, $1 \le n \le p$,
(i) $q_n$ is a collider on the path; and
(ii) $q_n \in \mathrm{an}(\{i\} \cup \{j\})$.

Figure 6.

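The independence model $\mathcal{J} = \{\langle i, l \mid j \rangle, \langle i, k \mid \emptyset \rangle\}$ conforms with $G$ whereas $\mathcal{J} = \{\langle i, l \mid j \rangle, \langle i, j \mid \emptyset \rangle\}$ does not.

Figure 7.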

A non-maximal RG.
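The criterion of Lemma 4 can be checked by brute force on small graphs. The following is a minimal sketch, not the authors' procedure. It assumes the usual reading of $m$-connection that the proofs in this paper rely on: a path is $m$-connecting given $C$ if every collider on it lies in $C \cup \mathrm{an}(C)$ and every inner non-collider lies outside $C$. The edge encoding, the function names, and the edge list `fig7` (read off from the two paths named in the text for Figure 7) are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical encoding: an edge is (kind, u, v) with
# "arrow" = u -> v, "arc" = u <-> v, "line" = u - v.

def has_head(edge, node):
    """Return True if `edge` has an arrowhead at `node`."""
    kind, u, v = edge
    if kind == "arc":
        return True
    return kind == "arrow" and node == v

def ancestral_closure(edges, C):
    """Return C together with an(C), found by following arrows backwards."""
    closure = set(C)
    changed = True
    while changed:
        changed = False
        for kind, u, v in edges:
            if kind == "arrow" and v in closure and u not in closure:
                closure.add(u)
                changed = True
    return closure

def m_connected(edges, a, b, C):
    """True if some path between a and b is m-connecting given C."""
    C = set(C)
    an_C = ancestral_closure(edges, C)

    def extend(node, last_edge, visited):
        for edge in edges:
            _, u, v = edge
            if node not in (u, v):
                continue
            nxt = v if node == u else u
            if nxt in visited:
                continue
            if last_edge is not None:
                collider = has_head(last_edge, node) and has_head(edge, node)
                # Colliders must lie in C or an(C); non-colliders outside C.
                if collider and node not in an_C:
                    continue
                if not collider and node in C:
                    continue
            if nxt == b or extend(nxt, edge, visited | {nxt}):
                return True
        return False

    return extend(a, None, {a})

def separating_set(nodes, edges, i, j):
    """Return some C with i m-separated from j given C, or None."""
    others = [n for n in nodes if n not in (i, j)]
    for size in range(len(others) + 1):
        for C in combinations(others, size):
            if not m_connected(edges, i, j, C):
                return set(C)
    return None

def is_maximal(nodes, edges):
    """Lemma 4: every non-adjacent pair must have a separating set."""
    adjacent = {frozenset((u, v)) for _, u, v in edges}
    return all(
        separating_set(nodes, edges, i, j) is not None
        for i, j in combinations(nodes, 2)
        if frozenset((i, j)) not in adjacent
    )

nodes = ["i", "k", "j"]
fig7 = [("arrow", "i", "k"), ("arc", "k", "j"), ("arrow", "k", "j")]
dag = [("arrow", "i", "k"), ("arrow", "k", "j")]
print(is_maximal(nodes, fig7))  # False: no C separates i and j
print(is_maximal(nodes, dag))   # True: C = {k} separates i and j
```

The search is exponential in the number of nodes, so it is only meant for checking small examples by hand.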

This definition is a trivial extension of a primitive inducing path as defined for ancestral graphs in [24]. Note in particular that we consider any edge between $i$ and $j$ to be a primitive inducing path. In Figure 7, $\langle i, k, j \rangle$ is a primitive inducing path.

Next, we need the following lemmas. These also establish a pairwise Markov property for maximal RGs.

Lemma 5.

A non-collider node $k$ on a path $\pi$ between $i$ and $j$ in a ribbonless graph $G$ is either in $\mathrm{ant}(i) \cup \mathrm{ant}(j)$ or an anterior of a collider node $h$ on $\pi$. Moreover, the relevant subpath of $\pi$ between $k$ and $i$, $j$ or $h$ is an anterior path in $G^*$.

Proof.

Let $k = i_m$ be a non-collider node on a path $\pi = \langle i = i_0, i_1, \dots, i_n = j \rangle$ in $G$. Then from at least one side (say from $i_{m-1}$) there is no arrowhead on $\pi$ pointing to $k$. By moving towards $i$ on the path as long as $i_p$, $1 \le p \le m-1$, is a non-collider on the path, we obtain that $k \in \mathrm{ant}(i_{p-1})$. This implies that if no $i_p$ is a collider then $k \in \mathrm{ant}(i)$ and hence the lemma follows. $\square$

Lemma 6.

For nodes $i$ and $j$ in an RG that are not connected by any primitive inducing paths (and hence $i \not\sim j$), it holds that $i \perp_m j \mid (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$.

Proof.

Suppose, for contradiction, there is an $m$-connecting path between $i$ and $j$ given $(\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$ and denote a shortest such path by $\pi$. If there is a non-collider node $k$ on $\pi$ then, by Lemma 5, $k$ is either in $\mathrm{ant}(i) \cup \mathrm{ant}(j)$ or it is an anterior of a collider node on $\pi$. But since $\pi$ is $m$-connecting given $(\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$, collider nodes are in $\mathrm{ant}(i) \cup \mathrm{ant}(j)$ themselves. Hence, $k \in \mathrm{ant}(i) \cup \mathrm{ant}(j)$, which contradicts the fact that $\pi$ is $m$-connecting. Therefore, all inner nodes of $\pi$ must be colliders.

Now we know that all inner nodes of $\pi$ are in $\mathrm{ant}(i) \cup \mathrm{ant}(j)$ and $i \not\sim j$. If, for a collider tripath $\langle r, l, s \rangle$ on $\pi$, $l \in (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus (\mathrm{an}(i) \cup \mathrm{an}(j))$ then, by Lemma 2 and since the graph is ribbonless, there is an endpoint-identical $rs$-edge to the tripath, which contradicts $\pi$ being shortest. Therefore, $l \in \mathrm{an}(i) \cup \mathrm{an}(j)$, which implies that $\pi$ is primitive inducing, again a contradiction. Therefore, there is no $m$-connecting path between $i$ and $j$ given $(\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$, and hence $i \perp_m j \mid (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$. $\square$

Next, in Theorem 2 we give a necessary and sufficient condition for an RG to be maximal. The analogous result for ancestral graphs was proved in Theorem 4.2 of [24].

Theorem 2.

A ribbonless graph $G$ is maximal if and only if $G$ does not contain any primitive inducing paths between non-adjacent nodes.

Proof.

Let $\pi = \langle i = i_0, i_1, \dots, i_n = j \rangle$ be a primitive inducing path between $i$ and $j$ in $G$, and let $C$ be a subset of $V \setminus \{i, j\}$, where $V$ is the node set of $G$. We need to show that there is an $m$-connecting path between $i$ and $j$ given $C$.

This is immediate if each internal node, that is, each of $i_1, \dots, i_{n-1}$, is in $C \cup \mathrm{an}(C)$ by just using $\pi$, so assume that this is not the case. Thus there is an internal node of $\pi$ not in $C \cup \mathrm{an}(C)$, and we may assume that there is one in $\mathrm{an}(i)$. Pick such a node $i_q$, $1 \le q < n$, as far along the path to $j$ as possible. Consider a direction-preserving path from $i_q$ to $i$, and let $P_1$ denote the reverse of this path. Note that no internal node in $P_1$ is in $C \cup \mathrm{an}(C)$. Let $\pi_1$ be the part of $\pi$ from $i_q$ to $j$. If each internal node in this path is in $C \cup \mathrm{an}(C)$ then we are done by taking the path $P_1$ followed by $\pi_1$ (note that no node can be repeated since each internal node in $\pi_1$ is in $C \cup \mathrm{an}(C)$ and each internal node in $P_1$ is outside $C \cup \mathrm{an}(C)$). So suppose not. Let $i_p$ be the first node in $\pi_1$ that is not in $C \cup \mathrm{an}(C)$. Then $i_p \notin \mathrm{an}(i)$ (by the way $i_q$ was chosen), so $i_p \in \mathrm{an}(j)$. Let $\pi_2$ be the part of $\pi$ from $i_q$ to $i_p$, and let $P_2$ be a direction-preserving path from $i_p$ to $j$. Note that no internal node in $P_2$ is in $C \cup \mathrm{an}(C)$. If $P_1$ and $P_2$ have no intersection, then much as above we obtain an $m$-connecting path given $C$ by taking $P_1$ followed by $\pi_2$, followed by $P_2$. If $P_1$ and $P_2$ do intersect, then we obtain an $m$-connecting path as required by following $P_1$ up to the first node on $P_2$ and then following $P_2$.

By letting $C = (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$ for every pair of non-adjacent nodes $i$ and $j$, the other direction follows from Lemmas 4 and 6. $\square$

For other special types of graphs that are subclasses of RGs, the condition for maximality of RGs may get further simplified. Among the subclasses of RGs that have been mentioned in this paper, summary graphs, ancestral graphs, and acyclic directed mixed graphs are not necessarily maximal, while all others are maximal. This can be seen by checking whether primitive inducing paths are permissible in each subclass.

A Markov equivalent maximal graph can be generated from a non-maximal graph by adding endpoint-identical edges to a primitive inducing path between a pair of non-adjacent nodes. We refer the reader to [27] for details. The following lemma establishes that anterior graphs of maximal graphs are themselves maximal.
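The condition of Theorem 2 can also be tested directly by searching for primitive inducing paths between non-adjacent nodes. The sketch below is illustrative only: the edge encoding, the function names, and the edge list `fig7` (the graph of Figure 7 as read off from the paths named in the text) are assumptions, and the search enumerates paths exhaustively, so it suits small graphs only.

```python
from itertools import combinations

# Hypothetical encoding: an edge is (kind, u, v) with
# "arrow" = u -> v, "arc" = u <-> v, "line" = u - v.

def has_head(edge, node):
    """Return True if `edge` has an arrowhead at `node`."""
    kind, u, v = edge
    if kind == "arc":
        return True
    return kind == "arrow" and node == v

def ancestral_closure(edges, targets):
    """Return `targets` together with their ancestors (arrows only)."""
    closure = set(targets)
    changed = True
    while changed:
        changed = False
        for kind, u, v in edges:
            if kind == "arrow" and v in closure and u not in closure:
                closure.add(u)
                changed = True
    return closure

def primitive_inducing_path(edges, i, j):
    """Return the nodes of a primitive inducing path from i to j, or None.

    Every inner node must be a collider on the path and an ancestor
    of i or j. A single edge between i and j counts as such a path.
    """
    an_ij = ancestral_closure(edges, {i, j})

    def extend(node, last_edge, path):
        for edge in edges:
            _, u, v = edge
            if node not in (u, v):
                continue
            nxt = v if node == u else u
            if nxt in path:
                continue
            if last_edge is not None:
                # `node` is an inner node: check conditions (i) and (ii).
                if not (has_head(last_edge, node) and has_head(edge, node)):
                    continue
                if node not in an_ij:
                    continue
            if nxt == j:
                return path + [nxt]
            found = extend(nxt, edge, path + [nxt])
            if found is not None:
                return found
        return None

    return extend(i, None, [i])

def non_maximal_pairs(nodes, edges):
    """Non-adjacent pairs joined by a primitive inducing path."""
    adjacent = {frozenset((u, v)) for _, u, v in edges}
    return [
        (i, j)
        for i, j in combinations(nodes, 2)
        if frozenset((i, j)) not in adjacent
        and primitive_inducing_path(edges, i, j) is not None
    ]

nodes = ["i", "k", "j"]
fig7 = [("arrow", "i", "k"), ("arc", "k", "j"), ("arrow", "k", "j")]
dag = [("arrow", "i", "k"), ("arrow", "k", "j")]
print(primitive_inducing_path(fig7, "i", "j"))  # ['i', 'k', 'j']
print(non_maximal_pairs(nodes, dag))            # []
```

By Theorem 2, a ribbonless graph is maximal exactly when `non_maximal_pairs` returns an empty list; for the Figure 7 graph it returns the pair `("i", "j")`, in line with the remark that $\langle i, k, j \rangle$ is a primitive inducing path.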

Lemma 7.

Let $G$ be a ribbonless graph and $G^*$ its anterior graph. Then if $G$ is maximal, so is $G^*$.

Proof.

If, for contradiction, $G^*$ is not maximal, then Theorem 2 implies that there is a primitive inducing path in $G^*$ between non-adjacent nodes $i$ and $j$. Consider a shortest primitive inducing path between $i$ and $j$ and denote it by $\pi$. We know that all inner nodes of $\pi$ are colliders in $G^*$. This trivially implies that all inner nodes of $\pi$ are colliders in $G$ too. In addition, each inner node $k$ on $\pi$ is in $\mathrm{an}(\{i, j\})$ in $G^*$. In $G$, $k \in \mathrm{an}(\{i, j\})$ unless an arrow on the direction-preserving path from $k$ to $i$ or $j$ is an arc turning into an arrow in $G^*$. In this case, $k$ is an ancestor of a node that is the endpoint of a line. Hence the tripath $\langle h, k, l \rangle$ on $\pi$ is a ribbon unless there is an endpoint-identical $hl$-edge to the tripath, which contradicts the fact that $\pi$ is shortest. Therefore, $\pi$ is a primitive inducing path in $G$, a contradiction. Hence, $G^*$ is maximal. $\square$

6. Markov properties for ribbonless graphs

In this section, we give a precise definition of the global and pairwise Markov properties for an independence model $\mathcal{J}$ defined over the node set of a ribbonless graph. Further, we show that these two Markov properties are equivalent for a maximal ribbonless graph if $\mathcal{J}$ is also a compositional graphoid. This result is a direct generalisation of the similar result of [21] for undirected graphs and graphoids.

For a ribbonless graph $G = (V, E)$, an independence model $\mathcal{J}$ defined over $V$ satisfies the global Markov property w.r.t. $G$ if it holds for $A$, $B$, and $C$ disjoint subsets of $V$ that

$$A \perp_m B \mid C \implies \langle A, B \mid C \rangle \in \mathcal{J}.$$

Similarly, an independence model $\mathcal{J}$ defined over $V$ satisfies the pairwise Markov property w.r.t. $G$ if it holds for any nodes $i$ and $j$ that

$$i \not\sim j \implies \langle i, j \mid (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\} \rangle \in \mathcal{J}.$$

For example, for the graph in Figure 8, the pairwise Markov property would imply that $\langle i, m \mid \{k, l, h\} \rangle$ as $\mathrm{ant}(i) = \{k, l, h, m\}$ and $\mathrm{ant}(m) = \{l, h\}$. It would also imply that $\langle l, p \mid \{h, m\} \rangle$.

Clearly, the independence model $\mathcal{J}_m(G)$ induced by $m$-separation always satisfies the global Markov property w.r.t. $G$. By Lemma 4, Lemma 6, and Theorem 2, $\mathcal{J}_m(G)$ satisfies the pairwise Markov property if and only if $G$ is maximal.

Before establishing the main result of this section, we need two lemmas.

Lemma 8.

Let $\mathcal{J}$ be a compositional graphoid over a set $V$ and $M$ and $C$ be disjoint subsets of $V$. It then holds that the marginal independence model

$$\alpha(\mathcal{J}, M) = \{\langle A, B \mid C \rangle : \langle A, B \mid C \rangle \in \mathcal{J} \text{ and } (A \cup B \cup C) \cap M = \emptyset\},$$

which is defined over $V \setminus M$, is a compositional graphoid.

Figure 8.

The pairwise Markov property for this RG implies, for example, $\langle i, m \mid \{k, l, h\} \rangle$. The global Markov property would for example imply $\langle \{i, k\}, j \mid l \rangle$.

Proof.

All the six compositional graphoid properties for $\alpha(\mathcal{J}, M)$ follow trivially from the facts that for $A$, $B$, and $C$ such that $(A \cup B \cup C) \cap M = \emptyset$, $\langle A, B \mid C \rangle \in \alpha(\mathcal{J}, M)$ if and only if $\langle A, B \mid C \rangle \in \mathcal{J}$, and $\mathcal{J}$ satisfies the six properties. $\square$

Notice that the notion of a marginal independence model $\alpha(\mathcal{J}, M)$ is identical to the notion formally defined in [24] with a different notation; it was also discussed in [26] with the same notation as in this paper.

The following lemma gives sufficient conditions for the combination of two $m$-connecting paths in anterior graphs to be $m$-connecting.

Lemma 9.

Let $G^*$ be the anterior graph of a ribbonless graph $G$ and suppose that there are paths $\pi_1 = \langle i = i_0, i_1, \dots, i_n, h \rangle$ between $i$ and $h$ and $\pi_2 = \langle h, j_m, j_{m-1}, \dots, j_0 = j \rangle$ between $h$ and $j$ which are $m$-connecting given $C$. The combination $\pi = \pi_1 \circ \pi_2$ is then an $m$-connecting path between $i$ and $j$ given $C$ in each of the following mutually exclusive situations:
(a1) $\langle i_n, h, j_m \rangle$ is a collider and $h \in C \cup \mathrm{an}(C)$;
(a2) $i_n = j_m$ with an arrowhead pointing to $h$ on the $i_n h$-edge and $h \in C \cup \mathrm{an}(C)$;
(b1) $\langle i_n, h, j_m \rangle$ is a non-collider and $h \notin C$;
(b2) $i_n = j_m$ with no arrowhead pointing to $h$ on the $i_n h$-edge.

Proof.

Let $\pi = \pi_1 \circ \pi_2 = \langle i, \dots, i_{p-1}, k, j_{q-1}, \dots, j \rangle$ be the combination of $\pi_1$ and $\pi_2$. If $k = h$ and either (a1) or (b1) holds then the conclusion is obvious. The cases (a2) or (b2) are only relevant when $k \neq h$.

Next consider the situation where $k \neq h$. Since $\pi_1$ and $\pi_2$ are $m$-connecting, for $\pi$ to be $m$-connecting we only need to check the tripath $\langle i_{p-1}, k, j_{q-1} \rangle$. We have to deal with two cases:

Case 1: $\langle i_{p-1}, k, j_{q-1} \rangle$ is a non-collider. In this case there is no arrowhead pointing to $k$ from at least one of $i_{p-1}$ or $j_{q-1}$. This means that $\langle i_{p-1}, k, i_{p+1} \rangle$ on $\pi_1$ or $\langle j_{q-1}, k, j_{q+1} \rangle$ on $\pi_2$ is a non-collider, and since $\pi_1$ and $\pi_2$ were both $m$-connecting we have $k \notin C$. Hence $\pi$ is $m$-connecting.

Case 2: $\langle i_{p-1}, k, j_{q-1} \rangle$ is a collider. We need to consider the following two subcases:

Case 2.1. If $\langle i_{p-1}, k, j_{q-1} \rangle$ is a collider and any of $\langle i_{p-1}, k, i_{p+1} \rangle$ or $\langle j_{q-1}, k, j_{q+1} \rangle$ is also a collider then $k \in C \cup \mathrm{an}(C)$ and $\pi$ is $m$-connecting.

Case 2.2. If $\langle i_{p-1}, k, j_{q-1} \rangle$ is a collider but $\langle i_{p-1}, k, i_{p+1} \rangle$ and $\langle j_{q-1}, k, j_{q+1} \rangle$ are both non-colliders then by Lemma 5, the subpath of $\pi_1$ from $k$ to a collider node $l_1$ or to $h$ is an anterior path and similarly for $\pi_2$, $l_2$, and $h$. However, since $G^*$ is an anterior graph and there are arrowheads pointing to $k$, these anterior paths must be direction-preserving and thus $k \in \mathrm{an}(l_1) \cup \mathrm{an}(h)$ and $k \in \mathrm{an}(l_2) \cup \mathrm{an}(h)$. Now we have the two following further subcases:

Case 2.2.1: One of the subpaths of $\pi_1$, $\pi_2$ from $k$ to $l_1$, $l_2$ is direction-preserving. Because $\pi_1$ and $\pi_2$ are $m$-connecting we must have $l_1$ or $l_2$ in $C \cup \mathrm{an}(C)$. Thus, $k \in \mathrm{an}(C)$ and $\pi$ is $m$-connecting.

Case 2.2.2: Both subpaths of $\pi_1$ and $\pi_2$ from $k$ to $h$ are direction-preserving. Then $\langle i_n, h, j_m \rangle$ is a collider or $i_n = j_m$ with an arrowhead pointing to $h$ on the $i_n h$-edge, and (b1) and (b2) are impossible. If (a1) or (a2) holds, $\pi$ is $m$-connecting since then $h \in C \cup \mathrm{an}(C)$. $\square$

We are now ready to establish the main result of this paper.

Theorem 3.

Let $G$ be a maximal ribbonless graph. If an independence model $\mathcal{J}$ over the node set of $G$ is a compositional graphoid, then $\mathcal{J}$ satisfies the pairwise Markov property w.r.t. $G$ if and only if it satisfies the global Markov property w.r.t. $G$.

Proof. ($\Leftarrow$) If $\mathcal{J}$ is a compositional graphoid and satisfies the global Markov property, it follows from Theorem 2 and Lemma 6 that it satisfies the pairwise Markov property.

($\Rightarrow$) Now suppose that $\mathcal{J}$ satisfies the pairwise Markov property and the compositional graphoid axioms. For subsets $A$, $B$, and $C$ of the node set of $G$, we should prove that $A \perp_m B \mid C$ implies $\langle A, B \mid C \rangle \in \mathcal{J}$. By composition, it is sufficient to show this when $A$ and $B$ are singletons, that is, that $i \perp_m j \mid C$ implies $\langle i, j \mid C \rangle \in \mathcal{J}$.

Further we observe that it is sufficient to establish the result in the case when $G = G^*$ is itself an anterior graph. Proposition 1 gives that $A \perp_m B \mid C$ in $G$, which implies $A \perp_m B \mid C$ in $G^*$. In addition, by Lemma 7, $G^*$ is a maximal graph. Moreover, $G$ and $G^*$ have the same anterior sets, and therefore the same pairwise Markov property. Thus in the following, we assume that $G = G^*$ is an anterior graph.

We prove the result in two main parts. In part I, we prove the result for the case that $C \subseteq \mathrm{ant}(i) \cup \mathrm{ant}(j)$. In part II, we use the result of part I to establish the general case.

Part I. Suppose that $C \subseteq \mathrm{ant}(i) \cup \mathrm{ant}(j)$. We use induction on the number of nodes of the graph. The induction base for a graph with two nodes is trivial. Thus, suppose that the result holds for all anterior graphs with fewer than $n$ nodes and assume that $G^*$ has $n$ nodes.

Let $D = \{i\} \cup \{j\} \cup \mathrm{ant}(i) \cup \mathrm{ant}(j)$ and $M = V \setminus D$, where $V$ is the node set of the graph. First in case I.1 we suppose that $M \neq \emptyset$, and then in case I.2 we suppose that $M = \emptyset$.

Case I.1. Consider $G^*[D]$ to be the subgraph induced by $D$. Consider the marginal independence model $\alpha(\mathcal{J}, M) = \{\langle A, B \mid C \rangle : \langle A, B \mid C \rangle \in \mathcal{J} \text{ and } (A \cup B \cup C) \cap M = \emptyset\}$ defined over $D$. By Lemma 8, $\alpha(\mathcal{J}, M)$ is a compositional graphoid. In addition, it satisfies the pairwise Markov property: This is because two non-adjacent nodes $l_1$ and $l_2$ in $G^*[D]$ are non-adjacent in $G^*$ and by the pairwise Markov property for $\mathcal{J}$, $\langle l_1, l_2 \mid (\mathrm{ant}_{G^*}(l_1) \cup \mathrm{ant}_{G^*}(l_2)) \setminus \{l_1, l_2\} \rangle \in \mathcal{J}$, where $\mathrm{ant}_{G^*}$ is the anterior set in $G^*$. We know that $\mathrm{ant}_{G^*}(l_1) \cup \mathrm{ant}_{G^*}(l_2) \subseteq D$ and hence $(\mathrm{ant}_{G^*}(l_1) \cup \mathrm{ant}_{G^*}(l_2)) \cap M = \emptyset$. In addition, for a node $l$ in $G^*[D]$, $\mathrm{ant}_{G^*}(l) = \mathrm{ant}_{G^*[D]}(l)$. Therefore, $\langle l_1, l_2 \mid (\mathrm{ant}_{G^*[D]}(l_1) \cup \mathrm{ant}_{G^*[D]}(l_2)) \setminus \{l_1, l_2\} \rangle \in \alpha(\mathcal{J}, M)$.

We also know that $i \perp_m j \mid C$ in $G^*$ implies $i \perp_m j \mid C$ in $G^*[D]$ since there is no $m$-connecting path between $i$ and $j$ given $C$ in $G^*$ and by removing nodes and edges from $G^*$ no new $m$-connecting paths are generated. Therefore, by the induction hypothesis $\langle i, j \mid C \rangle \in \alpha(\mathcal{J}, M)$. This implies that $\langle i, j \mid C \rangle \in \mathcal{J}$.

Case I.2. Now suppose that $M = \emptyset$ and thus the node set of $G^*$ is $D = \{i\} \cup \{j\} \cup \mathrm{ant}(i) \cup \mathrm{ant}(j)$. We prove the result by reverse induction on $|C|$: For the base, $C = V \setminus \{i, j\} = (\mathrm{ant}(i) \cup \mathrm{ant}(j)) \setminus \{i, j\}$ and the result follows trivially from the pairwise Markov property.

For the inductive step, consider a node $h \notin C$. We want to show that $h$ is not simultaneously $m$-connected to both $i$ and $j$: Suppose, for contradiction, there are $m$-connecting paths $\pi_1 = \langle i, i_1, \dots, i_n, h \rangle$ and $\pi_2 = \langle h, j_m, j_{m-1}, \dots, j_0 = j \rangle$ given $C$. If (b1) or (b2) of Lemma 9 holds then $i$ and $j$ are $m$-connected given $C$, which contradicts $i \perp_m j \mid C$. So we need only consider the cases where $\langle i_n, h, j_m \rangle$ is a collider or $i_n = j_m$ with an arrowhead pointing to $h$ on the $i_n h$-edge. However, we know that $h \in \mathrm{ant}(i)$ or $h \in \mathrm{ant}(j)$. Because of symmetry between $i$ and $j$ suppose that $h \in \mathrm{ant}(i)$. Since $G^*$ is an anterior graph and there is an arrowhead pointing to $h$ we have $h \in \mathrm{an}(i)$. Hence, there is a direction-preserving path $\pi_3$ from $h$ to $i$. If no node on $\pi_3$ is in $C$ then (b1) or (b2) of Lemma 9 implies that the combination of $\pi_3$ and $\pi_2$ is an $m$-connecting path between $i$ and $j$, again a contradiction. If there is a node on $\pi_3$ that is in $C$ then $h \in \mathrm{an}(C)$ and again, by (a1) and (a2) of Lemma 9, $i$ and $j$ are $m$-connected given $C$, again a contradiction.

We conclude that, given $C$, $h$ is not $m$-connected to both $i$ and $j$. By symmetry, suppose that $i \perp_m h \mid C$.

We also have that $i \perp_m j \mid C$. Since $\mathcal{J}_m(G^*)$ is a compositional graphoid (Theorem 1) the composition property gives that $i \perp_m \{j, h\} \mid C$. By weak union for $\perp_m$ we obtain $i \perp_m j \mid \{h\} \cup C$ and $i \perp_m h \mid \{j\} \cup C$. By the induction hypothesis, we obtain $\langle i, j \mid \{h\} \cup C \rangle \in \mathcal{J}$ and $\langle i, h \mid \{j\} \cup C \rangle \in \mathcal{J}$. By intersection, we get $\langle i, \{j, h\} \mid C \rangle \in \mathcal{J}$. By decomposition we finally obtain $\langle i, j \mid C \rangle \in \mathcal{J}$.

Part II. We now prove the result in the general case by induction on $|C|$. The base, that is, the case that $|C| = 0$, follows from part I. To prove the inductive step, we can assume that $C \nsubseteq \mathrm{ant}(i) \cup \mathrm{ant}(j)$, since otherwise part I implies the result.

We first show that if $C \nsubseteq \mathrm{ant}(i) \cup \mathrm{ant}(j)$ then there is a node $l$ in $C$ such that $i \perp_m j \mid C \setminus \{l\}$: Let first $l' \in C \setminus (\mathrm{ant}(i) \cup \mathrm{ant}(j))$ be arbitrary. If there is an $l'' \in C \setminus (\mathrm{ant}(i) \cup \mathrm{ant}(j))$ so that $l' \in \mathrm{ant}(l'')$ and $l'' \notin \mathrm{ant}(l')$ then replace $l'$ by $l''$, and repeat this process until it terminates, the latter being ensured by transitivity of ant (Lemma 1) and the finiteness of $C$. Thus, we eventually obtain an $l$ so that if $l \in \mathrm{ant}(\tilde{l})$ for $\tilde{l} \in C \setminus (\mathrm{ant}(i) \cup \mathrm{ant}(j))$ then we also have $\tilde{l} \in \mathrm{ant}(l)$.

Suppose, for contradiction, that there is a shortest $m$-connecting path $\pi$ between $i$ and $j$ given $C \setminus \{l\}$. If $l$ is not on $\pi$ or is a collider on $\pi$ then $\pi$ is also $m$-connecting given $C$. Therefore, $l$ is a non-collider on $\pi$. This, together with $l \notin \mathrm{ant}(i) \cup \mathrm{ant}(j)$, by using Lemma 5, implies that $l$ is an anterior of a collider node $p$ on $\pi$. Since $\pi$ is $m$-connecting, $p \in C \cup \mathrm{an}(C)$. Thus, there is an $\tilde{l} \in C$ so that $p = \tilde{l}$ or $p \in \mathrm{an}(\tilde{l})$. Transitivity of anterior sets and the fact that $l \notin (\mathrm{ant}(i) \cup \mathrm{ant}(j))$ now imply that $\tilde{l} \in C \setminus (\mathrm{ant}(i) \cup \mathrm{ant}(j))$. The construction of $l$ implies $\tilde{l} \in \mathrm{ant}(l)$ which again implies that $\tilde{l} \in \mathrm{an}(l)$ and $l \in \mathrm{an}(\tilde{l})$ and thus the collider tripath containing $p$ is a cyclic ribbon unless its endpoints are adjacent with an endpoint-identical edge, which implies that $\pi$ is not a shortest $m$-connecting path, a contradiction.

We now have that either $i \perp_m l \mid C \setminus \{l\}$ or $j \perp_m l \mid C \setminus \{l\}$ since otherwise, by Lemma 9 there is an $m$-connecting path between $i$ and $j$ given $C \setminus \{l\}$ in the case that $l$ is a non-collider or given $C$ in the case that $l$ is a collider node. Because of symmetry suppose that $i \perp_m l \mid C \setminus \{l\}$. By the induction hypothesis, we have $\langle i, j \mid C \setminus \{l\} \rangle \in \mathcal{J}$ and $\langle i, l \mid C \setminus \{l\} \rangle \in \mathcal{J}$. By the composition property we get $\langle i, \{j, l\} \mid C \setminus \{l\} \rangle \in \mathcal{J}$. The weak union property implies $\langle i, j \mid C \rangle \in \mathcal{J}$. $\square$

If we specialise Theorem 3 to the most common case of probabilistic independence models, we get the following corollary.

Corollary 2.

Let $G$ be a maximal ribbonless graph. A probabilistic independence model that satisfies the intersection and composition axioms satisfies the pairwise Markov property w.r.t. $G$ if and only if it satisfies the global Markov property w.r.t. $G$.

Theorem 3 states that, for equivalence of pairwise and global Markov properties, the six compositional graphoid axioms are sufficient. In fact, in general, for the mentioned equivalence, all six axioms are also necessary. The graphs in Figure 9 show that the intersection and composition properties are necessary for the equivalence of pairwise and global Markov properties.

For $G_1 = (V_1, E_1)$, if $\mathcal{J}$ defined over $V_1$ satisfies the pairwise Markov property, then $\langle i, k \mid \{j, l\} \rangle$, $\langle i, l \mid \{j, k\} \rangle$, and $\langle k, l \mid \{i, j\} \rangle$ are in $\mathcal{J}$. It can be seen that none of the compositional semi-graphoid axioms can be used to imply $\langle i, \{k, l\} \mid j \rangle \in \mathcal{J}$. The intersection property is the only axiom that implies the result.

For $G_2 = (V_2, E_2)$, if $\mathcal{J}$ defined over $V_2$ satisfies the pairwise Markov property then $\langle i, k \mid \emptyset \rangle$, $\langle i, l \mid \emptyset \rangle$, and $\langle k, l \mid \emptyset \rangle$ are in $\mathcal{J}$. It can be seen that none of the graphoid axioms can be used to imply $\langle i, \{k, l\} \mid \emptyset \rangle \in \mathcal{J}$. The composition property is the only axiom that implies the result.

For $G_3 = (V_3, E_3)$, if $\mathcal{J}$ defined over $V_3$ satisfies the pairwise Markov property then $\langle i, k \mid \emptyset \rangle$, $\langle i, l \mid \{j, k\} \rangle$, and $\langle k, l \mid \{i, j\} \rangle$ are in $\mathcal{J}$. It can be seen that none of the compositional semi-graphoid axioms can be used to imply $\langle l, \{i, k\} \mid j \rangle \in \mathcal{J}$. The intersection property is the only axiom that implies the result. See also, for example, Example 3.26 of [19], showing that the pairwise Markov property does not imply the global Markov property for DAGs when intersection is violated.

It is known that, for undirected graphs, the five graphoid axioms are necessary and sufficient for equivalence of pairwise and global Markov properties; see [19]. For bidirected graphs, the pairwise Markov statement for non-adjacent nodes $i$ and $j$ is $\langle i, j \mid \emptyset \rangle$ and only the five compositional semi-graphoid axioms are necessary for equivalence of pairwise and global Markov properties. This can be inferred from the proof of Theorem 3, since part I of the proof is not relevant for bidirected graphs unless $C = \emptyset$ and the intersection property is not used in part II of the proof. We conclude by stating this as its own proposition.

Figure 9. For the equivalence of pairwise and global Markov properties, (a) an undirected graph $G_1$ that shows that the intersection property is necessary; (b) a bidirected graph $G_2$ that shows that the composition property is necessary; (c) a directed acyclic graph $G_3$ that shows that the intersection property is necessary.

Proposition 2.

Let $G = (V, E)$ be a bidirected graph. If an independence model $\mathcal{J}$ defined over $V$ is a compositional semi-graphoid then $\mathcal{J}$ satisfies the pairwise Markov property w.r.t. $G$ if and only if it satisfies the global Markov property w.r.t. $G$.

Acknowledgements

We are grateful to Milan Studený, Nanny Wermuth, and anonymous referees for very helpful comments on earlier versions of this paper.

References

[1] Andersson, S.A., Madigan, D. and Perlman, M.D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist.
[2] Andersson, S.A., Madigan, D. and Perlman, M.D. (2001). Alternative Markov properties for chain graphs. Scand. J. Statist.
[3] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B.
[4] Clifford, P. (1990). Markov random fields in statistics. In Disorder in Physical Systems. Oxford Sci. Publ.
[5] Cox, D.R. and Wermuth, N. (1993). Linear dependencies represented by chain graphs. Statist. Sci.
[6] Darroch, J.N., Lauritzen, S.L. and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Ann. Statist.
[7] Dawid, A.P. (1979). Conditional independence in statistical theory. J. Roy. Statist. Soc. Ser. B.
[8] Drton, M. (2009). Discrete chain graph models. Bernoulli.
[9] Drton, M. and Richardson, T.S. (2008). Binary models for marginal independence. J. R. Stat. Soc. Ser. B Stat. Methodol.
[10] Frydenberg, M. (1990). The chain graph Markov property. Scand. J. Statist.
[11] Geiger, D., Verma, T. and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks.
[12] Grimmett, G.R. (1973). A theorem about random fields. Bull. London Math. Soc.
[13] Hammersley, J.M. and Clifford, P. (1971). Markov fields on finite graphs and lattices. Unpublished manuscript.
[14] Kang, C. and Tian, J. (2009). Markov properties for linear causal models with correlated errors. J. Mach. Learn. Res.
[15] Kauermann, G. (1996). On a dualization of graphical Gaussian models. Scand. J. Statist.
[16] Kiiveri, H., Speed, T.P. and Carlin, J.B. (1984). Recursive causal models. J. Austral. Math. Soc. Ser. A.
[17] Koster, J.T.A. (1997). Gibbs and Markov properties of graphs. Ann. Math. Artificial Intelligence.
[18] Koster, J.T.A. (2002). Marginalizing and conditioning in graphical models. Bernoulli.
[19] Lauritzen, S.L. (1996). Graphical Models. Oxford Statistical Science Series. New York: Clarendon Press. MR1419991
[20] Lauritzen, S.L. and Richardson, T.S. (2002). Chain graph models and their causal interpretations. J. R. Stat. Soc. Ser. B Stat. Methodol.
[21] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. The Morgan Kaufmann Series in Representation and Reasoning. San Mateo, CA: Morgan Kaufmann. MR0965765
[22] Richardson, T. (2001). Chain graphs which are maximal ancestral graphs are recursive causal graphs. Technical Report 387, Dept. Statistics, Univ. Washington, Seattle, WA.
[23] Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Statist.
[24] Richardson, T. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist.
[25] Richardson, T.S. (1998). Chain graphs and symmetric associations. In Learning in Graphical Models (M. Jordan, ed.) 231–260. Dordrecht, The Netherlands: Kluwer.
[26] Sadeghi, K. (2013). Stable mixed graphs. Bernoulli.
[27] Sadeghi, K. (2012). Graphical representation of independence structures. Ph.D. thesis, Univ. Oxford.
[28] Spirtes, P., Richardson, T. and Meek, C. (1997). The dimensionality of mixed ancestral graphs. Technical Report CMU-PHIL-83, Dept. Philosophy, Carnegie–Mellon Univ., Pittsburgh, PA.
[29] Studený, M. (1989). Multi-information and the problem of characterization of conditional independence relations. Problems Control Inform. Theory/Problemy Upravlen. Teor. Inform.
[30] Wermuth, N. (2011). Probability distributions with summary graph structure. Bernoulli.
[31] Wermuth, N. and Cox, D.R. (1998). On association models defined over independence graphs. Bernoulli.
[32] Wermuth, N., Marchetti, G.M. and Cox, D.R. (2009). Triangular systems for symmetric binary variables. Electron. J. Stat. 932–955. MR2540847