Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013)

Abstract

This article contains detailed proofs and additional examples related to the UAI-2013 submission 'Learning Sparse Causal Models is not NP-hard'. It describes the FCI+ algorithm: a method for sound and complete causal model discovery in the presence of latent confounders and/or selection bias, with worst-case polynomial complexity of order $N^{2(k+1)}$ in the number of independence tests for sparse graphs over $N$ nodes with node degree bounded by $k$. The algorithm is an adaptation of the well-known FCI algorithm by Spirtes et al. (2000), which is also sound and complete but has worst-case complexity exponential in $N$.


Supplement - Learning Sparse Causal Models is not NP-hard

Tom Claassen ∗, Joris M. Mooij ‡, and Tom Heskes ∗
∗ Institute for Computer and Information Science, Radboud University Nijmegen
‡ Informatics Institute, University of Amsterdam
The Netherlands

Abstract

This article contains detailed proofs and additional examples related to the UAI-2013 submission 'Learning Sparse Causal Models is not NP-hard'. The supplement follows the numbering in the main submission.

For reference purposes, we list a few basic graphical model concepts, terms and definitions. For details, see e.g., Richardson and Spirtes (2002).

A mixed graph $G$ is a graphical model that can contain three types of edges between pairs of nodes: directed ($\leftarrow$, $\rightarrow$), bidirected ($\leftrightarrow$), and undirected ($-$). In this paper we only consider graphs with at most one edge between each pair of nodes, and with no node with an edge to itself. If there is an edge $X \rightarrow Y$ in $G$ then $X$ is a parent of its child $Y$; if $X \leftrightarrow Y$ then $X$ and $Y$ are spouses of each other; and if $X - Y$ then they are called neighbours. A path $\pi = \langle X_1, X_2, \ldots, X_n \rangle$ is an ordered sequence of distinct nodes where each successive pair $(X_i, X_{i+1})$ along $\pi$ is adjacent (connected by an edge) in $G$. A directed path is a path of the form $X_1 \rightarrow X_2 \rightarrow \ldots \rightarrow X_n$. A directed cycle is a directed path from $X_1$ to $X_n$ in combination with a directed edge $X_n \rightarrow X_1$. A directed acyclic graph (DAG) is a graph that contains only directed edges, but has no directed cycle. The skeleton $S$ of a graph $G$ is the undirected graph corresponding to the structure of $G$, so that for each edge in $G$ there is an undirected edge in $S$. A node $X$ is an ancestor of $Y$ (and $Y$ a descendant of $X$) if there is a directed path from $X$ to $Y$ in $G$, or $X = Y$. A vertex $Z$ is a collider on a path $\pi = \langle \ldots, X, Z, Y, \ldots \rangle$ if there are arrowheads at $Z$ on both edges from $X$ and $Y$, i.e., if $X \mathrel{\ast\!\!\rightarrow} Z \mathrel{\leftarrow\!\!\ast} Y$ (where the symbol $\ast$ stands for either an arrowhead mark or a tail mark); otherwise it is a noncollider. A trek is a path without colliders.

In a DAG $G$, a path $\pi = \langle X, \ldots, Y \rangle$ is said to be unblocked relative to a set of vertices $\mathbf{Z}$ if and only if:
(1) every noncollider on $\pi$ is not in $\mathbf{Z}$, and
(2) every collider along $\pi$ is an ancestor of $\mathbf{Z}$,
otherwise the path is blocked. We say that a path $\pi = \langle X, \ldots, Y \rangle$ is blocked by node $Z \in \mathbf{Z}$ iff $\pi$ is blocked given $\mathbf{Z}$, but unblocked relative to $\mathbf{Z} \setminus Z$.
If there exists an unblocked path between $X$ and $Y$ relative to $\mathbf{Z}$ in $G$ then $X$ and $Y$ are said to be d-connected given $\mathbf{Z}$; if there is no such path then $X$ and $Y$ are d-separated by $\mathbf{Z}$.

A mixed graph $M$ is an ancestral graph (AG) iff an arrowhead at $X$ on an edge to $Y$ implies that there is no directed path from $X$ to $Y$ in $M$, and there are no arrowheads at nodes with undirected edges. As a result, arrowhead marks can be read as 'is not an ancestor of', and all DAGs are ancestral. In an ancestral graph $M$ a node $X$ is said to be anterior to a node $Y$ if there is a so-called anterior path from $X$ to $Y$ in $M$ of the form $X - \ldots - (Z) \rightarrow \ldots \rightarrow Y$, possibly with $Z = X$ (no undirected part) or with $Z = Y$ (no directed part), or if $X = Y$. Arrowhead marks in an ancestral graph can therefore also be read as 'is not anterior to'. When applied to an ancestral graph, d-separation is also known as m-separation. (Sometimes m-connected is defined using 'anterior' instead of 'ancestor' in condition (2) of 'unblocked', but as colliders have no undirected edges these two are equivalent.) An ancestral graph is maximal (MAG) if for any two non-adjacent vertices there is a set that separates them. A path $\pi$ between two nodes $(X, Y)$ in an ancestral graph is inducing with respect to a set of nodes $\mathbf{Z}$ iff every collider on $\pi$ is an ancestor of $X$ or $Y$, and every noncollider is in $\mathbf{Z}$. Inducing paths w.r.t. $\mathbf{Z} = \emptyset$ are called primitive.

Throughout the rest of this article, $X$, $Y$ and $Z$ represent disjoint (subsets of) nodes (vertices, variables) in a graph, with sets denoted in boldface. The set $Adj(X)$ refers to the nodes adjacent to $X$ in an AG $M$, $An(X)$ represents the ancestors of $X$ in $M$, and $Ant(X)$ the nodes anterior to $X$ in $M$.
Similar for sets, i.e., $X \in Adj(\mathbf{Z})$ implies $\exists Z \in \mathbf{Z} : X \in Adj(Z)$; idem for $An(\mathbf{Z})$ and $Ant(\mathbf{Z})$.

Every (maximal) ancestral graph $M$ over nodes $\mathbf{V}$ corresponds to some underlying causal DAG $G$ over variables $\mathbf{V} \cup \mathbf{L} \cup \mathbf{S}$, where the (possibly empty) sets of unobserved latent variables $\mathbf{L}$ and selection nodes $\mathbf{S}$ in $G$ have been marginalized and conditioned out, see (Richardson and Spirtes, 2002). We denote the ancestors of $\mathbf{W} \subseteq \mathbf{V} \cup \mathbf{L} \cup \mathbf{S}$ in $G$ as $An_G(\mathbf{W})$, where the subscript $G$ highlights that the ancestorship relation is with respect to the underlying DAG $G$ instead of $M$. Pairs of nodes in $\mathbf{V}$ that share a common ancestor in the subgraph of $G$ over $\mathbf{L}$ are said to be confounded. Nodes in $An_G(\mathbf{S})$ are said to be subject to selection bias. Both confounding and selection bias can give rise to links between nodes in a MAG, where confounding is associated with bidirected edges and selection bias with undirected edges.

The following helpful properties for reading ancestral information from a (M)AG $M$ corresponding to an underlying causal DAG $G$ are shown in (Richardson and Spirtes, 2002):
(1) $X \mathrel{-\!\!\ast} Y \in M \Rightarrow X \in An_G(Y \cup \mathbf{S})$,
(2) $X \mathrel{\leftarrow\!\!\ast} Y \in M \Rightarrow X \notin An_G(Y \cup \mathbf{S})$,
(3) $X \leftrightarrow Y \in M \Rightarrow \{X, Y\} \subset De_G(\mathbf{L})$,
(4) $X - Y \in M \Rightarrow \{X, Y\} \subset An_G(\mathbf{S})$.
Conversely, a node subject to selection bias in $G$ has no arrowhead in $M$, and a node not subject to selection bias is not part of an undirected edge in $M$. Finally, note that $X \in An_G(\mathbf{Y} \cup \mathbf{S}) \iff X \in Ant(\mathbf{Y})$ for all $X \in M$, $\mathbf{Y} \subseteq M$.

The following definition is a special case of the definition in section 4.2.1 of (Richardson and Spirtes, 2002). Given an ancestral graph $M$ over nodes $\mathbf{V} \cup \mathbf{L}$, the marginal MAG $M'$ over nodes $\mathbf{V}$ has the following edges: $X, Y \in \mathbf{V}$ are adjacent in $M'$ if there does not exist a set $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$ that m-separates $X, Y$ in $M$, and in that case the edge in $M'$ has an arrowhead at $X$ if and only if $X \notin Ant_M(Y)$, and has an arrowhead at $Y$ if and only if $Y \notin Ant_M(X)$.
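As an illustration of the blocked/unblocked path definition for DAGs given earlier in this section, the following minimal sketch checks d-separation by brute-force enumeration of paths. It is only meant for tiny example graphs; the function names and the edge-list representation are assumptions made for this illustration and are not part of the paper.

```python
def ancestors(parents, targets):
    """All ancestors of `targets`; every node counts as its own ancestor."""
    seen, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, x, y, z):
    """True iff every path between x and y in the DAG is blocked given z.

    `edges` is a list of (parent, child) pairs; x and y must not be in z.
    """
    parents, neighbours = {}, {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    z = set(z)
    an_z = ancestors(parents, z)

    def unblocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            pa = parents.get(mid, ())
            collider = prev in pa and nxt in pa
            if collider and mid not in an_z:
                return False  # condition (2) fails: collider is no ancestor of z
            if not collider and mid in z:
                return False  # condition (1) fails: noncollider is in z
        return True

    def paths(path):
        # Enumerate all paths (sequences of distinct nodes) from x to y.
        if path[-1] == y:
            yield path
            return
        for n in neighbours.get(path[-1], ()):
            if n not in path:
                yield from paths(path + [n])

    return not any(unblocked(p) for p in paths([x]))


collider_graph = [("A", "C"), ("B", "C"), ("C", "D")]
print(d_separated(collider_graph, "A", "B", []))     # collider C blocks the path
print(d_separated(collider_graph, "A", "B", ["D"]))  # C is an ancestor of D
```

Conditioning on the descendant D of the collider C opens the path between A and B, which is exactly condition (2) of the definition.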
We rely on the following connection between in/dependences and (non-)ancestorship in an ancestral graph.

Lemma 2.

For disjoint (subsets of) nodes $X, Y, Z, \mathbf{Z}$ in an ancestral graph $M$,
(1) $X \not\perp\!\!\!\perp Y \mid \mathbf{Z} \cup [Z] \Rightarrow Z \notin Ant(\{X, Y\} \cup \mathbf{Z})$.
(2) $X \perp\!\!\!\perp Y \mid \mathbf{Z} \cup [Z] \Rightarrow Z \in Ant(\{X, Y\} \cup \mathbf{Z})$.
(3) $X \perp\!\!\!\perp Y \mid [\mathbf{Z} \cup Z] \Rightarrow Z \in Ant(\{X, Y\})$,
where square brackets indicate a minimal set of nodes.

Proof.

See e.g., Corollary to Lemma 14 in (Spirtes et al., 1999), and Lemma 2 in (Claassen and Heskes, 2011).

We use the following result on anteriorship for nodes on unblocked paths:

Lemma 3.13

In an ancestral graph $M$, if $\pi$ is a path m-connecting $X$ and $Y$ given $\mathbf{Z}$, then every vertex on $\pi$ is in $Ant(\{X, Y\} \cup \mathbf{Z})$.

Proof.

See 3.13 in (Richardson and Spirtes, 2002).

As a result, rule (3) in Lemma 2 not only applies to the nodes in the minimal separating set, but also to all other nodes on the paths in $M$ between $X$ and $Y$ that become unblocked given only a subset $\mathbf{Z}' \subset \mathbf{Z}$.

Corollary 8.

Let $M$ be an ancestral graph, and suppose that $X \perp\!\!\!\perp Y \mid [\mathbf{Z}]$. If a path $\pi$ in $M$ between $X$ and $Y$ is unblocked given some subset $\mathbf{Z}' \subset \mathbf{Z}$, then all nodes on $\pi$ are in $Ant(\{X, Y\})$.

Proof.

Follows from Lemma 3.13, given that Lemma 2 rule (3) ensures that $\mathbf{Z}' \subset \mathbf{Z} \subseteq Ant(\{X, Y\})$.

Similarly, we can freely add anterior nodes to any separating set without introducing a dependence:

Corollary 9.

In an ancestral graph $M$, if $X \perp\!\!\!\perp Y \mid \mathbf{Z}$, then $\forall\, \mathbf{W} \subseteq Ant(\{X, Y\} \cup \mathbf{Z}) \setminus \{X, Y\} : X \perp\!\!\!\perp Y \mid \mathbf{Z} \cup \mathbf{W}$.

Proof.

Add the nodes in $\mathbf{W}$ to the separating set one by one. By rule (1) in Lemma 2, any node that creates a dependence cannot be anterior to any node in $\{X, Y\} \cup \mathbf{Z}$, contrary to the assumption. So all added nodes leave the original independence intact, and therefore $X \perp\!\!\!\perp Y \mid \mathbf{Z} \cup \mathbf{W}$.

D-separating sets

This part contains the proofs for section §

D-separation:

Definition 2.

In a MAG $M$, two nodes $X$ and $Y$ are D-separated by a set of nodes $\mathbf{Z}$ iff:
1. $X \perp\!\!\!\perp Y \mid \mathbf{Z}$,
2. $\forall\, \mathbf{Z}' \subseteq Adj(\{X, Y\}) \setminus \{X, Y\} : X \not\perp\!\!\!\perp Y \mid \mathbf{Z}'$.
If $\mathbf{Z}$ D-separates $X$ and $Y$, then $(X, Y)$ is called a D-sep link, and a node $Z \in \mathbf{Z}$ is called a D-sep node for $(X, Y)$ if:
1. $Z \notin Adj(\{X, Y\})$,
2. $\forall\, \mathbf{Z}' \subseteq Adj(\{X, Y\}) : X \not\perp\!\!\!\perp Y \mid \mathbf{Z} \setminus Z \cup \mathbf{Z}'$.

In words: $X$ and $Y$ are D-separated by $\mathbf{Z}$ iff they are d-separated by $\mathbf{Z}$, and all sets that can separate $X$ and $Y$ contain at least one node $Z \notin Adj(\{X, Y\})$. Such a node $Z \in \mathbf{Z}$ that cannot be made redundant by nodes adjacent to $X$ or $Y$ is a D-sep node, and the relation between $X$ and $Y$ is called a D-sep link.

D-sep links

To prove Lemma 3 from the main article we first derive a connection between 'not separable by adjacent nodes' and non-anteriorship:

Lemma 10.

In an ancestral graph $M$, if $X \perp\!\!\!\perp Y \mid \mathbf{Z}$, but $X$ is not independent of $Y$ given any subset of $Adj(X)$ in $M$, then $Y \notin Ant(X)$ and $Y$ is not part of an undirected edge.

Proof.

From $X \perp\!\!\!\perp Y \mid \mathbf{Z}$ there is no edge between $X$ and $Y$ in $M$. Let $\mathbf{W} = Adj(X) \cap Ant(\{X, Y\})$ be the set of all nodes in $M$ that are adjacent to $X$ and have an anterior path to $X$ and/or $Y$. By assumption, $X \not\perp\!\!\!\perp Y \mid \mathbf{W}$, and so there are one or more unblocked paths of the form $\langle X, U, \ldots, Y \rangle$ relative to $\mathbf{W}$ in $M$ (as there is no direct edge). By Lemma 3.13 this implies $U \in Ant(\{X, Y\} \cup \mathbf{W})$. From $\mathbf{W} \subset Ant(\{X, Y\})$ and transitivity of 'anteriorship' it then follows that $U \in Ant(\{X, Y\})$, which combined with the fact that $U$ is adjacent to $X$ implies $U \in \mathbf{W}$. But given that the path $\pi = \langle X, U, \ldots, Y \rangle$ is unblocked relative to $\mathbf{W}$, node $U \in \mathbf{W}$ must be a collider along this path with arrowhead $X \mathrel{\ast\!\!\rightarrow} U$ in $M$. This means $U \notin Ant(X)$, which leaves $U \in Ant(Y)$. But then also $Y \notin Ant(X)$, otherwise (again by transitivity) $U$ would still be anterior to $X$ in $M$. From the fact that $U$ is a collider along $\pi$ we know that it is not part of an undirected edge, and so $Y$, as a descendant of $U$, also cannot be part of an undirected edge in $M$.

This also applies directly to D-sep links.

Lemma 3.

In a MAG $M$, if two nodes $X$ and $Y$ are D-separated by a minimal set $\mathbf{Z}$, then
(1) $X \notin Ant(Y \cup \mathbf{Z})$,
(2) $Y \notin Ant(X \cup \mathbf{Z})$,
(3) $\forall Z \in \mathbf{Z} : Z \in Ant(\{X, Y\})$,
(4) $X$ and $Y$ are not part of an undirected edge.

Proof.

(1) From the definition of D-separated nodes and Lemma 10 it follows that $X$ is not anterior to $Y$; but $X$ also cannot be anterior to any node in $\mathbf{Z}$, otherwise by (3) and transitivity/acyclicity it would either still be anterior to $Y$, or it would be anterior to itself, which would imply a directed cycle (given that (4) implies there cannot be an undirected edge at $X$); therefore $X \notin Ant(Y \cup \mathbf{Z})$;
(2) idem for $Y$;
(3) Lemma 2 rule (3), given that $\mathbf{Z}$ is minimal;
(4) follows directly from Lemma 10.

Note that it is possible that one or more nodes in $\mathbf{Z}$ (including D-sep nodes) are part of an undirected edge in $M$.

Next we introduce:

Definition 3.

For a set of nodes $\mathbf{X}$ in an ancestral graph $M$, the set $AA(\mathbf{X})$ (adjacent anteriors) is defined as $AA(\mathbf{X}) = (Adj(\mathbf{X}) \cap Ant(\mathbf{X})) \setminus \mathbf{X}$.

In the context of D-sep links $(X, Y)$ we usually refer to $AA(\{X, Y\})$ as the set of adjacent ancestors, as then $Ant(\{X, Y\}) = An(\{X, Y\})$, by Lemma 3-(4).

With this we can bring D-separation in standard form:

Lemma 11.

In a MAG $M$, if two nodes $X$ and $Y$ are d-separated by $\mathbf{Z}$, then also $X \perp\!\!\!\perp Y \mid [\mathbf{Z}_{AA} \cup \mathbf{Z}_{DS}]$, with $\mathbf{Z}_{DS} \subset \mathbf{Z}$, $\mathbf{Z}_{AA} \subseteq AA(\{X, Y\})$, $\mathbf{Z}_{AA} \cap \mathbf{Z}_{DS} = \emptyset$, and where all nodes in $\mathbf{Z}_{DS}$ (possibly empty) are D-sep nodes for $(X, Y)$.

Proof.

We use rules (1)-(3) in Lemma 2 to construct the two sets. First we remove nodes from $\mathbf{Z}$ one by one until no more can be removed, to obtain a minimal $X \perp\!\!\!\perp Y \mid [\mathbf{Z}']$, with $\mathbf{Z}' \subseteq \mathbf{Z}$. By rule (3), all nodes in $\mathbf{Z}'$ are anterior to $X$ and/or $Y$. By Corollary 9 we obtain $X \perp\!\!\!\perp Y \mid AA(\{X, Y\}) \cup \mathbf{Z}''$, where $\mathbf{Z}'' = \mathbf{Z}' \setminus AA(\{X, Y\})$ contains the subset of nodes from $\mathbf{Z}'$ that are not adjacent to $X$ and/or $Y$.

We obtain $\mathbf{Z}_{DS}$ by eliminating nodes from $\mathbf{Z}''$ one by one until no more nodes can be eliminated without destroying the independence, so that $X \perp\!\!\!\perp Y \mid AA(\{X, Y\}) \cup [\mathbf{Z}_{DS}]$. If $(X, Y)$ is not a D-sep link, then $\mathbf{Z}_{DS} = \emptyset$. Finally we can obtain $\mathbf{Z}_{AA}$ by eliminating superfluous nodes from $AA(\{X, Y\})$ one by one until no more can be removed without creating a dependence.

By construction the sets $\mathbf{Z}_{AA}$ and $\mathbf{Z}_{DS}$ are disjoint. No additional nodes from $\mathbf{Z}_{DS}$ can be eliminated during/after the process of eliminating nodes from $AA(\{X, Y\})$: if $Z \in \mathbf{Z}_{DS}$ can be eliminated only after some node $Z_A \in AA(\{X, Y\})$ is eliminated, then putting back $Z_A$ after $Z$ is removed should create a dependence, in contradiction with Corollary 9. Therefore, at that point the D-separating set is minimal, i.e., $X \perp\!\!\!\perp Y \mid [\mathbf{Z}_{AA} \cup \mathbf{Z}_{DS}]$.

All nodes in $\mathbf{Z}_{DS}$ (if nonempty) satisfy the definition of D-sep node: by construction none of them are adjacent to $X$ or $Y$, and if there were some subset $\mathbf{W} \subseteq Adj(\{X, Y\})$ that could make a node $Z \in \mathbf{Z}_{DS}$ redundant then, by Lemma 2.(2), that subset $\mathbf{W}$ must be a subset of $AA(\{X, Y\})$, and so by Corollary 9 the independence should also be found given $AA(\{X, Y\}) \cup \mathbf{Z}'''$ with $\mathbf{Z}''' \subseteq \mathbf{Z}_{DS} \setminus \{Z\}$: a contradiction.

Note that neither $\mathbf{Z}_{AA}$ nor $\mathbf{Z}_{DS}$ need be uniquely defined for a given D-separated $X \perp\!\!\!\perp Y \mid [\mathbf{Z}]$, but may depend on the order in which nodes are removed.
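The 'remove nodes one by one until no more can be removed' step used in the proof of Lemma 11 can be sketched as follows. This is a minimal illustration that assumes access to an independence oracle passed in as a callable; the names `minimize_separating_set` and `chain_oracle` are hypothetical and do not come from the paper.

```python
def minimize_separating_set(independent, x, y, z):
    """Greedily shrink a separating set z until no single node can be dropped.

    `independent(x, y, given)` is an assumed oracle that returns True iff
    x and y are independent given the list of nodes `given`.
    """
    current = list(z)
    changed = True
    while changed:
        changed = False
        for node in list(current):
            reduced = [n for n in current if n != node]
            if independent(x, y, reduced):
                current = reduced  # node was redundant, drop it
                changed = True
    return current


# Toy oracle for the chain A, B, C, D (each node a parent of the next):
# A and D are independent given any set that contains B or C.
def chain_oracle(x, y, given):
    return bool({"B", "C"} & set(given))


print(minimize_separating_set(chain_oracle, "A", "D", ["B", "C"]))
print(minimize_separating_set(chain_oracle, "A", "D", ["C", "B"]))
```

The two calls return different minimal sets, `['C']` and `['B']`, which illustrates the remark that the result may depend on the order in which nodes are removed.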

Figure 1: Path configuration for D-sep link $X - Y$.

In the proof of Lemma 4 we rely on the fact that for each D-sep link there is a path blocked by a D-sep node of the form depicted in Figure 1, which imposes six identifiable minimal dependence relations in (4)-(6), below:

Lemma 12.

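In a MAG $M$, if nodes $X$ and $Y$ are D-separable, then there are nodes $\{U, V, W, T\} \subseteq An(\{X, Y\})$ such that:
(1) $X \leftrightarrow U$ and $V \leftrightarrow Y$ in $M$,
(2) $U \in An(Y)$ and $V \in An(X)$,
(3) $U \notin An(V)$ and $V \notin An(U)$,
(4) $W \notin Adj(X)$ and $\forall\, \mathbf{Z}_{XW}$ with $X \perp\!\!\!\perp W \mid [\mathbf{Z}_{XW}]$: $X \not\perp\!\!\!\perp W \mid \mathbf{Z}_{XW} \cup [U]$ and $X \not\perp\!\!\!\perp W \mid \mathbf{Z}_{XW} \cup [Y]$,
(5) $T \notin Adj(Y)$ and $\forall\, \mathbf{Z}_{YT}$ with $Y \perp\!\!\!\perp T \mid [\mathbf{Z}_{YT}]$: $Y \not\perp\!\!\!\perp T \mid \mathbf{Z}_{YT} \cup [V]$ and $Y \not\perp\!\!\!\perp T \mid \mathbf{Z}_{YT} \cup [X]$,
(6) $U \notin Adj(V)$ in $M$, and $\forall\, \mathbf{Z}_{UV}$ with $U \perp\!\!\!\perp V \mid [\mathbf{Z}_{UV}]$: $U \not\perp\!\!\!\perp V \mid \mathbf{Z}_{UV} \cup [X]$ and $U \not\perp\!\!\!\perp V \mid \mathbf{Z}_{UV} \cup [Y]$.

Proof.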

By Lemma 11 we have $X \perp\!\!\!\perp Y \mid [\mathbf{Z}_{AA} \cup \mathbf{Z}_{DS}]$, with $\mathbf{Z}_{AA} \subseteq AA(\{X, Y\})$, and $\mathbf{Z}_{DS}$ a (sub)set of D-sep nodes not adjacent to $X$ and/or $Y$. Let $Z \in \mathbf{Z}_{DS}$ and define $\mathbf{Z} := AA(\{X, Y\}) \cup \mathbf{Z}_{DS}$. Then $X \perp\!\!\!\perp Y \mid \mathbf{Z}$ but $X \not\perp\!\!\!\perp Y \mid \mathbf{Z} \setminus Z$, and so there must be a path $\pi$ that is (only) blocked by noncollider $Z$ (relative to the other $\mathbf{Z} \setminus Z$).

We now show that we can take this path to be of the form $\pi = X \leftrightarrow U_1 (\leftrightarrow U_2 \leftrightarrow \ldots \leftrightarrow U_k) \mathrel{\leftarrow\!\!\ast} W \cdots Z \cdots T \mathrel{\ast\!\!\rightarrow} (V_m \leftrightarrow \ldots \leftrightarrow V_2 \leftrightarrow) V_1 \leftrightarrow Y$ in $M$, where all nodes $U_i$ are colliders along $\pi$ and are adjacent to $X$, but only $U_1$ has a bidirected edge to $X$ (similar for $V_i$ at $Y$), and $W$ is the first node along $\pi$ starting from $X$ that is not adjacent to $X$ (possibly $W = Z$), and similarly for $T$. See also Figure 1.

Firstly, all paths between $X$ and $Y$ blocked by a node $Z \in \mathbf{Z}_{DS}$ must be into both $X$ and $Y$: given that there are no undirected edges to $X$ and/or $Y$ in $M$ (Lemma 3), then by Corollary 8 the first node $U$ encountered along any such path must be in $An(\{X, Y\})$. But if this path starts with a tail from $X$ then necessarily $X \rightarrow U$, so that $U \in An(Y)$, which in turn implies $X \in An(Y)$, in contradiction with Lemma 3. Idem for $Y$. Therefore all paths blocked by node $Z$, including $\pi$, must have $X \mathrel{\leftarrow\!\!\ast} \ldots \mathrel{\ast\!\!\rightarrow} Y$.

Secondly, all paths between $X$ and $Y$ blocked by a node $Z \in \mathbf{Z}_{DS}$ must go via at least two other nodes $U \in Adj(X)$ resp. $V \in Adj(Y)$, as $Z$ is presumed to be not adjacent to $X$ and/or $Y$. As both these nodes satisfy the criteria for $AA(\{X, Y\})$ they are part of the conditioning set $\mathbf{Z}$, and so they must be colliders along $\pi$ (otherwise $Z$ was not needed to block it). The same holds for all subsequent nodes up to $U_k$ and $V_m$ along $\pi$ that are adjacent to $X$ and/or $Y$. Therefore the path $\pi$ blocked by $Z$ must have the general form $\pi = X \leftrightarrow U (\leftrightarrow \ldots \leftrightarrow U') \mathrel{\leftarrow\!\!\ast} \ldots \mathrel{\ast\!\!\rightarrow} (V' \leftrightarrow \ldots \leftrightarrow) V \leftrightarrow Y$.

Next, starting from $X$, at some point along $\pi$ the first node $W$ must be encountered that is not adjacent to $X$ (possibly $W = Z$).
Take $U_1$ as the first node encountered along $\pi$ with a bidirected edge to $X$ when starting from $W$ in the direction of $X$. Then all other nodes between $U_1$ and $W$, up to $U_k$, are colliders along $\pi$ with a directed edge into $X$ (again by Lemma 3 and Corollary 8). Similar for some nodes $T$ and $V_1$ for $Y$. Therefore there exists a path blocked by $Z$ of the form $\pi = X \leftrightarrow U_1 (\leftrightarrow U_2 \leftrightarrow \ldots \leftrightarrow U_k) \mathrel{\leftarrow\!\!\ast} W \cdots Z \cdots T \mathrel{\ast\!\!\rightarrow} (V_m \leftrightarrow \ldots \leftrightarrow V_2 \leftrightarrow) V_1 \leftrightarrow Y$ in $M$, as indicated in Figure 1, with $Z$ as noncollider along the path. Note that $W \in An(\{X, Y\})$ (and similarly for $T$): if $W \rightarrow U_k$ on $\pi$ this is immediate, and if $W \leftrightarrow U_k$ on $\pi$, then $W$ is not part of an undirected edge, which in combination with Corollary 8 leads to $W \in An(\{X, Y\})$.

With generic path $\pi$ we can prove statements (1)-(6), equating $U$ with $U_1$ and $V$ with $V_1$:

(1) By construction, we have $X \leftrightarrow U_1$ and $V_1 \leftrightarrow Y$ along $\pi$.

(2) By Corollary 8, all nodes along $\pi$ are in $Ant(\{X, Y\})$. As both $U_1$ and $V_1$ are colliders along $\pi$ this reduces to $\{U_1, V_1\} \subset An(\{X, Y\})$. The bidirected edge $X \leftrightarrow U_1$ implies $U_1 \notin An(X)$, and so $U_1 \in An(Y)$; vice versa for $V_1 \in An(X)$.

(3) If $U_1 \in An(V_1)$, then $V_1 \in An(X)$ and transitivity would imply $U_1 \in An(X)$, contrary to the bidirected edge $X \leftrightarrow U_1$, and so $U_1 \notin An(V_1)$; idem for $V_1$ and $Y$.

(4) For the in/dependence relations on $\pi$: given that $X$ and $W$ are not adjacent, they are separated by some minimal set $\mathbf{Z}_{XW}$ (not to be confused with $\mathbf{Z}_{AA}$ or $\mathbf{Z}_{DS}$). By construction, all $\{U_2, \ldots, U_k\}$ are part of this set: $U_k$ is needed to block the path $X \leftarrow U_k \mathrel{\leftarrow\!\!\ast} W$. Conditioning on $U_k$ unblocks the path $X \leftarrow U_{k-1} \leftrightarrow U_k \mathrel{\leftarrow\!\!\ast} W$ so $U_{k-1}$ is also needed, etc., all the way up to and including $U_2$ (but not $U_1$). As this holds for any (minimal) set $\mathbf{Z}_{XW}$ that can separate $X$ and $W$, it means there are unblocked paths into $U_1$ from both $X$ and $W$ given $\mathbf{Z}_{XW}$, and so conditioning on $U_1$ will make $X$ and $W$ dependent, i.e., $X \not\perp\!\!\!\perp W \mid \mathbf{Z}_{XW} \cup [U_1]$.
As $Y$ is a descendant of $U_1$, it also implies $X \not\perp\!\!\!\perp W \mid \mathbf{Z}_{XW} \cup [Y]$.

(5) Idem $Y \not\perp\!\!\!\perp T \mid \mathbf{Z}_{YT} \cup [V_1]$ and $Y \not\perp\!\!\!\perp T \mid \mathbf{Z}_{YT} \cup [X]$.

(6) Finally, $U_1$ and $V_1$ cannot be adjacent in $M$: they cannot be connected by a bidirected edge, for that would make the path $\langle X, U_1, V_1, Y \rangle$ unblocked given $\mathbf{Z}$; by (3) they cannot be connected by an edge $U_1 \rightarrow V_1$ or $U_1 \leftarrow V_1$; and they cannot be connected by an undirected edge because they are both colliders along $\pi$. Therefore $U_1$ and $V_1$ are conditionally independent given some minimal set $\mathbf{Z}_{UV}$. For any such minimal separating set $\mathbf{Z}_{UV}$, no descendant of $U_1$ or $V_1$ (including $X$ and $Y$) can be part of it, for that would imply either $U_1$ or $V_1$ was ancestor of the other. Including $X$ or $Y$ in the conditioning set would make them dependent given that both $X$ and $Y$ have unblocked paths to $U_1$ and $V_1$ given $\mathbf{Z}_{UV}$. Therefore, we can find both $U_1 \not\perp\!\!\!\perp V_1 \mid \mathbf{Z}_{UV} \cup [X]$ and $U_1 \not\perp\!\!\!\perp V_1 \mid \mathbf{Z}_{UV} \cup [Y]$.

By Lemma 2, rule (1), each node in Lemma 12 that destroys one of the three independences cannot be anterior to any node in that independence, and so leads to identifiable invariant edge-marks (arrowheads).

To make this more precise we first introduce the following definitions:

Definition 4.

A minimal independence set $I(M)$ is a set of minimal independencies consistent with a MAG $M$. It is called a minimal independence model if it contains at least one separating set for each pair of nonadjacent nodes in the MAG $M$.

The skeleton $S$ implied by a minimal independence set $I(M)$ corresponds to the undirected graph with no edges between any $(X, Y) : X \perp\!\!\!\perp Y \mid [\mathbf{Z}] \in I(M)$. Note that a minimal independence model $I(M)$ uniquely identifies the Markov equivalence class of $M$.

Definition 5.

Let $S$ be the skeleton implied by a minimal independence set $I(M)$. Then the Augmented Skeleton $S^+$ is obtained by adding invariant arrowheads at all nodes $W$ on edges to $\{X, Y\} \cup \mathbf{Z}$ in $S$ that create a single node minimal dependence $X \not\perp\!\!\!\perp Y \mid \mathbf{Z} \cup [W]$, for all $X \perp\!\!\!\perp Y \mid [\mathbf{Z}] \in I(M)$.

Augmentation boils down to repeated application of Lemma 2, rule (1).

From now on we assume that $I(S)$ represents a minimal independence set as output by the PC algorithm with possible addition of one or more D-separating sets, consistent with a MAG $M$. We also assume that we can query an independence oracle for the subsequent dependencies. This implies that the corresponding skeleton $S$ matches the skeleton of $M$, except that it may contain zero, one, or more additional (undirected) edges that all correspond to D-sep links in $M$. For D-sep links in the corresponding augmented skeleton $S^+$ this leads to the following pattern:

Lemma 4.

Let $S^+$ be the augmented skeleton obtained from a minimal independence set $I(S)$ consistent with a MAG $M$, such that the only additional edges in $S^+$ that do not correspond with an edge in $M$ are D-sep links. Let $(X, Y)$ be an edge in $S^+$ corresponding to a D-sep link in the MAG $M$. If there are no (additional) edges in $S^+$ between other D-separable pairs of nodes in $An(\{X, Y\})$, then $S^+$ contains the following pattern:
(1) $U \leftrightarrow X \leftrightarrow Y \leftrightarrow V$ in $S^+$,
(2) $U$ and $V$ not adjacent in $S^+$,
(3) paths $V \cdots \rightarrow X$ and $U \cdots \rightarrow Y$ that do not contain arrowheads in the direction of $V$, resp. $U$.

Proof.

Follows from Lemma 12.

(1) As $X$ and $U$ are adjacent in $M$, they are also adjacent in $S^+$. Similarly for $Y$ and $V$. Nodes $X$ and $Y$ are also (still) presumed to be adjacent in $S^+$. The assumption 'no edges in $S^+$ between other D-sep links in $An(\{X, Y\})$' ensures that the three non-adjacencies (4)-(6) in Lemma 12 are present in $I(M)$; the six subsequent dependences in Lemma 12 are found by the augmentation procedure, each time adding arrowheads to the corresponding edge. Ultimately this means that $S^+$ contains the invariant pattern $U \leftrightarrow X \leftrightarrow Y \leftrightarrow V$.

(2) In particular (6) in Lemma 12 ensures that $U$ and $V$ are not adjacent in $M$. The assumption 'no edges in $S^+$ between other D-sep links in $An(\{X, Y\})$' ensures that $(U, V)$ are not adjacent in $S^+$ either.

(3) As $V \in An(X)$ there has to be a path from $V$ in $S^+$ that can be(come) oriented as a directed path into $X$. This means the augmentation procedure cannot add an invariant arrowhead in the opposite direction; idem for $U \in An(Y)$.

The following result generalizes Lemma 5 in the original article, as a minimal separating set $X \perp\!\!\!\perp Y \mid [\mathbf{Z}]$ automatically implies $\mathbf{Z} \subset Ant(\{X, Y\}) \setminus \{X, Y\}$.

Lemma 5.

Let $(X, Y)$ and $(U, V)$ be two possibly overlapping but nonidentical pairs of D-separable nodes in a MAG $M$. If $\{X, Y\} \subset An(\{U, V\})$, then $\{U, V\} \nsubseteq An(\{X, Y\})$.

Proof.

Suppose $U \in An(X)$. If $U \neq X$ then by the given and acyclicity $X \in An(V)$, which by transitivity implies $U \in An(V)$, contrary to Lemma 3 rule (1). Idem for $U \in An(Y)$. So either $U \in \{X, Y\}$ or $U \notin An(\{X, Y\})$. Idem for $V$. But if both $U \in \{X, Y\}$ and $V \in \{X, Y\}$, with $U \neq V$ in a D-sep link, then the two D-separable pairs would be identical. Therefore at least one is not an ancestor of $\{X, Y\}$, and so $\{U, V\} \nsubseteq An(\{X, Y\})$.

So two D-separable node pairs cannot both be present in each other's D-separating set. In fact, the ancestor relation induces a partial order over the D-sep links:

Lemma 13.

Let $\Phi = \{\{U_1, V_1\}, \ldots, \{U_n, V_n\}\}$ be a set of distinct (but not necessarily disjoint) D-sep links in a MAG $M$. Then the relation $\{U_i, V_i\} \preceq \{U_j, V_j\} \iff \{U_i, V_i\} \subseteq An(\{U_j, V_j\})$ defines a partial order over $\Phi$.

Proof.

For all $\{U_i, V_i\}, \{U_j, V_j\}, \{U_k, V_k\} \in \Phi$:
1. Reflexivity: ($\{U_i, V_i\} \preceq \{U_i, V_i\}$) is trivial.
2. Antisymmetry: (if $\{U_i, V_i\} \preceq \{U_j, V_j\}$ and $\{U_j, V_j\} \preceq \{U_i, V_i\}$ then $\{U_i, V_i\} = \{U_j, V_j\}$) follows from Lemma 5.
3. Transitivity: (if $\{U_i, V_i\} \preceq \{U_j, V_j\}$ and $\{U_j, V_j\} \preceq \{U_k, V_k\}$, then $\{U_i, V_i\} \preceq \{U_k, V_k\}$) follows from transitivity of the ancestor relationship of nodes in a MAG.
This implies the relation $\preceq$ satisfies the conditions of a partial order over the elements in $\Phi$.

As a result, in every non-empty (sub)set of D-separable node pairs there is at least one pair that does not have both nodes of any of the other pairs in its ancestors:

Lemma 14.

If $\Phi = \{\{U_1, V_1\}, \ldots, \{U_n, V_n\}\}$ is a non-empty set of distinct (but not necessarily disjoint) D-sep links in a MAG $M$, then there is a $\{U_i, V_i\} \in \Phi$ such that $\forall j \neq i : \{U_j, V_j\} \nsubseteq Ant(\{U_i, V_i\})$.

Proof.

By Lemma 3 rule (4), D-sep nodes are not part of an undirected edge, so the statement reduces to $\forall j \neq i : \{U_j, V_j\} \nsubseteq An(\{U_i, V_i\})$. In terms of the partial order defined in Lemma 13 this is equivalent to stating that there exists a minimal element with respect to $\preceq$, i.e., an element $\{U_j, V_j\} \in \Phi$ such that there is no other element $\{U_i, V_i\} \in \Phi$ (with $i \neq j$) that precedes it, i.e., such that $\{U_i, V_i\} \preceq \{U_j, V_j\}$. As any finite partially ordered set has at least one minimal element, this proves the lemma.

This means that if there are still one or more unidentified D-sep links in the augmented skeleton $S^+$, then at least one of these has no unidentified D-sep links between any two of its ancestors, and so for that D-sep link the bidirected edge pattern of Lemma 4 is guaranteed to appear in $S^+$. Therefore we can employ the following search strategy to check for D-sep links.

Lemma 15.

In a MAG $M$, all D-sep links can be found by repeatedly (and exclusively) checking an augmented skeleton $S^+$ for edges that appear as the middle link of the bidirected triple from Lemma 4, while updating $S^+$ for each D-sep link found.

Proof.

Let $S$ be the skeleton of $M$, possibly with additional edges in $S$ that all correspond to D-sep links in $M$ (e.g. as obtained from the PC-search stage in the FCI algorithm). Let $S^+$ be the augmented skeleton of $S$ w.r.t. minimal (in)dependencies implied by $M$. Then, as long as there are one or more edges in $S^+$ that are not in $M$, by Lemma 14 at least one of these edges will have no unidentified D-sep links (edges in $S$ that are not in $M$) between its ancestors, and so by Lemma 4 this D-sep link will show up in $S^+$ as the middle edge of the bidirected triple. Given a procedure to establish whether or not a candidate edge satisfying the bidirected pattern is a D-sep link (e.g., FCI's Possible-D-SEP search), testing all candidate edges, while updating $S^+$ for each D-sep link identified (remove the edge and compute arrowheads for new bidirected triples) until no more can be found, is guaranteed to find all D-sep links. This means that at the end the skeleton of $S^+$ matches that of the MAG $M$, and all arrowheads in $S^+$ are also in $M$.

This greatly improves the practical running speed of FCI, as often no or hardly any edges need to be checked (after the augmented skeleton has been constructed), but in itself it is not sufficient to guarantee a reduction of the overall complexity to polynomial time, as even a single edge may still require searching through all subsets of order $N$ nodes. The next section shows how a different search strategy can resolve this problem.

D-sep nodes

In proving some of the Lemmas below we often consider marginal MAGs, i.e. MAGs $M'$ obtained by marginalizing out one or more nodes from a base MAG $M$ in accordance with the rules in (Richardson and Spirtes, 2002) (see also section 1.1 for a definition).

First some properties of unblocked paths in an ancestral graph relative to the adjacent ancestors of D-sep link $\{X, Y\}$, used in the proof of Lemma 6.

All paths ultimately blocked by one or more of the D-sep nodes are unblocked relative to $AA(\{X, Y\})$.
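Since the set $AA(\{X, Y\})$ plays a central role from here on, the sketch below computes adjacent anteriors on a small hand-written mixed graph, following Definition 3. The edge representation (triples with the marks `"directed"`, `"bidirected"`, `"undirected"`), the function names, and the example graph are all assumptions made for illustration; the example is loosely modelled on the path configuration of Figure 1 with an extra, hypothetical node Q.

```python
def anteriors(edges, targets):
    """Nodes with an anterior path (undirected part, then directed part) to targets.

    `edges` holds triples (a, mark, b); "directed" means a is a parent of b.
    Each target counts as anterior to itself.
    """
    parents, undirected = {}, {}
    for a, mark, b in edges:
        if mark == "directed":
            parents.setdefault(b, set()).add(a)
        elif mark == "undirected":
            undirected.setdefault(a, set()).add(b)
            undirected.setdefault(b, set()).add(a)
    # Step 1: collect ancestors by walking directed edges backwards.
    result, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    # Step 2: extend with nodes reachable through undirected edges only.
    stack = list(result)
    while stack:
        node = stack.pop()
        for m in undirected.get(node, ()):
            if m not in result:
                result.add(m)
                stack.append(m)
    return result


def adjacent_anteriors(edges, xs):
    """AA(xs) = (Adj(xs) intersected with Ant(xs)) minus xs."""
    xs = set(xs)
    adjacent = set()
    for a, _mark, b in edges:
        if a in xs:
            adjacent.add(b)
        if b in xs:
            adjacent.add(a)
    return (adjacent & anteriors(edges, xs)) - xs


example = [
    ("X", "bidirected", "U"), ("V", "bidirected", "Y"),
    ("U", "directed", "Y"), ("V", "directed", "X"),
    ("W", "directed", "U"), ("Z", "directed", "W"),
    ("Z", "directed", "T"), ("T", "directed", "V"),
    ("X", "bidirected", "Q"),  # Q is adjacent to X but not anterior to X or Y
]
print(sorted(adjacent_anteriors(example, {"X", "Y"})))
```

In this example only U and V are both adjacent and anterior to the pair, so Q is excluded even though it shares an edge with X.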
Lemma 16.

In a MAG $M$, if $X \perp\!\!\!\perp Y \mid AA(\{X, Y\}) \cup \mathbf{Z}$ with $\mathbf{Z} \subset Ant(\{X, Y\})$, and $\pi$ is a path between $X$ and $Y$ that is unblocked relative to $AA(\{X, Y\}) \cup \mathbf{Z}'$ for $\mathbf{Z}' \subset \mathbf{Z}$, then:
(1) all colliders on $\pi$ are in $An(AA(\{X, Y\}))$;
(2) $\pi$ is unblocked given $AA(\{X, Y\}) \cup \mathbf{Z}^*$, $\forall\, \mathbf{Z}^* \subseteq \mathbf{Z}'$.

Proof.

(1) A path $\pi$ is unblocked relative to a set $\mathbf{Z}$ iff every noncollider along $\pi$ is not in $\mathbf{Z}$, and every collider on $\pi$ is an ancestor of some node in $\mathbf{Z}$. If every noncollider along $\pi$ is not present in $AA(\{X, Y\}) \cup \mathbf{Z}'$ then they are also not present for a subset $\mathbf{Z}^* \subseteq \mathbf{Z}'$. Furthermore, every node $Z \in \mathbf{Z}'$ that is a descendant of some collider along $\pi$ is in $An(\{X, Y\})$ (given $\mathbf{Z} \subset Ant(\{X, Y\})$ and the arrowhead at $Z$ as descendant of the collider), and so has a directed path to $\{X, Y\}$. This directed path goes via penultimate node $U \in AA(\{X, Y\})$, and so it follows that $Z$, and by transitivity the colliders along $\pi$ as well, are ancestors of a node in $AA(\{X, Y\})$.

(2) Therefore $\pi$ remains unblocked relative to $AA(\{X, Y\})$ in combination with any subset $\mathbf{Z}^* \subset \mathbf{Z}' \subset Ant(\{X, Y\})$, including $\mathbf{Z}^* = \emptyset$.

Also, paths blocked by D-sep nodes correspond to sequences of bidirected edges in marginal MAGs.

Lemma 17.

In a MAG $M$ with D-separable $X$ and $Y$, if $X \perp\!\!\!\perp Y \mid AA(\{X, Y\}) \cup \mathbf{Z} \cup [Z]$ with $\mathbf{Z} \subset Ant(\{X, Y\})$, then a path $\pi$ between $X$ and $Y$ in $M$ that is unblocked relative to $AA(\{X, Y\}) \cup \mathbf{Z}$ corresponds to a sequence of three or more bidirected edges connecting $X$ and $Y$ in all marginal MAGs $M'$ over $\{X, Y\} \cup AA(\{X, Y\}) \cup \mathbf{Z}'$, with $\mathbf{Z}' \subseteq \mathbf{Z}$.

Proof.

Below we first construct a sequence of unblocked treks in the MAG $M$ between nodes in $\{X, Y\} \cup AA(\{X, Y\}) \cup \mathbf{Z}'$ that connects $X$ and $Y$ (steps 1-3). Then we map this sequence to the bidirected edge path in the marginal MAG $M'$ (steps 4-6).

Let $\pi$ be a path in $M$ between D-separable $(X, Y)$ that is unblocked relative to $AA(\{X, Y\}) \cup \mathbf{Z}$.

Step 1: map $\pi$ to sequence $\sigma_U$ of unblocked treks in $M$.

Let $U_1, \ldots, U_m$ be the colliders in $M$ along the path $\pi$ blocked by $Z$. By Lemma 16-(1), all colliders $U_i \in An(AA(\{X, Y\}))$. Using similar reasoning as in the beginning of the proof of Lemma 12, the path $\pi$ blocked by $Z$ (which is nonadjacent to $X$ and $Y$) must be of the form $X \leftrightarrow U_1 \mathrel{\leftarrow\!\!\ast} \ldots \mathrel{\ast\!\!\rightarrow} U_m \leftrightarrow Y$. Each successive pair of colliders $(U_i, U_{i+1})$ along unblocked $\pi$ must be connected by a trek (possibly a single edge $\leftrightarrow$) that does not contain any node in $AA(\{X, Y\}) \cup \mathbf{Z}$, and so $\sigma_U = [X, U_1, \ldots, U_m, Y]$ corresponds to a sequence of treks connecting $X$ and $Y$ in $M$ that are unblocked relative to any subset of $AA(\{X, Y\}) \cup \mathbf{Z}$.

Step 2: map $\sigma_U$ to sequence $\sigma_V$ of unblocked treks between nodes in $\{X, Y\} \cup AA(\{X, Y\}) \cup \mathbf{Z}'$.

By Lemma 16 the path $\pi$ in $M$ is also unblocked given $AA(\{X, Y\}) \cup \mathbf{Z}'$, for any subset $\mathbf{Z}' \subseteq \mathbf{Z}$, and each collider $U_i$ along $\pi$ is in $An(AA(\{X, Y\}))$. For each $U_i$ let $V_i$ be the first descendant of $U_i$ in $M$ that is in $AA(\{X, Y\}) \cup \mathbf{Z}'$ (possibly $U_i = V_i$; in particular, $U_1 = V_1$ and $U_m = V_m$). If there are two or more such descendants (along different paths) then simply pick one of these at random. As a result, in $M$ there are treks between $V_i$ and $V_{i+1}$, and each such trek is again unblocked given any subset of $AA(\{X, Y\}) \cup \mathbf{Z}'$. Note that the concatenation of the three treks $V_i \leftarrow \ldots \leftarrow U_i$, $U_i \mathrel{\leftarrow\!\!\ast} \ldots \mathrel{\ast\!\!\rightarrow} U_{i+1}$, $U_{i+1} \rightarrow \ldots \rightarrow V_{i+1}$ is not necessarily a trek, as a node may occur more than once.
This can be reme-died by taking a “shortcut” via that node. Note thatthis node cannot become a collider, as at least one ofthe occurrences of that node must be on one of thedirected paths V i ←− . . . ←− U i or U i +1 −→ . . . −→ V i +1 (because U i ←∗ . . . ∗→ U i +1 is a trek), and that meansthat at least one of the edges at that node will have atail. The result is a trek in M between V i and V i +1 that is unblocked given any subset of AA ( { X, Y } ) ∪ Z (cid:48) .Therefore σ V = [ X, V , . . . , V m , Y ] corresponds to asequence of unblocked treks in M between nodes in { X, Y } ∪ AA ( { X, Y } ) ∪ Z (cid:48) . Step 3: map σ V to sequence σ Z of unblocked treks be-tween distinct nodes in { X, Y } ∪ AA ( { X, Y } ) ∪ Z (cid:48) in M . It is possible that there are duplicates in σ V , (i.e. V i = V j with i (cid:54) = j ), e.g. in case a descendantin AA ( { X, Y } ) ∪ Z (cid:48) is shared by multiple U i . Inhat case we can remove all nodes [ V i +1 , . . . , V j ] fromthe sequence σ V while still keeping a contiguous se-quence of unblocked treks between X and Y . As-sume we repeatedly merge such doublets (removingall intermediate nodes) until we are left with a se-quence σ Z = [ X, Z , . . . , Z k , Y ] of distinct nodes Z i ∈ AA ( { X, Y } ) ∪ Z (cid:48) , with Z i (cid:54) = Z j for i (cid:54) = j , and whereeach ( Z i , Z i +1 ) is connected by a trek (with arrowsinto Z i and Z i +1 ) in M that is unblocked given anysubset { X, Y } ∪ AA ( { X, Y } ) ∪ Z (cid:48) .Note that the mapping σ V → σ Z is not necessarilyunique, e.g. if σ V = [1 , , , , , , , , , σ Z = [1 , , , , ,

7] or σ Z = [1 , , , ,

7] will do.
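The duplicate merging of Step 3 can be sketched in a few lines of Python. This is only an illustration, not part of the original proof: the function name `merge_duplicates` and the node labels are hypothetical, and the sketch commits to one particular merge order (always jump to the last occurrence of the current node), which is just one of the valid choices.

```python
def merge_duplicates(seq):
    """Return a duplicate-free subsequence of seq, as in Step 3.

    Whenever a node V_i occurs again later as V_j, the nodes
    V_{i+1}, ..., V_j are dropped. Assumption: we always jump to the
    LAST occurrence of the current node; other merge orders are valid too.
    """
    # Index of the last occurrence of every node in the sequence.
    last = {node: idx for idx, node in enumerate(seq)}
    merged = []
    i = 0
    while i < len(seq):
        merged.append(seq[i])
        # Skip everything up to and including the last copy of this node.
        i = last[seq[i]] + 1
    return merged


# Hypothetical sigma_V with two interleaved duplicates (A and B).
sigma_v = ["X", "A", "B", "A", "C", "B", "Y"]
sigma_z = merge_duplicates(sigma_v)
print(sigma_z)
```

For this input the sketch keeps X, A, C, B, Y. Merging the two copies of B first would instead give X, A, B, Y, which illustrates why the mapping is not unique.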

Step 4: match sequence σ_Z to path π′ in M′.
As each pair (Z_i, Z_{i+1}) in the sequence σ_Z is connected by a trek in M that does not contain any noncolliders that are in {X, Y} ∪ AA({X, Y}) ∪ 𝐙′, it follows that there is an unblocked path between each such pair given any subset of {X, Y} ∪ AA({X, Y}) ∪ 𝐙′, and so each pair (Z_i, Z_{i+1}) must be adjacent in the marginal MAG M′ over {X, Y} ∪ AA({X, Y}) ∪ 𝐙′. That means the sequence of nodes σ_Z corresponds to a path π′ = ⟨X, Z_1, …, Z_k, Y⟩ in M′. Note that this specific path π′ is not necessarily the unblocked path in M′ relative to AA({X, Y}) ∪ 𝐙′ we are looking for (we construct this next). For example, in Figure 2, the sequence σ_Z = [X, Z_1, Z_i, Z_k, Z_j, Y] does indeed correspond to a path X ↔ Z_1 ↔ Z_i → Z_k ↔ Z_j ↔ Y in M′, but this path is not unblocked relative to all other nodes in M′ except {X, Y}, as Z_i is a noncollider along π′. However, this induces a bidirected edge X ↔ Z_k in M′, so that the path π* = X ↔ Z_k ↔ Z_j ↔ Y over a subset of nodes along π′ does correspond to an unblocked path in M′ relative to all other nodes.

Step 5: find bidirected edges that span nodes along π′.
Even though the trek in M between each (Z_i, Z_{i+1}) is into both nodes, they are not necessarily connected by a bidirected edge along π′ in M′, as one may be an ancestor of the other in M (and so also in M′). Here we show that this induces bidirected edges in the marginal MAG M′ between other nodes along π′, such that a bidirected edge path π* over a subset of nodes along π′ connecting X and Y in M′ remains.
(Note that nodes on π′ are never connected by an undirected edge, as they all have arrowhead marks in M, being (descendants of) colliders along the original π.)

We now identify (maximal) inducing groups of successive nodes [Z_i, .., Z_j] along π′ that are all ancestors (in M) of the first or the last node ('sink') in the same group, and where two successive inducing groups along π′ overlap on the last, resp. first sink node, so that we obtain [X, .., Z_i], [Z_i, .., Z_j], .., [Z_k, .., Z_m], [Z_m, .., Y] along the path.

Each group [Z_p, .., Z_r] is constructed as follows: starting from the last sink node of the previous group Z_p (or X for the first group), add successive nodes along π′ to the group until the first node Z_q is encountered that is not in An(Z_p). Then, starting from Y back along π′, find the first node Z_r such that all nodes from Z_q up to Z_r along π′ are in An(Z_r) in M (possibly Z_r = Z_q). Then [Z_p, .., Z_r] is the next maximal inducing group.

Figure 2 (panel title: Inducing groups along path blocked by Z (Lemma 19)): (a) Path in M blocked by D-sep node Z, (b) Idem in marginal MAG M′ (without Z) with corresponding inducing groups [X, Z_1, Z_i, Z_k], [Z_k, Z_j], [Z_j, Y] and induced edge X ↔ Z_k.

The two sink nodes of a maximal inducing group are connected by an inducing path in M (hence the name, see also §1) with respect to nodes not in {X, Y} ∪ AA({X, Y}) ∪ 𝐙′:
1. by construction all nodes {Z_p, .., Z_r} in the group are in An({Z_p, Z_r}),
2. Z_p and Z_r are connected by a path in M on which all colliders are in An({Z_p, Z_r}).

The inducing path between Z_p and Z_r can be constructed as follows. First, one concatenates all the treks Z_p ←∗ … ∗→ Z_{p+1}, Z_{p+1} ←∗ … ∗→ Z_{p+2}, …, Z_{r−1} ←∗ … ∗→ Z_r. In case a node Q occurs more than once, one takes the shortcut by deleting all intermediate nodes between the left-most and right-most occurrence of Q. Such Q can occur at most once on the original path π, and the other occurrence(s) must be on a directed path to one of the nodes Z_p, …, Z_r. Therefore, if the remaining node Q becomes a collider when making the shortcut, then Q ∈ An({Z_p, …, Z_r}) = An({Z_p, Z_r}).

As a result, by Theorem 4.2-(i) in Richardson and Spirtes (2002) the two sink nodes of a maximal inducing group are connected by an edge in M′. Furthermore, this edge must be a bidirected edge, as sink nodes in a maximal inducing group cannot be ancestors of each other:
1. Z_r ∉ An(Z_p), otherwise by transitivity also Z_q ∈ An(Z_p), contrary to the given,
2. Z_p ∉ An(Z_r), because for Z_p = X (first group) as part of D-sep link (X, Y), the given AA({X, Y}) ∪ 𝐙 ⊂ Ant({X, Y}) and acyclicity imply X ∉ An(Z_r) ⊂ An(Z); and if Z_p is the sink node of the previous group [Z_m, .., Z_p], then all nodes [Z_p, .., Z_r] would also satisfy the conditions for inclusion in that group, and we would have obtained [Z_m, .., Z_r], contrary to the given.

Step 6: obtain bidirected edge path π* in M′.
Therefore in M′ there is a path π* of bidirected edges connecting X to Y via the sink nodes of neighbouring inducing groups along π′.
The connection between the path π in M and the paths π′ and π* in M′ is illustrated in Figure 3.

For every π′ there are at least three distinct inducing groups: the first and last edge along π′ in M′ (corresponding to X ↔ U_1 resp. U_m ↔ Y along π in M, as both U_1 and U_m, being in AA({X, Y}), are also in M′) are part of disjoint groups, as by Lemma 12-(3) U_1 ∉ An(U_m), and vice versa. That means there must be at least one other group to bridge the gap between U_1 and U_m, from which it follows that there are at least three such bidirected edges on the path π* connecting X and Y in M′. This proves the Lemma.
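The construction of maximal inducing groups in Step 5 and the resulting sink-node path π* of Step 6 can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the input format (a dict `anc` mapping each node to its set of proper ancestors in M) and the example ancestor relations are all assumptions, chosen so that the output has the same shape as the three groups described in the caption of Figure 3.

```python
def is_ancestor(a, b, anc):
    """True if a is an ancestor of b; every node counts as its own ancestor."""
    return a == b or a in anc.get(b, set())


def inducing_groups(path, anc):
    """Split the nodes along pi' into maximal inducing groups (Step 5).

    path: list of nodes along pi', from X to Y.
    anc:  hypothetical input, anc[v] = set of proper ancestors of v in M.
    """
    groups, p, last = [], 0, len(path) - 1
    while p < last:
        # Extend from the current sink path[p] until the first node Z_q
        # that is not an ancestor of path[p].
        q = p + 1
        while q <= last and is_ancestor(path[q], path[p], anc):
            q += 1
        if q > last:
            raise ValueError("end node is an ancestor of a sink node")
        # Scan back from the end for the first Z_r such that every node
        # from Z_q up to Z_r is an ancestor of Z_r (r == q always qualifies).
        r = last
        while not all(is_ancestor(path[i], path[r], anc) for i in range(q, r + 1)):
            r -= 1
        groups.append(path[p:r + 1])
        p = r  # the last sink doubles as first sink of the next group
    return groups


def sink_path(groups):
    """Nodes of pi* (Step 6): first sink of each group plus the end node."""
    return [g[0] for g in groups] + [groups[-1][-1]]


# Assumed example: Z1 is an ancestor of X, and Z3 is an ancestor of Z4.
path = ["X", "Z1", "Z2", "Z3", "Z4", "Y"]
anc = {"X": {"Z1"}, "Z4": {"Z3"}}
groups = inducing_groups(path, anc)
pi_star = sink_path(groups)
print(groups)
print(pi_star)
```

With these assumed ancestor sets the sketch yields three overlapping groups and a bidirected path over X, Z2, Z4, Y, i.e. three edges, in line with the lower bound argued in the proof.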

Figure 3: Induced bidirected edge path. (a) Path π = ⟨X, Z_1, W, Z_2, Z, Z_3, Z_4, Y⟩ in M blocked by D-sep node Z, (b) Corresponding path π′ = ⟨X, Z_1, Z_2, Z_3, Z_4, Y⟩ in marginal MAG M′ over {X, Y} ∪ AA({X, Y}) (= {Z_1, Z_2, Z_3, Z_4}), with maximal inducing groups [X, Z_1, Z_2], [Z_2, Z_3, Z_4], [Z_4, Y], leading to induced edges X ↔ Z_2 and Z_2 ↔ Z_4 (red), to give the bidirected edge path π* = ⟨X, Z_2, Z_4, Y⟩ in M′ (bold).

With this we can prove Lemma 6 and Lemma 18 on being able to find all required D-sep nodes sequentially as part of a separating set between nodes already found. All D-sep nodes for a pair (X, Y) also appear in another minimal conditional independence:

Lemma 6.

In a MAG M, if Z ∈ 𝐙 is a D-sep node in X ⊥⊥ Y | [𝐙], then Z is also part of a minimal separating set between another pair of nodes from {X, Y} ∪ 𝐙\Z ∪ AA({X, Y}), neither of which is part of an undirected edge in M.

Proof.
By Lemma 17, in the marginal MAG M′ over {X, Y} ∪ 𝐙\Z ∪ AA({X, Y}) there is a sequence of at least 3 bidirected edges connecting X and Y. If this sequence of edges still exists in the MAG M* over {X, Y} ∪ 𝐙 ∪ AA({X, Y}), then X and Y are not separated given 𝐙 ∪ AA({X, Y}), contrary to the given. Note that any adjacency in M* that is also present in M′ must have identical arrow/tail-marks in M* and M′, as both edges must express the same anterior relations in M. Therefore at least one of these edges is eliminated in M*, which implies the existence of a set that can separate the two endpoints of that edge, both of which are from {X, Y} ∪ 𝐙\Z ∪ AA({X, Y}). D-sep node Z is necessarily part of that set, otherwise the edge would already be eliminated in M′. Given that both separated nodes have arrowhead marks in M′ (as part of a bidirected edge path) it also follows that neither can be part of an undirected edge.

Lemma 18.

In a MAG M with D-sep link (X, Y), all D-sep nodes 𝐙_DS in X ⊥⊥ Y | AA({X, Y}) ∪ [𝐙_DS] can be found sequentially as part of a minimal separating set between a pair of nodes already found, starting from {X, Y} ∪ AA({X, Y}).

Proof.
Starting from the marginal MAG over {X, Y} ∪ AA({X, Y}), we infer from Lemma 17 that this marginal MAG contains a sequence of bidirected edges connecting X and Y. Analogous to the rationale in Lemma 6: if this sequence of edges still exists in M* over {X, Y} ∪ AA({X, Y}) ∪ 𝐙_DS, then X and Y are not separated, contradicting the assumptions. Therefore at least one of these edges must be eliminated in M*, which implies the existence of a minimal separating set containing one or more of the nodes from 𝐙_DS. If only a proper subset of 𝐙_DS is needed in this separating set, then we can apply the same argument again to the marginal MAG over {X, Y} ∪ AA({X, Y}) together with that subset. Each time we find new nodes from 𝐙_DS that are part of a separating set between some pair of nodes we already found, until all have been added.

Lemma 18 describes a procedure to find all required D-sep nodes for a given D-sep link (X, Y), starting from the adjacent ancestors of X and Y. However, there is a possible snag: the lemma ensures that each D-sep node can be found as part of some minimal separating set between a pair of nodes already found, but standard search strategies like PC only look for a single minimal sepset between each separable pair of nodes. As a result, if there are multiple possible minimal sepsets it is not a priori guaranteed that the D-sep nodes we are looking for are indeed contained in the sepset returned by the PC-search stage. However, it turns out that they can still be found as part of an 'ancestral superset' that is built recursively from results already obtained.

For that we introduce the following recursive definition for a set of separating nodes:

Definition 6.
Let I be a minimal independence set, then for a set 𝐗 the hierarchy HIE(𝐗, I) is the union of 𝐗 and all nodes that appear in a minimal separating set in I between any pair of nodes in HIE(𝐗, I).

The recursion as formula: let Q_0 = 𝐗, and Q_{i+1} = Q_i ∪ ( ⋃_j W_j : U_j, V_j ∈ Q_i, U_j ⊥⊥ V_j | [W_j] ∈ I ). Then, if n is the lowest index for which Q_{n+1} = Q_n, HIE(𝐗, I) = Q_n, with n < number of variables in I.

Note that 𝐗 ⊆ HIE(𝐗, I) ⊆ Ant(𝐗) for any 𝐗 and any minimal independence set I(M).

The crucial result is now that for an arbitrary minimal independence set the hierarchy of a D-sep link and its adjacent ancestors is guaranteed to be a separating superset that contains all required D-sep nodes.

Lemma 7.

In a MAG M with minimal independence model I(M), suppose that X and Y are non-adjacent. Let I(S+) ⊂ I(M) be a subset that contains the same separating sets between ancestors of X and/or Y in M, except that it does not contain a separating set for {X, Y}. If Q = HIE({X, Y} ∪ AA({X, Y}), I(S+)) \ {X, Y}, then Q is a separating set between X and Y, i.e. X ⊥⊥ Y | Q.

Proof.
As X and Y are non-adjacent in M, there is a minimal separating set X ⊥⊥ Y | [𝐙]. By Lemma 11 and Corollary 9, X ⊥⊥ Y | [𝐙] implies that X ⊥⊥ Y | Q ∪ [𝐙*], with Q ⊃ AA({X, Y}) as defined above and where all 𝐙* ⊂ 𝐙 are D-sep nodes for (X, Y).

If (X, Y) is not a D-sep link then 𝐙* = ∅, and the previous immediately reduces to X ⊥⊥ Y | Q.

Suppose 𝐙* ≠ ∅, i.e. Q is not a D-separating set for (X, Y). Then we can write X ⊥⊥ Y | Q ∪ 𝐙*\Z ∪ [Z], for some Z ∈ 𝐙*, and so there is an unblocked path π in M relative to Q ∪ 𝐙*\Z.

By Lemma 17 any path π unblocked without Z corresponds to a sequence of three or more bidirected edges in any marginal MAG over {X, Y} ∪ AA({X, Y}) ∪ Q′ with Q′ ⊆ (Q ∪ 𝐙*\Z), including the marginal MAG M′ over {X, Y} ∪ Q.

Therefore there is an unblocked path π′ between X and Y relative to Q in this M′ that consists of a sequence of three or more bidirected edges. If these edges are still present in M, then the path is still unblocked relative to any set that includes Q, counter to X ⊥⊥ Y | Q ∪ [𝐙*]. If one of the edges from π′ is no longer present in M, then that implies the existence of a minimal separating set in I(M) between two nodes from {X, Y} ∪ Q. Any such separating set would satisfy the conditions for inclusion in the hierarchy Q (Definition 6), and so already be part of Q. But then the corresponding edge should be absent from the marginal MAG M′ over {X, Y} ∪ Q, contrary to the assumption that it was part of the path π′.

Therefore the assumption that 𝐙* ≠ ∅ leads to a contradiction, and so Q must already contain all required nodes to form a separating set for (X, Y), i.e. X ⊥⊥ Y | Q.

Figure 4: Illustration that it is not sufficient to look at separating sets between nodes adjacent to {X, Y}, but that it is necessary to include the full recursive hierarchy: nodes X and Y are D-separated by a set consisting of S, T, U, V and three Z nodes, but one of these Z nodes (blocking the path in bold) is only present (and necessary) in a minimal separating set between S and another Z node, where not all nodes involved are in Adj({X, Y}).

The resulting D-sep set can be converted into a minimal D-separating set in at most N additional independence tests, by removing redundant nodes one-by-one until no more can be found, see (Tian et al., 1998).

While searching for D-sep links in the augmented skeleton S+ we may not yet know the true adjacent ancestors of a D-sep candidate pair {X, Y} in the underlying MAG M. For a node with degree bounded by k, it has to be a combination of at most k from N nodes, which for a pair implies one from worst case order N^k × N^k = N^{2k} different sets. If there is a D-separating set for candidate {X, Y}, then it is guaranteed to appear in the hierarchy implied by one of these sets. Therefore, in a sparse graph, we need at most a polynomial number of tests (in N) to find a D-sep set.

We can now prove the main claim of the article:

Theorem 1.

Let M be a MAG over N observed nodes corresponding to a distribution that is faithful to some underlying causal DAG G, such that the node degree in M is bounded by some constant k, then the sound and complete equivalence class PAG P can be obtained from worst case polynomial order N^{2(k+2)} independence tests, even when latent variables and selection bias may be present.

Proof.
Follows from a combination of (known) complexity results for the three main stages required to obtain the PAG P:
1. find PC-skeleton graph S from adjacency search,
2. eliminate D-sep links to obtain the skeleton of M,
3. orient invariant edge marks to obtain PAG P.

The first stage is known to require worst case order N^{k+2} independence tests, as it searches for subsets of size ≤ k from N − 2 nodes for each of ½N(N − 1) edges (Spirtes et al., 2000).

This article has shown that the second stage can be completed in a number of independence tests that is also worst case polynomial order in N. First it suffices to find/update the augmented skeleton S+ (Definition 4) in order to obtain up to N^2 possible candidate D-sep links (Lemma 4). Augmenting the PC-graph may take order N^3 tests, as it needs to check at most N − 2 nodes for each of the at most ½N(N − 1) edges eliminated so far. Lemma 16 ensures that as long as not all D-sep links have been found then at least one of these has no unidentified D-sep links between its ancestors and so can be identified as a bidirected edge triple in S+ (Lemma 4). For each possible D-sep link {X, Y} we need to find the set of possible adjacent ancestors AA({X, Y}) (Definition 3) in M to compute the corresponding hierarchy (Lemma 7). For both candidates {X, Y} this implies searching for at most k nodes from the remaining nodes, giving N^k × N^k = N^{2k} independence tests. On finding a D-sep set we may need order N tests to convert it into a minimal separating set (Tian et al., 1998), update the augmented skeleton (N^3), and possibly recheck up to N^2 previously tried-but-failed D-sep links. This leads to an overall complexity of the second stage of worst case order N^2 × N^{2k} × N^2 = N^{2(k+2)}, where augmenting the graph and conversion into minimal D-sep sets do not contribute to the leading order terms.

The third stage does not require additional independence tests at all. Complexity of the orientation rules lies mainly in checking for each edge mark for existence of certain paths in P, which can be done in order N·k by a generic 'reachability' algorithm, with N·k ∼ number of edges. As there are N^2 possible edge marks, this overall complexity is worst case order N^3·k.

As the first and third stage do not contribute to the leading order terms from the second stage, it implies that the overall complexity of finding the sound and complete PAG takes at most order N^{2(k+2)} independence tests.

Note that in practice the typical performance is much better than this worst case result suggests: relatively few D-sep candidates with even fewer D-sep links. This limits the number of adjacent nodes per candidate pair in the second stage to ∼ k, which reduces the most expensive term from N^{2k} down to a constant that depends only on k (twice all subsets from k nodes), independent of N.

A description of the FCI+ algorithm that implements this result can be found in the main article.

Acknowledgements

This research was supported by the NWO (Netherlands Organization for Scientific Research), grant nr. 612.001.202. JM was supported by NWO VENI grant nr. 639.031.036.

References

T. Claassen and T. Heskes. A logical characterization of constraint-based causal discovery. In Proc. of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 135–144, 2011.

T. Richardson and P. Spirtes. Ancestral graph Markov models. Ann. Stat., 30(4):962–1030, 2002.

P. Spirtes, C. Meek, and T. Richardson. An algorithm for causal inference in the presence of latent variables and selection bias. In Computation, Causation, and Discovery, pages 211–252. AAAI Press, Menlo Park, CA, 1999.

P. Spirtes, C. Glymour, and R. Scheines.