READING DEPENDENCIES FROM COVARIANCE GRAPHS
By Jose M. Peña
ADIT, Department of Computer and Information Science
Linköping University, SE-58183 Linköping, Sweden
E-mail: [email protected]
The covariance graph (aka bi-directed graph) of a probability distribution $p$ is the undirected graph $G$ where two nodes are adjacent iff their corresponding random variables are marginally dependent in $p$.∗ In this paper, we present a graphical criterion for reading dependencies from $G$, under the assumption that $p$ satisfies the graphoid properties as well as weak transitivity and composition. We prove that the graphical criterion is sound and complete in a certain sense. We argue that our assumptions are not too restrictive. For instance, all the regular Gaussian probability distributions satisfy them.

Keywords: chain graphs, concentration graphs, covariance graphs

∗ It is worth mentioning that our definition of covariance graph is somewhat non-standard. The standard definition states that the lack of an edge between two nodes of $G$ implies that their corresponding random variables are marginally independent in $p$. This difference in the definition is important in this paper.

1. Introduction.

The covariance graph (aka bi-directed graph) of a probability distribution $p$ is the undirected graph $G$ where two nodes are adjacent iff their corresponding random variables are marginally dependent in $p$. Covariance graphs were introduced in (Cox and Wermuth, 1993) to represent independence models. Since then, they have received considerable attention. See, for instance, (Banerjee and Richardson, 2003; Chaudhuri et al., 2007; Cox and Wermuth, 1996; Drton and Richardson, 2003, 2008; Kauermann, 1996; Lupparelli et al., 2009; Malouche and Rajaratnam, 2009; Pearl and Wermuth, 1994; Richardson, 2003; Wermuth, 1995; Wermuth and Cox, 1998; Wermuth et al., 2006; Wermuth, 2011, 2012). The works (Banerjee and Richardson, 2003; Kauermann, 1996) are particularly important for the interpretation of covariance graphs in terms of independencies. Specifically, these works introduce a graphical criterion for reading independencies from the covariance graph $G$ of a probability distribution $p$, under the assumption that $p$ satisfies the graphoid properties and composition. In this paper, we show that $G$ can also be used to read dependencies holding in $p$.
Specifically, we present a graphical criterion for reading dependencies from $G$ under the assumption that $p$ satisfies the graphoid properties, weak transitivity and composition. We also prove that our graphical criterion is sound and complete. Here, complete means that it is able to read all the dependencies in $p$ that can be derived by applying the graphoid properties, weak transitivity and composition to the dependencies used in the construction of $G$ and the independencies obtained from $G$. We also show that there exist important families of probability distributions that satisfy the graphoid properties, weak transitivity and composition. These include, for instance, all the regular Gaussian probability distributions.

Note that this paper would be unnecessary if $p$ satisfied all and only the independencies that can be read from $G$ via the graphical criterion in (Banerjee and Richardson, 2003; Kauermann, 1996), i.e. if $p$ were faithful to $G$. We will see that one cannot safely assume faithfulness in general. Therefore, one is only entitled to assume that $p$ satisfies all (but not necessarily only) the independencies that can be read from $G$ via the graphical criterion in (Banerjee and Richardson, 2003; Kauermann, 1996), i.e. that $p$ is Markov wrt $G$. This is actually the raison d'être of this paper.

Two previous works that somehow address the problem of reading dependencies off covariance graphs are (Wermuth, 1995; Wermuth and Cox, 1998). These works propose to determine whether two random variables $U_A$ and $U_B$ are dependent given some other random variables $U_Z$ by, first, constructing the covariance graph of the conditional probability distribution given $U_Z$ of any set of random variables that includes $U_A$ and $U_B$ and, then, checking if the nodes corresponding to $U_A$ and $U_B$ are adjacent in the covariance graph constructed.
Therefore, these works construct multiple covariance graphs, one for each conditional probability distribution of interest, from which only the dependencies used in their construction are read. The work presented in this paper is radically different: We only construct the covariance graph of the probability distribution at hand and read from it many more dependencies than those used in its construction. While this is the first work where a sound and complete graphical criterion for reading dependencies off covariance graphs is developed, it is worth mentioning that there already exist sound and complete graphical criteria for reading dependencies off other graphical models. For instance, there exists a sound and complete graphical criterion for reading dependencies off the concentration graph (aka minimal undirected independence map or Markov network) of a probability distribution that satisfies the graphoid properties (Bouckaert, 1995), or the graphoid properties and weak transitivity (Peña et al., 2009). As a matter of fact, the graphical criterion that we present in this paper is dual to the one in (Peña et al., 2009). There also exists a sound and complete graphical criterion for reading dependencies off the Bayesian network (aka minimal directed independence map) of a probability distribution that satisfies the graphoid properties (Bouckaert, 1995), the graphoid properties and weak transitivity (Peña, 2010), or the graphoid properties and weak transitivity and composition (Peña, 2007). In the last two references, the Bayesian networks are restricted to be polytrees. Note that (Bouckaert, 1995; Peña, 2007, 2010; Peña et al., 2009) address problems that are related to, but not more general than, the one in this paper, since neither concentration graphs nor Bayesian networks include covariance graphs. Related more general problems than the one studied in this paper have been recently addressed, though.
For instance, a method to read dependencies from multivariate regression graphs, which include covariance graphs, is proposed in (Wermuth, 2012). The author also presents necessary and sufficient conditions for the method to be sound. These conditions are the same as the ones considered in this paper, namely the graphoid properties plus weak transitivity and composition. Unlike in this paper, no proof of completeness of the method proposed appears in (Wermuth, 2012). Another related more general work is (Wermuth, 2011), where the author shows how summary graphs, which include covariance graphs, can help to detect which dependencies remain undistorted and which do not after marginalization and/or conditioning in a probability distribution generated over a so-called parent graph. It should be pointed out that the fact that the probability distribution is generated over a parent graph implies that it satisfies the same conditions as the ones considered in this paper (Wermuth, 2011, Proposition 3). Again, unlike in this paper, the completeness question is not addressed in (Wermuth, 2011). Finally, it should be noted that (Wermuth, 2011, 2012) make use of the graphical criterion presented in (Sadeghi and Lauritzen, 2011) for reading independencies from loopless mixed graphs, which include multivariate regression graphs, summary graphs and parent graphs. This criterion is sound and complete in a certain sense, given that the graphoid properties and composition hold (Sadeghi and Lauritzen, 2011, Theorem 3). These conditions are, in fact, not only sufficient but necessary too (Sadeghi and Lauritzen, 2011, Section 6.3).

We think that the work presented in this paper can be of great interest for the artificial intelligence community. Graphs are one of the most commonly used metaphors for representing knowledge because they appeal to human intuition (Pearl, 1988). Furthermore, graphs are parsimonious models because they trade off accuracy for simplicity.
Consider, for instance, representing the independence model induced by a probability distribution as a graph. Though this graph is typically less accurate than the probability distribution (the graph may not represent all the (in)dependencies and those that are represented are not quantified), it also requires less space to be stored and less time to be communicated than the probability distribution, which may be desirable features in some applications. Thus, it seems sensible to develop tools for reasoning with graphs. Our graphical criterion is one such tool: Just as the graphical criterion in (Banerjee and Richardson, 2003; Kauermann, 1996) makes the discovery of independencies amenable to human reasoning by enabling one to read independencies off a covariance graph $G$ without numerical calculation, so does our graphical criterion with respect to the discovery of dependencies. There are fields where discovering dependencies is more important than discovering independencies (Wermuth, 1995; Wermuth and Cox, 1998). It is in these fields where we believe that our graphical criterion has greater potential. In bioinformatics, for instance, the nodes of $G$ may represent (the expression levels of) some genes under study. Bioinformaticians are typically more interested in discovering gene dependencies than independencies, because the former provide contexts in which the expression level of some genes is informative about that of some other genes, which may lead to hypotheses about dependencies, functional relations, causal relations, the effects of manipulation experiments, etc. See, for instance, (Butte and Kohane, 2000) for an application of covariance graphs to bioinformatics under the name of relevance networks.

The rest of the paper is organized as follows. We start by reviewing some concepts in Section 2. We show in Section 3 that assuming the graphoid properties, weak transitivity and composition is not too restrictive.
We prove in Section 4 that the existing graphical criterion for reading independencies from covariance graphs is complete in a certain sense. This result, in addition to being important in its own right, is important for reading as many dependencies as possible from covariance graphs. We introduce in Section 5 our graphical criterion for reading dependencies from covariance graphs and prove that it is sound and complete in a certain sense. Finally, we close with some discussion in Section 6.
2. Preliminaries.
In this section, we introduce some concepts and results that are used later in this paper. We first recall some results from graphical models. See, for instance, (Banerjee and Richardson, 2003; Kauermann, 1996; Lauritzen, 1996; Studený, 2005) for further information. Let $V = \{1, \ldots, N\}$ be a finite set of size $N$. The elements of $V$ are not distinguished from singletons and the union of the sets $I_1, \ldots, I_n \subseteq V$ is written as the juxtaposition $I_1 \ldots I_n$. We assume throughout the paper that the union of sets precedes the set difference when evaluating an expression. Unless otherwise stated, all the graphs in this paper are defined over $V$.

If a graph $G$ contains an undirected (respectively directed) edge between two nodes $v_1$ and $v_2$, then we say that $v_1 - v_2$ (respectively $v_1 \to v_2$) is in $G$. If $v_1 \to v_2$ is in $G$ then $v_1$ is called a parent of $v_2$ in $G$. Let $Pa_G(I)$ denote the set of parents in $G$ of the nodes in $I \subseteq V$. A route from a node $v_1$ to a node $v_n$, denoted $v_1 : v_n$, in a graph $G$ is a sequence of nodes $v_1, \ldots, v_n$ such that there exists an edge in $G$ between $v_i$ and $v_{i+1}$ for all $1 \le i < n$. A path is a route $v_1 : v_n$ in which the nodes $v_1, \ldots, v_n$ are distinct. A route $v_1 : v_n$ is called undirected if $v_i - v_{i+1}$ is in $G$ for all $1 \le i < n$. A node $v_1$ is an ancestor of a node $v_n$ in $G$ if there is a route $v_1 : v_n$ in $G$ such that $v_i - v_{i+1}$ or $v_i \to v_{i+1}$ is in $G$ for all $1 \le i < n$. (Note that our definition of ancestor follows (Lauritzen, 1996) and differs from others that exist in the literature, e.g. (Richardson and Spirtes, 2002).) Let $An_G(I)$ denote the set of ancestors in $G$ of the nodes in $I \subseteq V$. A node $v_n$ is a descendant of a node $v_1$ in $G$ if there is a route $v_1 : v_n$ in $G$ such that $v_i - v_{i+1}$ or $v_i \to v_{i+1}$ is in $G$ for all $1 \le i < n$ and $v_i \to v_{i+1}$ is in $G$ for some $1 \le i < n$.

A chain graph (CG) is a graph (possibly) containing both undirected and directed edges and such that no node is a descendant of itself. An undirected graph (UG) is a CG containing only undirected edges. A directed and acyclic graph (DAG) is a CG containing only directed edges. A set of nodes of a CG is connected if there exists an undirected route in the CG between every pair of nodes in the set. A connectivity component of a CG is a connected set that is maximal with respect to set inclusion. The moral graph of a CG $G$, denoted $G^m$, is the UG where two nodes are adjacent iff they are adjacent in $G$ or they are both in $Pa_G(B_i)$ for some connectivity component $B_i$ of $G$. The subgraph of a CG $G$ induced by $I \subseteq V$, denoted $G_I$, is the graph over $I$ where two nodes are connected by a (un)directed edge if that edge is in $G$. Let $X$, $Y$ and $Z$ denote three disjoint subsets of $V$. We say that $X$ is separated from $Y$ given $Z$ in a CG $G$ if every path in $(G_{An_G(XYZ)})^m$ from a node in $X$ to a node in $Y$ has some node in $Z$. We denote such a separation statement by $sep_G(X, Y \mid Z)$.

Let $U = (U_i)_{i \in V}$ denote a vector of random variables and $U_I$ ($I \subseteq V$) its subvector $(U_i)_{i \in I}$. We use upper-case letters to denote random variables and the same letters in lower-case to denote their states. Unless otherwise stated, all the probability distributions in this paper are defined over $U$. Let $X$, $Y$, $Z$ and $W$ denote four disjoint subsets of $V$. We represent by $X \perp_p Y \mid Z$ that $U_X$ is independent of $U_Y$ given $U_Z$ in a probability distribution $p$. We represent by $X \not\perp_p Y \mid Z$ that $X \perp_p Y \mid Z$ does not hold. A probability distribution $p$ is a graphoid if it satisfies the following properties: Symmetry $X \perp_p Y \mid Z \Rightarrow Y \perp_p X \mid Z$, decomposition $X \perp_p YW \mid Z \Rightarrow X \perp_p Y \mid Z$, weak union $X \perp_p YW \mid Z \Rightarrow X \perp_p Y \mid ZW$, contraction $X \perp_p Y \mid ZW \wedge X \perp_p W \mid Z \Rightarrow X \perp_p YW \mid Z$, and intersection $X \perp_p Y \mid ZW \wedge X \perp_p W \mid ZY \Rightarrow X \perp_p YW \mid Z$. We say that a graphoid $p$ is a WTC graphoid if it satisfies the following two additional properties: Weak transitivity $X \perp_p Y \mid Z \wedge X \perp_p Y \mid ZK \Rightarrow X \perp_p K \mid Z \vee K \perp_p Y \mid Z$ with $K \in V \setminus XYZ$, and composition $X \perp_p Y \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \perp_p YW \mid Z$.

Let $X$, $Y$ and $Z$ denote three disjoint subsets of $V$. We denote by $X \perp_G Y \mid Z$ that a CG $G$ represents that $U_X$ is independent of $U_Y$ given $U_Z$. We denote by $X \not\perp_G Y \mid Z$ that $X \perp_G Y \mid Z$ does not hold. In this paper, we are interested in the classic Lauritzen-Wermuth-Frydenberg interpretation of CGs as independence models, which is based on the following graphical criterion.

Definition 2.1. Given a CG $G$, $X \perp_G Y \mid Z$ if $sep_G(X, Y \mid Z)$.

However, in this paper we are also interested in the dual interpretation of UGs as independence models that builds on the following graphical criterion.
Definition 2.2. Given an UG $G$, $X \perp_G Y \mid Z$ if $sep_G(X, Y \mid V \setminus XYZ)$.

The following rephrasing of the graphical criterion in Definition 2.2 may be easier to recall: $X \perp_G Y \mid Z$ if every path in $G$ from a node in $X$ to a node in $Y$ has some node outside $XYZ$. When an UG is interpreted according to the graphical criterion in Definition 2.1 we call it a concentration graph, and when it is interpreted according to the graphical criterion in Definition 2.2 we call it a covariance graph. A probability distribution $p$ is Markov wrt a CG, concentration graph or covariance graph $G$ when $X \perp_p Y \mid Z$ if $X \perp_G Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$. A probability distribution $p$ is faithful to a CG, concentration graph or covariance graph $G$ when $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$. The concentration graph (aka minimal undirected independence map or Markov network) of a probability distribution $p$ is the UG $G$ where two nodes $A$ and $B$ are adjacent iff $A \not\perp_p B \mid V \setminus AB$. The covariance graph (aka bi-directed graph) of a probability distribution $p$ is the UG $G$ where two nodes $A$ and $B$ are adjacent iff $A \not\perp_p B$. A WTC graphoid $p$ is Markov wrt both its covariance graph $G$ and its concentration graph $H$. However, neither $X \not\perp_G Y \mid Z$ nor $X \not\perp_H Y \mid Z$ implies $X \not\perp_p Y \mid Z$, unless $p$ is faithful to $G$ or $H$. This is actually the raison d'être of this paper.
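As an illustration, the rephrased criterion in Definition 2.2 can be checked by a reachability search restricted to the nodes in $XYZ$. The following Python sketch is only an assumption about how one might code it; the function name `indep_covariance_graph` and the edge-list representation of $G$ are hypothetical and not part of the paper.

```python
from collections import deque


def indep_covariance_graph(edges, X, Y, Z):
    """Sketch of Definition 2.2: X ⊥_G Y | Z holds iff every path from X
    to Y has some node outside X ∪ Y ∪ Z, i.e. iff no node of Y can be
    reached from X inside the subgraph induced by X ∪ Y ∪ Z."""
    X, Y, Z = set(X), set(Y), set(Z)
    allowed = X | Y | Z
    # Adjacency lists of the subgraph induced by X ∪ Y ∪ Z.
    adj = {}
    for a, b in edges:
        if a in allowed and b in allowed:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    # Breadth-first search started simultaneously from all nodes in X.
    seen = set(X)
    queue = deque(X)
    while queue:
        node = queue.popleft()
        if node in Y:
            return False  # a path inside X ∪ Y ∪ Z reaches Y
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hypothetical covariance graph A - B - C.
chain = [("A", "B"), ("B", "C")]
print(indep_covariance_graph(chain, {"A"}, {"C"}, set()))   # B lies outside AC
print(indep_covariance_graph(chain, {"A"}, {"C"}, {"B"}))   # path A - B - C stays inside ABC
```

Note that, in contrast with concentration graphs, adding $B$ to the conditioning set opens the path between $A$ and $C$ instead of blocking it.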
3. WTC Graphoids.
This paper is devoted to the study of WTC graphoids. We show in this section that WTC graphoids are worth studying because they include important families of probability distributions. For instance, any regular Gaussian probability distribution is a WTC graphoid (Studený, 2005, Sections 2.2.2, 2.3.5 and 2.3.6). The following theorem introduces another interesting family of WTC graphoids.

Theorem 3.1. Let $G$ be a CG. Any probability distribution $p$ that is faithful to $G$ is a WTC graphoid.

Proof.
Let $q$ be any regular Gaussian probability distribution that is faithful to $G$. Such probability distributions exist due to (Peña, 2011, Theorems 1 and 2). Since $p$ and $q$ are faithful to $G$, $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$ iff $X \perp_q Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$. Therefore, $p$ is a WTC graphoid because $q$ is a WTC graphoid.

The previous theorem is meaningful only if we prove that, for any CG, there exist probability distributions that are faithful to it. We do so in the following theorem.

Theorem 3.2. Let $G$ be a CG. If each random variable in $U$ has a finite prescribed sample space with at least two possible states, then there exists a discrete probability distribution with the prescribed sample spaces for the random variables in $U$ that is faithful to $G$. On the other hand, if the sample space of each random variable in $U$ is $\mathbb{R}$, then there exist a regular Gaussian probability distribution that is faithful to $G$ and a continuous but non-Gaussian probability distribution that is faithful to $G$.

Proof.
The first and second statements in the theorem are proven in (Peña, 2009, Theorems 3 and 5) and (Peña, 2011, Theorems 1 and 2), respectively. The third statement in the theorem can easily be proven by using copulas (Nelsen, 2006) as follows. Let $p$ denote any regular Gaussian probability distribution that is faithful to $G$. Derive the Gaussian copula for $p$. The copula represents the independence model of $p$ stripped of its univariate marginals. Therefore, the copula together with a set of arbitrary univariate marginals can be used to generate a multivariate probability distribution whose independence model is the one dictated by the copula and whose univariate marginals are the given ones. The desired result is achieved if the arbitrary marginals are chosen so that they are continuous but non-Gaussian. See (Nelsen, 2006) for more details.

It is worth mentioning that the results in (Peña, 2009, Theorems 3 and 5) and (Peña, 2011, Theorems 1 and 2) are actually stronger than the first and second statements in the previous theorem. Specifically, the results reported there are that, in a certain measure-theoretic sense, almost all the discrete probability distributions and regular Gaussian probability distributions that are Markov wrt a CG are faithful to it. Finally, note that the marginals and conditionals of a regular Gaussian probability distribution are regular Gaussian probability distributions and, thus, WTC graphoids. In fact, this property can be generalized to all the WTC graphoids. The following theorem, originally reported in (Peña et al., 2006, Theorem 5), formalizes this result.
Theorem 3.3. Let $p$ be a WTC graphoid and let $I \subseteq V$. Then, $p(U_{V \setminus I})$ is a WTC graphoid. If $p(U_{V \setminus I} \mid U_I = u_I)$ has the same (in)dependencies for all $u_I$, then $p(U_{V \setminus I} \mid U_I = u_I)$ for any $u_I$ is a WTC graphoid.

It is worth noting that many members of the families of WTC graphoids that we have presented in this section are Markov wrt their covariance graphs but not faithful to them. Hence, the need to develop a graphical criterion for reading dependencies from the covariance graph of a WTC graphoid. For example, consider any discrete, regular Gaussian, or continuous but non-Gaussian probability distribution $p$ that is faithful to a CG with $\{A \to B, B \to C\}$ as induced subgraph. Then, the covariance graph $G$ of $p$ has $\{A - B, A - C, B - C\}$ as induced subgraph and, thus, $p$ is not faithful to $G$ since $A \not\perp_G C \mid Z$ but $A \perp_p C \mid Z$ for some $Z \subseteq V$. This example is based on (Drton and Richardson, 2003; Pearl and Wermuth, 1994). The interested reader is referred to these works for a characterization of the independence models that can be represented exactly by DAGs but not by covariance graphs.
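The example above can be checked numerically for one concrete Gaussian instance. The sketch below assumes a hypothetical linear model $A = e_1$, $B = A + e_2$, $C = B + e_3$ with independent unit-variance noise terms; the coefficients and the helper names are illustrative choices, not taken from the paper. It relies on the standard fact that, for Gaussian variables, a zero (partial) covariance is equivalent to (conditional) independence.

```python
# Covariance matrix implied by the hypothetical chain A -> B -> C with
# A = e1, B = A + e2, C = B + e3 and independent unit-variance noises.
cov = {
    ("A", "A"): 1.0, ("A", "B"): 1.0, ("A", "C"): 1.0,
    ("B", "B"): 2.0, ("B", "C"): 2.0,
    ("C", "C"): 3.0,
}


def c(u, v):
    """Symmetric lookup of the covariance between u and v."""
    return cov[(u, v)] if (u, v) in cov else cov[(v, u)]


def partial_cov(u, v, w):
    """Covariance between u and v after conditioning on the single variable w."""
    return c(u, v) - c(u, w) * c(w, v) / c(w, w)


# Every pair is marginally dependent, so the covariance graph is complete.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
covariance_graph = [pair for pair in pairs if c(*pair) != 0.0]
print(covariance_graph)

# Yet A and C are independent given B, which the complete covariance
# graph cannot represent under Definition 2.2.
print(partial_cov("A", "C", "B"))
```

In this instance the edge $A - C$ is in the covariance graph, so $A \not\perp_G C \mid B$, while the partial covariance of $A$ and $C$ given $B$ vanishes.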
4. Reading Independencies.
The graphical criterion in Definition 2.2 is sound for reading independencies from the covariance graph $G$ of a WTC graphoid $p$, that is, it only identifies independencies in $p$ (Banerjee and Richardson, 2003; Kauermann, 1996, Proposition 2.2). In this section, we show that this graphical criterion is complete in the sense that it identifies all the independencies in $p$ that can be identified by studying $G$ alone. This completeness result, in addition to being important in its own right, is crucial for reading as many dependencies as possible from $G$, as we will see in the next section. In order to prove the referred completeness result, it suffices to prove that there exist WTC graphoids that are faithful to $G$, because $G$ is their covariance graph and they only have the independencies that the graphical criterion in Definition 2.2 identifies from $G$. Therefore, we cannot derive more independencies from $G$ alone than those identified by this graphical criterion, because $p$ may be one of the WTC graphoids that are faithful to $G$. The following two theorems prove the desired result.

Theorem 4.1. Let $G$ be a covariance graph. If each random variable in $U$ has a finite prescribed sample space with at least two possible states, then there exists a discrete probability distribution with the prescribed sample spaces for the random variables in $U$ that is faithful to $G$. On the other hand, if the sample space of each random variable in $U$ is $\mathbb{R}$, then there exist a regular Gaussian probability distribution that is faithful to $G$ and a continuous but non-Gaussian probability distribution that is faithful to $G$.

Proof.
We start by proving the first statement in the theorem. First, we create a DAG from $G$ as follows: Replace each edge $A - B$ in $G$ with $A \leftarrow V'_{A,B} \to B$ where $V'_{A,B}$ is a newly created node. Call the resulting DAG $H$ and let $V'$ denote all the newly created nodes. It is easy to see that $X \perp_H Y \mid Z$ iff $X \perp_G Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$.

Let $U' = (U'_i)_{i \in V'}$ denote a vector of random variables such that each of them has any finite sample space with at least two possible states. Let $p(U, U')$ denote any discrete probability distribution that is faithful to $H$. Such probability distributions exist by (Meek, 1995, Theorem 7). Note that, for any $X$, $Y$ and $Z$ disjoint subsets of $V$, $X \perp_{p(U)} Y \mid Z$ iff $X \perp_{p(U,U')} Y \mid Z$ iff $X \perp_H Y \mid Z$ iff $X \perp_G Y \mid Z$. Consequently, $p(U)$ is faithful to $G$.

The second statement in the theorem can be proven in much the same way as the first if $p(U, U')$ denotes now any regular Gaussian probability distribution that is faithful to $H$. Such probability distributions exist by (Spirtes et al., 1993, Theorem 3.2). Note that $p(U)$ is regular Gaussian.

Finally, the third statement in the theorem can be proven by using copulas as we did in the proof of Theorem 3.2.

An alternative proof of the second statement in the theorem above follows from (Richardson and Spirtes, 2002, Theorem 7.5). Specifically, (Richardson and Spirtes, 2002) introduces a new class of graphical models called ancestral graphs, whose edges can be undirected, directed or bi-directed ($\leftrightarrow$). Covariance graphs are equivalent to ancestral graphs with only bi-directed edges. (Richardson and Spirtes, 2002, Theorem 7.5) proves that, for any ancestral graph, there is a regular Gaussian probability distribution that is faithful to it.
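The construction of the DAG $H$ used in the proof above can be sketched in a few lines of Python. The function names and the tuple encoding of the new nodes $V'_{A,B}$ are hypothetical. The second function additionally assumes, purely for illustration, a linear model in which every observed variable is the sum of its latent parents plus independent noise, all with unit variance; this particular parameterization is not claimed to be faithful to $H$, it only shows that adjacency in $G$ matches marginal dependence.

```python
def covariance_graph_to_dag(edges):
    """Replace each undirected edge A - B of G by A <- V'_{A,B} -> B,
    returning the directed edges (parent, child) of the DAG H."""
    dag = []
    for a, b in edges:
        latent = ("latent", a, b)  # hypothetical encoding of V'_{A,B}
        dag.append((latent, a))
        dag.append((latent, b))
    return dag


def implied_covariance(edges, a, b):
    """Cov(U_a, U_b) for a != b under the assumed unit-variance linear
    model: it equals the number of latent parents shared by a and b."""
    dag = covariance_graph_to_dag(edges)
    parents_a = {p for p, child in dag if child == a}
    parents_b = {p for p, child in dag if child == b}
    return float(len(parents_a & parents_b))


# Hypothetical covariance graph A - B - C.
chain = [("A", "B"), ("B", "C")]
print(covariance_graph_to_dag(chain))
print(implied_covariance(chain, "A", "B"), implied_covariance(chain, "A", "C"))
```

Under this assumed model, adjacent nodes share exactly one latent parent and non-adjacent nodes share none, so the covariance is non-zero exactly for the edges of $G$.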
In Appendix A, we strengthen the second statement in the theorem above by proving that, in a certain measure-theoretic sense, almost all the regular Gaussian probability distributions that are Markov wrt a covariance graph are faithful to it. Although this result is not used in this paper, we consider it to be interesting in its own right and, thus, we report it.

Theorem 4.2. Let $G$ be a covariance graph. Any probability distribution $p$ that is faithful to $G$ is a WTC graphoid.
Table 1
CGs in Example 4.1, and covariance graph in Examples 5.1 and 5.2.
Proof.
Let $q$ be any regular Gaussian probability distribution that is faithful to $G$. Such probability distributions exist due to Theorem 4.1. Since $p$ and $q$ are faithful to $G$, $X \perp_p Y \mid Z$ iff $X \perp_G Y \mid Z$ iff $X \perp_q Y \mid Z$ for all $X$, $Y$ and $Z$ disjoint subsets of $V$. Therefore, $p$ is a WTC graphoid because $q$ is a WTC graphoid.

As explained at the beginning of this section, the previous two theorems imply that the graphical criterion in Definition 2.2 is complete for reading independencies from the covariance graph $G$ of a WTC graphoid $p$, in the sense that it identifies all the independencies in $p$ that can be identified by studying $G$ alone. An equivalent formulation of this result is that the graphical criterion is complete in the sense that it identifies all the independencies that are shared by all the WTC graphoids whose covariance graph is $G$. Finally, it is worth mentioning that the graphical criterion in Definition 2.2 is not complete in the more stringent sense of being able to identify all the independencies in $p$. Actually, no sound graphical criterion for reading independencies from $G$ is complete in this latter sense. An example illustrating this follows.

Example 4.1. Let $p$ and $p'$ denote two WTC graphoids that are faithful to the CGs in the left and center of Table 1, respectively. Such WTC graphoids exist by (Peña, 2011, Theorems 1 and 2). Note that $A \perp_p C \mid B$ whereas $A \not\perp_{p'} C \mid B$. Let $G$ and $H$ denote the covariance and concentration graphs of $p$, respectively. Likewise, let $G'$ and $H'$ denote the covariance and concentration graphs of $p'$, respectively. Note that $G$, $H$, $G'$ and $H'$ are all the complete graph over $\{A, B, C, D\}$. Now, let us assume that we are dealing with $p$. Then, no sound graphical criterion entails $A \perp_p C \mid B$ from $G$ because this independence does not hold in $p'$, and it is impossible to know whether we are dealing with $p$ or $p'$ on the sole basis of $G$.
5. Reading Dependencies.
In this section, we present the main contribution of this paper: We introduce a graphical criterion for reading dependencies from the covariance graph of a WTC graphoid and prove that it is sound and complete in a certain sense. If $G$ is the covariance graph of a WTC graphoid $p$ then we know, by definition of $G$, that $A \not\perp_p B$ for all the edges $A - B$ in $G$. We call these dependencies the dependence base of $p$. Further dependencies in $p$ can be derived from the dependence base via the WTC graphoid properties. For this purpose, we rephrase the WTC graphoid properties in their contrapositive form as follows. Symmetry $Y \not\perp_p X \mid Z \Rightarrow X \not\perp_p Y \mid Z$. Decomposition $X \not\perp_p Y \mid Z \Rightarrow X \not\perp_p YW \mid Z$. Weak union $X \not\perp_p Y \mid ZW \Rightarrow X \not\perp_p YW \mid Z$. Contraction $X \not\perp_p YW \mid Z \Rightarrow X \not\perp_p Y \mid ZW \vee X \not\perp_p W \mid Z$ is problematic for deriving new dependencies because it contains a disjunction in the consequent and, thus, we split it into two properties: Contraction1 $X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid ZW \Rightarrow X \not\perp_p W \mid Z$, and contraction2 $X \not\perp_p YW \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \not\perp_p Y \mid ZW$. Likewise, intersection gives rise to intersection1 $X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid ZW \Rightarrow X \not\perp_p W \mid ZY$, and intersection2 $X \not\perp_p YW \mid Z \wedge X \perp_p W \mid ZY \Rightarrow X \not\perp_p Y \mid ZW$. Note that intersection1 and intersection2 are equivalent and, thus, we refer to them simply as intersection. Similarly, weak transitivity gives rise to weak transitivity1 $X \not\perp_p K \mid Z \wedge K \not\perp_p Y \mid Z \wedge X \perp_p Y \mid Z \Rightarrow X \not\perp_p Y \mid ZK$, and weak transitivity2 $X \not\perp_p K \mid Z \wedge K \not\perp_p Y \mid Z \wedge X \perp_p Y \mid ZK \Rightarrow X \not\perp_p Y \mid Z$. Finally, composition $X \not\perp_p YW \mid Z \Rightarrow X \not\perp_p Y \mid Z \vee X \not\perp_p W \mid Z$ gives rise to composition1 $X \not\perp_p YW \mid Z \wedge X \perp_p Y \mid Z \Rightarrow X \not\perp_p W \mid Z$, and composition2 $X \not\perp_p YW \mid Z \wedge X \perp_p W \mid Z \Rightarrow X \not\perp_p Y \mid Z$.
Since composition1 and composition2 are equivalent, we refer to them simply as composition. The independence in the antecedent of any of the properties above holds if it can be read off $G$ via the graphical criterion in Definition 2.2. This is the best solution we can hope for because, as discussed in the previous section, this graphical criterion is sound and complete for WTC graphoids. Moreover, this solution does not require more information than what is available, namely $G$ or equivalently the dependence base of $p$. We define the WTC graphoid closure of the dependence base of $p$ as the set of dependencies that are in the dependence base of $p$ plus those that can be derived from it by applying the nine properties above.

Let $X$, $Y$ and $Z$ denote three disjoint subsets of $V$. We say that $X$ is connected to $Y$ given $Z$ in an UG $G$ if there exist two nodes $A \in X$ and $B \in Y$ such that there exists a single path between $A$ and $B$ in $G$ whose nodes are all outside $XYZ \setminus AB$. We denote such a connection statement by $con_G(X, Y \mid Z)$. We denote by $X \sim_G Y \mid Z$ that an UG $G$ represents that $U_X$ is dependent on $U_Y$ given $U_Z$. We can now introduce our graphical criterion for reading dependencies from the covariance graph of a WTC graphoid.
Definition 5.1. Given the covariance graph $G$ of a WTC graphoid, $X \sim_G Y \mid Z$ if $con_G(X, Y \mid V \setminus XYZ)$.

The following rephrasing of the graphical criterion in the previous definition may be easier to recall: $X \sim_G Y \mid Z$ if there exist two nodes $A \in X$ and $B \in Y$ such that there exists a single path between $A$ and $B$ in $G$ whose nodes are all in $ABZ$. Interestingly, the graphical criterion in the previous definition is dual to the following graphical criterion, which we developed in (Peña et al., 2009) for reading dependencies from the concentration graph of a WTC graphoid.
Definition 5.2. Given the concentration graph $G$ of a WTC graphoid, $X \sim_G Y \mid Z$ if $con_G(X, Y \mid Z)$.

We proved in (Peña et al., 2009, Theorems 5 and 6, Example 3) that the graphical criterion in Definition 5.2 is sound and complete in a certain sense. We prove in the following two theorems that the graphical criterion in Definition 5.1 is also sound and complete in a certain sense.
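As an illustration of the criterion for covariance graphs, the rephrasing "there exists a single path between $A$ and $B$ whose nodes are all in $ABZ$" can be checked by counting the paths between $A$ and $B$ that stay inside $ABZ$. The Python sketch below is a naive, exponential-time path count written under that reading; the function name `dep_covariance_graph` and the edge-list representation of $G$ are hypothetical.

```python
def dep_covariance_graph(edges, X, Y, Z):
    """Sketch of the criterion X ~_G Y | Z for covariance graphs: some
    A in X and B in Y are joined by exactly one path whose nodes are
    all in {A, B} ∪ Z."""
    X, Y, Z = set(X), set(Y), set(Z)
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def count_paths(node, goal, allowed, visited):
        # Depth-first count of paths (no repeated nodes) from node to goal
        # inside `allowed`; the count stops growing once it exceeds one.
        if node == goal:
            return 1
        total = 0
        for nxt in adj.get(node, ()):
            if nxt in allowed and nxt not in visited:
                total += count_paths(nxt, goal, allowed, visited | {nxt})
                if total > 1:
                    break
        return total

    for a in X:
        for b in Y:
            if count_paths(a, b, Z | {a, b}, {a}) == 1:
                return True
    return False


# Hypothetical covariance graphs: a chain A - B - C and a triangle.
chain = [("A", "B"), ("B", "C")]
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(dep_covariance_graph(chain, {"A"}, {"C"}, {"B"}))     # single path A - B - C
print(dep_covariance_graph(triangle, {"A"}, {"B"}, {"C"}))  # two paths inside ABC
```

For the chain, the dependence of $A$ and $C$ given $B$ read by the sketch agrees with what weak transitivity1 derives from $A \not\perp_p B$, $B \not\perp_p C$ and $A \perp_p C$. For the triangle with $Z = C$ there are two paths inside $ABC$, so the criterion does not read the dependence.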
Theorem 5.1. Let $G$ be the covariance graph of a WTC graphoid $p$. If $X \sim_G Y \mid Z$, then $X \not\perp_p Y \mid Z$ is in the WTC graphoid closure of the dependence base of $p$.

Proof.
Let $X \sim_G Y \mid Z$ hold due to a path $A : B$ with $A \in X$ and $B \in Y$. We prove the theorem by induction over the length of $A : B$. We first prove it for length one. Let $Z'$ denote the largest subset of $Z$ such that there is a path in $G$ from $A$ to every node in $Z'$ and all the nodes in these paths are in $AZ'$. Then, $B \perp_G Z'$ because, otherwise, there would be two paths between $A$ and $B$ whose nodes are all in $ABZ$, which would contradict that $X \sim_G Y \mid Z$ holds due to $A : B$. Thus, $B \perp_p Z'$. Moreover, $A \not\perp_p B$ because $A$ and $B$ are adjacent in $G$. Then, $AZ' \not\perp_p B$ by symmetry and decomposition, which together with $B \perp_p Z'$ imply $A \not\perp_p B \mid Z'$ by symmetry and contraction2. Note that if $Z' = \emptyset$, then $A \not\perp_p B$ directly implies $A \not\perp_p B \mid Z'$. In any case, $A \not\perp_p B \mid Z'$ implies $A \not\perp_p B(Z \setminus Z') \mid Z'$ by decomposition. Now, note that $AZ' \perp_G Z \setminus Z'$ by definition of $Z'$ and thus $AZ' \perp_p Z \setminus Z'$, which implies $A \perp_p Z \setminus Z' \mid Z'$ by symmetry and weak union, which together with $A \not\perp_p B(Z \setminus Z') \mid Z'$ imply $A \not\perp_p B \mid Z$ by contraction2. Note that if $Z \setminus Z' = \emptyset$, then $A \not\perp_p B \mid Z'$ directly implies $A \not\perp_p B \mid Z$. In any case, $A \not\perp_p B \mid Z$ implies $X \not\perp_p Y \mid Z$ by symmetry and decomposition.

Assume as induction hypothesis that the theorem holds when the length of $A : B$ is smaller than $n$. We now prove it for length $n$. Let $C$ be any node in $A : B$ except $A$ and $B$. Note that $C \in Z$ and thus $A \perp_G B \mid Z \setminus C$, which implies $A \perp_p B \mid Z \setminus C$. Moreover, note that $A \sim_G C \mid Z \setminus C$ holds due to the subpath of $A : B$ between $A$ and $C$, which we denote as $A : C$. To see it, note that $A : C$ is the only path between $A$ and $C$ in $G$ whose nodes are all in $AZ$, because if there were two such paths then there would be two paths between $A$ and $B$ in $G$ whose nodes are all in $ABZ$, which would contradict that $X \sim_G Y \mid Z$ holds due to $A : B$. Likewise, $C \sim_G B \mid Z \setminus C$. Moreover, $A \sim_G C \mid Z \setminus C$ and $C \sim_G B \mid Z \setminus C$ imply respectively $A \not\perp_p C \mid Z \setminus C$ and $C \not\perp_p B \mid Z \setminus C$ by the induction hypothesis, which together with $A \perp_p B \mid Z \setminus C$ imply $A \not\perp_p B \mid Z$ by weak transitivity1, which implies $X \not\perp_p Y \mid Z$ by symmetry and decomposition.

Finally, note that the above derivation of $X \not\perp_p Y \mid Z$ only makes use of the dependence base of $p$ and the nine properties introduced at the beginning of this section. Thus, $X \not\perp_p Y \mid Z$ is in the WTC graphoid closure of the dependence base of $p$.

Note that we do not make use of the composition property in the proof above. However, we do use the fact that the graphical criterion in Definition 2.2 is sound. The proof of this fact in (Banerjee and Richardson, 2003; Kauermann, 1996, Proposition 2.2) does make use of the composition property.

Theorem 5.2. Let $G$ be the covariance graph of a WTC graphoid $p$. If $X \not\perp_p Y \mid Z$ is in the WTC graphoid closure of the dependence base of $p$, then $X \sim_G Y \mid Z$.

Proof.
Let $H$ denote the concentration graph that has the same vertices and edges as $G$. In other words, $G$ and $H$ are the same UG but with different interpretations. Note that $X \sim_G Y \mid Z$ iff $X \sim_H Y \mid V \setminus XYZ$, which follows from the fact that $X \sim_G Y \mid Z$ iff $con_G(X, Y \mid V \setminus XYZ)$ iff $con_H(X, Y \mid V \setminus XYZ)$ iff $X \sim_H Y \mid V \setminus XYZ$.

Clearly, all the dependencies in the dependence base of $p$ are identified by the graphical criterion in Definition 5.1. Therefore, it only remains to prove that this graphical criterion satisfies the nine properties introduced at the beginning of this section. We do so below with the help of $H$. Note that the graphical criterion in Definition 5.2 applied to $H$ satisfies the nine properties introduced at the beginning of this section (Peña et al., 2009, Theorem 6, Example 3).

• Symmetry $Y \sim_G X \mid Z \Rightarrow X \sim_G Y \mid Z$.
Trivial.

• Decomposition $X \sim_G Y \mid Z \Rightarrow X \sim_G YW \mid Z$.
$X \sim_G Y \mid Z$ implies $X \sim_H Y \mid V \setminus XYZ$ by definition, which implies $X \sim_H YW \mid V \setminus XYZW$ by weak union, which implies $X \sim_G YW \mid Z$ by definition.

• Weak union $X \sim_G Y \mid ZW \Rightarrow X \sim_G YW \mid Z$.
$X \sim_G Y \mid ZW$ implies $X \sim_H Y \mid V \setminus XYZW$ by definition, which implies $X \sim_H YW \mid V \setminus XYZW$ by decomposition, which implies $X \sim_G YW \mid Z$ by definition.

• Contraction1 $X \sim_G YW \mid Z \wedge X \perp_G Y \mid ZW \Rightarrow X \sim_G W \mid Z$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid ZW$ imply respectively $X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZW$ by definition, which imply $X \sim_H W \mid V \setminus XZW$ by contraction2, which implies $X \sim_G W \mid Z$ by definition.

• Contraction2 $X \sim_G YW \mid Z \wedge X \perp_G W \mid Z \Rightarrow X \sim_G Y \mid ZW$.
$X \sim_G YW \mid Z$ and $X \perp_G W \mid Z$ imply respectively $X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H W \mid V \setminus XZW$ by definition, which imply $X \sim_H Y \mid V \setminus XYZW$ by contraction1, which implies $X \sim_G Y \mid ZW$ by definition.

• Intersection $X \sim_G YW \mid Z \wedge X \perp_G Y \mid ZW \Rightarrow X \sim_G W \mid ZY$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid ZW$ imply respectively $X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZW$ by definition, which imply $X \sim_H W \mid V \setminus XYZW$ by composition, which implies $X \sim_G W \mid ZY$ by definition.

• Weak transitivity1 $X \sim_G K \mid Z \wedge K \sim_G Y \mid Z \wedge X \perp_G Y \mid Z \Rightarrow X \sim_G Y \mid ZK$.
$X \sim_G K \mid Z$, $K \sim_G Y \mid Z$ and $X \perp_G Y \mid Z$ imply respectively $X \sim_H K \mid V \setminus XZK$, $K \sim_H Y \mid V \setminus YZK$ and $X \perp_H Y \mid V \setminus XYZ$. Moreover, $X \sim_H K \mid V \setminus XZK$ and $K \sim_H Y \mid V \setminus YZK$ imply respectively $X \sim_H YK \mid V \setminus XYZK$ and $XK \sim_H Y \mid V \setminus XYZK$ by symmetry and weak union, which together with $X \perp_H Y \mid V \setminus XYZ$ imply respectively $X \sim_H K \mid V \setminus XYZK$ and $K \sim_H Y \mid V \setminus XYZK$ by symmetry and contraction1, which together with $X \perp_H Y \mid V \setminus XYZ$ imply $X \sim_H Y \mid V \setminus XYZK$ by weak transitivity2, which implies $X \sim_G Y \mid ZK$ by definition.

• Weak transitivity2 $X \sim_G K \mid Z \wedge K \sim_G Y \mid Z \wedge X \perp_G Y \mid ZK \Rightarrow X \sim_G Y \mid Z$.
Trivial because the antecedent involves a contradiction. To see it, note that $X \sim_G K \mid Z$, $K \sim_G Y \mid Z$ and $X \perp_G Y \mid ZK$ imply respectively $X \sim_H K \mid V \setminus XZK$, $K \sim_H Y \mid V \setminus YZK$ and $X \perp_H Y \mid V \setminus XYZK$. Moreover, $X \sim_H K \mid V \setminus XZK$ and $K \sim_H Y \mid V \setminus YZK$ imply respectively $X \sim_H YK \mid V \setminus XYZK$ and $XK \sim_H Y \mid V \setminus XYZK$ by symmetry and weak union, which together with $X \perp_H Y \mid V \setminus XYZK$ imply respectively $X \sim_H K \mid V \setminus XYZK$ and $K \sim_H Y \mid V \setminus XYZK$ by symmetry and composition, which together with $X \perp_H Y \mid V \setminus XYZK$ imply a contradiction as shown in (Peña et al., 2009, Theorem 6).
• Composition $X \sim_G YW \mid Z \wedge X \perp_G Y \mid Z \Rightarrow X \sim_G W \mid Z$.
$X \sim_G YW \mid Z$ and $X \perp_G Y \mid Z$ imply respectively $X \sim_H YW \mid V \setminus XYZW$ and $X \perp_H Y \mid V \setminus XYZ$ by definition, which imply $X \sim_H W \mid V \setminus XZW$ by intersection, which implies $X \sim_G W \mid Z$ by definition.

While Theorem 5.1 was somewhat expected because if there is a single path between $A$ and $B$ in $G$ whose nodes are all in $ABZ$ then there is no possibility of path cancellation, the combination of Theorems 5.1 and 5.2 is rather exciting: We now have a simple graphical criterion to decide whether a given dependence is or is not in the WTC graphoid closure of the dependence base of $p$, i.e. we do not need to try to find a derivation of it, which is usually a tedious task.

We devote the rest of this section to some observations that follow from the previous two theorems. A sensible question to ask is whether the graphical criterion in Definition 5.1 is complete in the sense of being able to identify all the dependencies shared by all the WTC graphoids whose covariance graph is a given UG. The answer is no. An illustrative example follows.

Example 5.1. Let $G$ denote the UG in Table 1. Consider any WTC graphoid $p$ whose covariance graph is $G$. Such WTC graphoids exist by Theorems 4.1 and 4.2. Then, $A \not\perp_p B \mid C$ or $A \not\perp_p C \mid B$ because otherwise $A \perp_p BC$ by intersection, which is a contradiction because $A \sim_G BC$ implies $A \not\perp_p BC$ by Theorem 5.1. Assume $A \not\perp_p B \mid C$. Note that $B \sim_G D \mid C$ implies $B \not\perp_p D \mid C$ by Theorem 5.1. Then, $A \not\perp_p B \mid C$ and $B \not\perp_p D \mid C$ together with $A \perp_p D \mid C$, which follows from $A \perp_G D \mid C$, imply $A \not\perp_p D \mid BC$ by weak transitivity1. Likewise, $A \not\perp_p E \mid BC$ when assuming $A \not\perp_p C \mid B$. Then, $A \not\perp_p D \mid BC$ or $A \not\perp_p E \mid BC$, which imply $A \not\perp_p DE \mid BC$ by decomposition. However, $A \sim_G DE \mid BC$ does not hold.
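The criterion, as it is used in the proofs above, can be sketched in a few lines of Python. Definition 5.1 is not restated in this section, so the sketch encodes our reading of it: $X \sim_G Y \mid Z$ holds when some $A \in X$ and $B \in Y$ are joined by exactly one path in $G$ whose nodes are all in $ABZ$. The function names and the four-node cycle are illustrative assumptions, not the graph of Table 1.

```python
from itertools import product


def count_paths(adj, a, b, allowed):
    """Count the simple paths from a to b whose nodes all lie in `allowed`."""
    def extend(node, visited):
        if node == b:
            return 1
        return sum(
            extend(nxt, visited | {nxt})
            for nxt in adj[node]
            if nxt in allowed and nxt not in visited
        )
    return extend(a, {a})


def reads_dependence(adj, X, Y, Z):
    """Hypothetical helper: True if some A in X and B in Y are joined by
    exactly one path whose nodes are all in {A, B} | Z."""
    Z = set(Z)
    return any(
        count_paths(adj, a, b, Z | {a, b}) == 1
        for a, b in product(X, Y)
    )


# Illustrative four-node cycle A - B - C - D - A (an assumption, not Table 1).
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}

print(reads_dependence(cycle, {"A"}, {"B"}, set()))       # the single edge A - B
print(reads_dependence(cycle, {"A"}, {"C"}, {"B"}))       # only A - B - C is allowed
print(reads_dependence(cycle, {"A"}, {"C"}, {"B", "D"}))  # two allowed paths
```

In the last call two paths connect $A$ and $C$ within $ACBD$, so path cancellation cannot be ruled out and the sketch reads no dependence. Enumerating simple paths is exponential in the worst case, which is acceptable for a small illustration but not for large graphs.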
Note that the fact that the graphical criterion in Definition 5.1 is not complete in the latter sense implies that it is not complete either in the more stringent sense of being able to identify all the dependencies in the WTC graphoid at hand. Actually, no sound graphical criterion for reading dependencies from the covariance graph of a WTC graphoid can be complete in this more stringent sense. To see it, consider again Example 4.1. Let us assume that we are dealing with $p'$. Then, no sound graphical criterion entails $A \not\perp_{p'} C \mid B$ from $G'$ because this dependence does not hold in $p$, and it is impossible to know whether we are dealing with $p$ or $p'$ on the sole basis of $G'$.

It is worth mentioning that the graphical criteria in Definitions 5.1 and 5.2 complement each other, as each of them can read dependencies that the other cannot. To see it, consider the WTC graphoid $p$ in Example 4.1. Then, $A \sim_G B$ and thus $A \not\perp_p B$ by Theorem 5.1. However, this dependence cannot be derived from $H$ because $A \sim_H B$ does not hold. On the other hand, $A \sim_H B \mid CD$ and thus $A \not\perp_p B \mid CD$ by (Peña et al., 2009, Theorem 5). However, this dependence cannot be derived from $G$ because $A \sim_G B \mid CD$ does not hold.

Again, a sensible question to ask is whether the joint use of the graphical criteria in Definitions 5.1 and 5.2 is complete in the sense of being able to identify all the dependencies shared by all the WTC graphoids whose covariance and concentration graphs are two given UGs. The answer is no. An illustrative example follows.

Example. Let $G$ denote the UG in Table 1. Let $H$ denote the complete graph over $\{A, B, C, D, E, F\}$. Consider any WTC graphoid $p$ whose covariance and concentration graphs are $G$ and $H$, respectively. Such WTC graphoids exist. To see it, it suffices to take any WTC graphoid that is faithful to $G$, which exists by Theorems 4.1 and 4.2. Recall that we have proven in Example 5.1 that $A \not\perp_p DE \mid BC$. However, neither $A \sim_G DE \mid BC$ nor $A \sim_H DE \mid BC$ holds.

Note that the fact that the joint use of the graphical criteria in Definitions 5.1 and 5.2 is not complete in the latter sense implies that it is not complete either in the more stringent sense of being able to identify all the dependencies in the WTC graphoid at hand. Actually, no pair of sound graphical criteria for reading dependencies from the covariance and concentration graphs of a WTC graphoid can be complete in this more stringent sense. To see it, consider again Example 4.1. Let us assume that we are dealing with $p'$. Then, no pair of sound graphical criteria entails $A \not\perp_{p'} C \mid B$ from $G'$ and $H'$ because this dependence does not hold in $p$, and it is impossible to know whether we are dealing with $p$ or $p'$ on the sole basis of $G'$ and $H'$.

The following corollary extends to WTC graphoids a result originally proven in (Malouche and Rajaratnam, 2009, Theorem 3) for Gaussian probability distributions. The extension is straightforward thanks to the graphical criterion in Definition 5.1.

Corollary. Let $G$ be the covariance graph of a WTC graphoid $p$. If $G$ is a forest, then $p$ is faithful to $G$.

Proof.
Assume to the contrary that $p$ is not faithful to $G$. Since $G$ is the covariance graph of $p$, the previous assumption is equivalent to assuming that there exist three disjoint subsets of $V$, here denoted $X$, $Y$ and $Z$, such that $X \not\perp_G Y \mid Z$ but $X \perp_p Y \mid Z$. However, $X \not\perp_G Y \mid Z$ implies that there must exist a path in $G$ between some node $A \in X$ and some node $B \in Y$ whose nodes are all in $ABZ$. Furthermore, since $G$ is a forest, that must be the only such path between $A$ and $B$ in $G$. However, this implies $X \sim_G Y \mid Z$ and thus $X \not\perp_p Y \mid Z$ by Theorem 5.1, which is a contradiction.

Given the covariance graph $G$ of a WTC graphoid $p$, $X \not\perp_G Y \mid Z$ does not imply $X \not\perp_p Y \mid Z$. This is actually the reason for this paper. However, if $G$ is a forest, then the previous corollary proves that $X \not\perp_G Y \mid Z$ does imply $X \not\perp_p Y \mid Z$ and, moreover, that this way of reading dependencies from $G$ is complete in the strictest sense discussed above. The following corollary extends to WTC graphoids a result originally proven in (Malouche and Rajaratnam, 2009, Lemma 5) for Gaussian probability distributions. The extension is straightforward thanks to the graphical criteria in Definitions 5.1 and 5.2.

Corollary. Let $G$ and $H$ be, respectively, the covariance and concentration graphs of a WTC graphoid $p$. Then, $G$ and $H$ have the same connected components. Moreover, if a connected component in $G$ (respectively $H$) is a tree then the corresponding connected component in $H$ (respectively $G$) is the complete graph.

Proof. First, we prove that $G$ and $H$ have the same connected components. If two nodes $A$ and $B$ belong to the same connected component in $G$, then $A \sim_G B \mid Z$ for some $Z \subseteq V \setminus AB$ and thus $A \not\perp_p B \mid Z$ by Theorem 5.1. However, if $A$ and $B$ belong to different connected components in $H$, then $A \perp_H B \mid Z$ and thus $A \perp_p B \mid Z$, which is a contradiction. On the other hand, if two nodes $A$ and $B$ belong to the same connected component in $H$, then $A \sim_H B \mid Z$ for some $Z \subseteq V \setminus AB$ and thus $A \not\perp_p B \mid Z$ by (Peña et al., 2009, Theorem 5). However, if $A$ and $B$ belong to different connected components in $G$, then $A \perp_G B \mid Z$ and thus $A \perp_p B \mid Z$, which is a contradiction.

Now, take any connected component $C$ in $G$ that is a tree. We prove that the corresponding connected component $D$ in $H$ is the complete graph. If two nodes $A$ and $B$ belong to $C$, then $A \sim_G B \mid V \setminus AB$ and thus $A \not\perp_p B \mid V \setminus AB$ by Theorem 5.1, which implies that $A$ and $B$ are adjacent in $D$.

Finally, take any connected component $D$ in $H$ that is a tree. We prove that the corresponding connected component $C$ in $G$ is the complete graph. If two nodes $A$ and $B$ belong to $D$, then $A \sim_H B$ and thus $A \not\perp_p B$ by (Peña et al., 2009, Theorem 5), which implies that $A$ and $B$ are adjacent in $C$.
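The corollary above can be checked numerically in the regular Gaussian case, where $\Sigma_{i,j} = 0$ iff $i$ and $j$ are marginally independent and $(\Sigma^{-1})_{i,j} = 0$ iff they are independent given all the other variables. The following minimal sketch uses a hypothetical $3 \times 3$ covariance matrix whose zero pattern is the tree $A - B - C$, and exact rational arithmetic so that zeros are not blurred by rounding. The matrix and the helper names are our assumptions.

```python
from fractions import Fraction


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if not M:
        return Fraction(1)
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def inverse(M):
    """Cofactor inverse: entry (i, j) is (-1)^(i+j) det(M without row j and column i) / det(M)."""
    n, d = len(M), det(M)

    def minor(r, c):
        return [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]

    return [[(-1) ** (i + j) * det(minor(j, i)) / d for j in range(n)]
            for i in range(n)]


def edges(M):
    """Pairs i < j whose off-diagonal entry is non-zero."""
    n = len(M)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if M[i][j] != 0}


# Hypothetical covariance matrix over (A, B, C) = (0, 1, 2).
Sigma = [[Fraction(v) for v in row] for row in ([2, 1, 0], [1, 2, 1], [0, 1, 2])]
K = inverse(Sigma)  # concentration matrix

print(edges(Sigma))  # edges of the covariance graph
print(edges(K))      # edges of the concentration graph
```

Here the covariance graph has the two edges of the tree, whereas every off-diagonal entry of the concentration matrix is non-zero, so the concentration graph is complete, in agreement with the corollary.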
Note that the converse of the second statement in the previous corollary is not true. The following example illustrates this.

Example. Let $G$ (respectively $H$) be the covariance (respectively concentration) graph of a WTC graphoid $p$. Assume that $G$ (respectively $H$) is the complete graph and $p$ is faithful to it. Such WTC graphoids exist by Theorems 4.1 and 4.2 (respectively Theorems 3.1 and 3.2). Then, $p$ has no independencies. Consequently, $H$ (respectively $G$) is the complete graph and $p$ is faithful to it.
6. Discussion.
In this paper, we have provided new insight into covariance graphs by introducing a graphical criterion for reading dependencies from the covariance graph of a WTC graphoid. We have shown that WTC graphoids are not too restrictive a family of probability distributions by showing that it includes interesting discrete, Gaussian, and continuous but non-Gaussian probability distributions. We have proven that the new graphical criterion is sound and complete in a certain sense. In order to prove these properties, we have had to prove first that the graphical criterion in (Banerjee and Richardson, 2003; Kauermann, 1996) for reading independencies from the covariance graph of a WTC graphoid is complete in a certain sense. We have done so by proving that there are discrete, Gaussian, and continuous but non-Gaussian probability distributions that are faithful to any covariance graph. This result is also important because it implies that there exist probability distributions that covariance graphs can represent exactly but CGs cannot. Therefore, covariance graphs complement CGs. The following example illustrates this.
Example. Consider the covariance graph $G = \{A - B, B - C, C - D, D - A\}$. Consider any CG $H$ that represents the same independencies as $G$. Note that $H$ must have some edge between $A$ and $B$, $B$ and $C$, $C$ and $D$, and $D$ and $A$. However, $A$ and $C$ cannot be adjacent in $H$ because $A \perp_G C$. Likewise, $B$ and $D$ cannot be adjacent in $H$ because $B \perp_G D$. Then, $H = \{A \rightarrow B, B \leftarrow C, A \rightarrow D, D \leftarrow C\}$ because otherwise either $A \perp_H C \mid B$ or $A \perp_H C \mid D$ or $A \perp_H C \mid BD$, whereas $A \not\perp_G C \mid B$ and $A \not\perp_G C \mid D$ and $A \not\perp_G C \mid BD$. However, such an $H$ implies $B \not\perp_H D$ whereas $B \perp_G D$. Consequently, no CG can represent the same independencies as $G$. This implies that every probability distribution that is faithful to $G$ (recall that such probability distributions exist by Theorem 3.2) is not faithful to any CG.

It is fair to mention that we are not the first to note that covariance graphs complement other more popular graphical models. For instance, it follows from (Drton and Richardson, 2003; Pearl and Wermuth, 1994) that covariance graphs complement DAGs. Our example above simply extends this observation to CGs.

Another consequence of the faithfulness result in Theorem 3.2 is that it proves wrong the misconception that covariance graphs are densely connected because their edges represent marginal dependencies. Specifically, the theorem implies that there are probability distributions that are faithful to any covariance graph, whatever its topology.

Interestingly, the graphical criterion developed in this paper is dual to the one presented in (Peña et al., 2009) for reading dependencies from the concentration graph of a WTC graphoid. This duality resembles the duality existing between the graphical criteria for reading independencies from concentration and covariance graphs (Banerjee and Richardson, 2003; Kauermann, 1996). We have also shown that the new graphical criterion and the one presented in (Peña et al., 2009) complement each other, as there may be dependencies that only one of them can identify.

Finally, we have pointed out some limitations of the graphical criterion introduced in this paper that suggest future lines of research. For instance, it remains an open question whether it is possible to develop a similar graphical criterion that is complete in a stricter sense than the one used in this paper. It also remains an open question whether our faithfulness result in Appendix A for regular Gaussian probability distributions can be extended to discrete probability distributions with the help of the parameterizations in (Lupparelli et al., 2009). Another line of action is the application of our graphical criterion in bioinformatics.
In such an application, the covariance graph has to be learnt from gene expression data via, for instance, hypothesis tests. The data available for learning is typically scarce due to the high cost associated with its production. In this scenario, covariance graphs are easier to learn than Bayesian networks or concentration graphs, which are the graphical models commonly used in bioinformatics, because the former involve testing for marginal (in)dependencies whereas the latter involve testing for conditional (in)dependencies. We do not suggest with this that one should quit using Bayesian networks and concentration graphs in bioinformatics. Specifically, Bayesian networks have an important advantage over covariance graphs, namely that they can provide us with insight into the mechanistic or causal process underlying the data at hand. What we suggest is that when the data at hand is not considered enough to learn a reliable Bayesian network, one may be willing to learn a less informative but more reliable model such as a covariance graph, particularly now that we have graphical criteria for reading both dependencies and independencies off it.

Note that this paper only studies the structure of covariance graphs and, thus, it does not deal with their parameterization and/or parameter estimation. The interested reader is referred to (Chaudhuri et al., 2007; Drton and Richardson, 2003) for Gaussian models and (Drton and Richardson, 2008) for discrete models. However, it may be worth warning here that finding the maximum likelihood estimates of the parameters of a covariance graph can be hard, depending on the parameterization considered. This is particularly true for discrete models (Drton and Richardson, 2008). The reason is that a marginal independence can imply complicated parameter constraints in some parameterizations, because a marginal dependence is a property of the joint probability distribution rather than of the relevant marginal probability distribution. An early work that showed the latter is (Zentgraf, 1975), where the author gives an example of two-way interactions implying three-way interaction. In summary, learning the structure of a covariance graph may be simpler than learning the structure of a concentration graph. However, estimating the parameters of the former may be harder than estimating the parameters of the latter.
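The learning step mentioned in the discussion above, i.e. one marginal test per pair of variables, could look roughly as follows. This is only a sketch under assumptions of ours: the data are taken to be jointly Gaussian, so that marginal independence amounts to zero correlation, and the test is the standard Fisher z-test, which the paper does not prescribe. The function names, the significance level, and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def marginally_dependent(x, y, alpha=0.05):
    """Fisher z-test of zero correlation (assumes Gaussian data and more than 3 samples)."""
    r = pearson(x, y)
    if abs(r) >= 1.0:  # perfectly correlated samples: atanh is undefined
        return True
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def learn_covariance_graph(data, alpha=0.05):
    """One marginal test per pair of variables; no conditioning sets are needed."""
    names = sorted(data)
    return {
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if marginally_dependent(data[a], data[b], alpha)
    }


# Toy data (hypothetical): y follows x closely, w is uncorrelated with x.
x = [-3, -2, -1, 0, 1, 2, 3]
w = [1, -1, 1, -1, 1, -1, 1]
data = {"x": x, "w": w, "y": [a + b / 10 for a, b in zip(x, w)]}

print(learn_covariance_graph(data))
```

Each test involves only two variables, which is why scarce data is less of a problem here than for the conditional tests required by concentration graphs. No correction for multiple testing is applied in this sketch.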
7. Appendix A.
We strengthen the second statement in Theorem 4.1 by proving that, in a certain measure-theoretic sense, almost all the regular Gaussian probability distributions that are Markov wrt a covariance graph are faithful to it. Although this result is not used in this paper, we consider it to be interesting in its own right and, thus, we report it.

We start by recalling some results from matrix theory. See, for instance, (Horn and Johnson, 1985) for more information. Let $M = (M_{i,j})_{i,j \in V}$ denote a square matrix. Let $M_{I,J}$ with $I, J \subseteq V$ denote its submatrix $(M_{i,j})_{i \in I, j \in J}$. The determinant of $M$ can recursively be computed, for fixed $i \in V$, as $det(M) = \sum_{j \in V} (-1)^{i+j} M_{i,j} \, det(M_{\setminus(ij)})$, where $M_{\setminus(ij)}$ denotes the matrix produced by removing the row $i$ and column $j$ from $M$. Note then that $det(M)$ is a real polynomial in the entries of $M$. If $det(M) \neq 0$ then the inverse of $M$ can be computed as $(M^{-1})_{i,j} = (-1)^{i+j} det(M_{\setminus(ji)}) / det(M)$ for all $i, j \in V$. We say that $M$ is strictly diagonally dominant if $abs(M_{i,i}) > \sum_{\{j \in V : j \neq i\}} abs(M_{i,j})$ for all $i \in V$, where $abs()$ denotes absolute value. A matrix $M$ is Hermitian if it is equal to the matrix resulting from, first, transposing $M$ and, then, replacing each entry by its complex conjugate. Clearly, a real symmetric matrix is Hermitian. A real symmetric $N \times N$ matrix $M$ is positive definite if $x^T M x > 0$ for all non-zero $x \in \mathbb{R}^N$.

We continue by proving some auxiliary results. We assume hereinafter that the sample space of each random variable in $U$ is $\mathbb{R}$. Let $N(G)$ denote the set of regular Gaussian probability distributions $p$ such that $A \perp_p B$ for any two nodes $A$ and $B$ that are not adjacent in $G$. Note that these are exactly the regular Gaussian probability distributions that are Markov wrt $G$ (Banerjee and Richardson, 2003; Kauermann, 1996, Proposition 2.2). We parameterize each probability distribution $p \in N(G)$ with its mean vector $\mu$ and covariance matrix $\Sigma$. Note that the values of some of these parameters are determined by the values of other parameters or by $G$. Specifically, the following constraints apply:

C1. $\Sigma_{i,j} = \Sigma_{j,i}$ for all $i, j$ because $\Sigma$ is symmetric.
C2. $\Sigma_{i,j} = 0$ for all $i, j$ such that $i$ and $j$ are not adjacent in $G$. To see it, recall that if $i$ and $j$ are not adjacent in $G$ then $i \perp_p j$ and, thus, $\Sigma_{i,j} = 0$ (Studený, 2005, Corollary 2.3).

Hereinafter, the parameters whose values are not determined by the constraints above are called non-determined (nd) parameters. However, the values the nd parameters can take are further constrained by the fact that these values must correspond to some probability distribution in $N(G)$. In other words, the values the nd parameters can take are constrained by the fact that $\Sigma$ must be positive definite. That is why the set of nd parameter values satisfying this requirement is hereinafter called the nd parameter space for $N(G)$. We do not work out the inequalities defining the nd parameter space because they are irrelevant for our purpose. The number of nd parameters is what we call the dimension of $G$, and we denote it as $d$. Specifically, $d = 2|V| + |G|$ where $|V|$ and $|G|$ are, respectively, the number of nodes and edges in $G$:

• $|V|$ due to $\mu$.
• $|V|$ due to the entries in the diagonal of $\Sigma$.
• $|G|$ due to the entries below the diagonal of $\Sigma$ that are not identically zero. To see this, recall from the constraint C2 that there is one entry below the diagonal of $\Sigma$ that is not identically zero for each undirected edge in $G$.

Lemma 7.1. Let $G$ be a covariance graph of dimension $d$. The nd parameter space for $N(G)$ has positive Lebesgue measure wrt $\mathbb{R}^d$.

Proof.
Since we do not know a closed-form expression of the nd parameter space for $N(G)$, we take an indirect approach to prove the result. Recall that, by definition, the nd parameter space for $N(G)$ is the set of real values such that, after the extension determined by the constraints C1 and C2, $\Sigma$ is positive definite. Therefore, the nd parameters in $\mu$ can take values independently of the nd parameters in $\Sigma$. However, the nd parameters in $\Sigma$ cannot take values independently of one another because, otherwise, $\Sigma$ may not be positive definite. However, if the entries in the diagonal of $\Sigma$ take values in $(|V| - 1, \infty)$ and the rest of the nd parameters in $\Sigma$ take values in $[-1, 1]$, then $\Sigma$ is always positive definite, because it is then real symmetric (hence Hermitian), strictly diagonally dominant and has strictly positive diagonal entries (Horn and Johnson, 1985).

The subset of the nd parameter space of $N(G)$ described in the paragraph above has positive volume in $\mathbb{R}^d$ and, thus, it has positive Lebesgue measure wrt $\mathbb{R}^d$. Then, the nd parameter space of $N(G)$ has positive Lebesgue measure wrt $\mathbb{R}^d$.

Lemma 7.2. Let $G$ be a covariance graph. For every $i, j \in V$ and $K \subseteq V \setminus ij$, there exists a real polynomial $S(i, j, K)$ in the nd parameters in the parameterization of the probability distributions in $N(G)$ such that, for every $p \in N(G)$, $i \perp_p j \mid K$ iff $S(i, j, K)$ vanishes for the nd parameter values coding $p$.

Proof. Let $\Sigma$ denote the covariance matrix of $p$. Note that $i \perp_p j \mid K$ iff $((\Sigma_{ijK,ijK})^{-1})_{i,j} = 0$ (Lauritzen, 1996, Proposition 5.2). Recall that $((\Sigma_{ijK,ijK})^{-1})_{i,j} = (-1)^{\alpha_{i,j}} det(\Sigma_{iK,jK}) / det(\Sigma_{ijK,ijK})$ with $\alpha_{i,j} \in \{0, 1\}$. Moreover, note that $det(\Sigma_{ijK,ijK}) > 0$ because $\Sigma_{ijK,ijK}$ is positive definite. Then, $i \perp_p j \mid K$ iff $det(\Sigma_{iK,jK}) = 0$. Moreover, as noted above, $det(\Sigma_{iK,jK})$ is a real polynomial in the entries of $\Sigma$. Thus, $i \perp_p j \mid K$ iff a real polynomial $R(i, j, K)$ in the entries of $\Sigma$ vanishes. Recall that each entry of $\Sigma$ that is not identically zero corresponds to one of the nd parameters in the parameterization of the probability distributions in $N(G)$. Therefore, $R(i, j, K)$ can be expressed as a real polynomial $S(i, j, K)$ in the nd parameters. Therefore, $i \perp_p j \mid K$ iff $S(i, j, K)$ vanishes for the nd parameter values coding $p$.

We interpret the polynomial in the previous lemma as a real function on a real Euclidean space that includes the nd parameter space for $N(G)$. We say that the polynomial in the previous lemma is non-trivial if not all the values of the nd parameters are solutions to the polynomial. This is equivalent to the requirement that the polynomial is not identically zero.

Lemma 7.3. Let $G$ be a covariance graph of dimension $d$. The subset of the nd parameter space for $N(G)$ that corresponds to the probability distributions in $N(G)$ that are not faithful to $G$ has zero Lebesgue measure wrt $\mathbb{R}^d$.

Proof.
Recall that $N(G)$ contains exactly the regular Gaussian distributions that are Markov wrt $G$. Therefore, for any probability distribution $p \in N(G)$ not to be faithful to $G$, $p$ must satisfy some independence that is not entailed by $G$. That is, there must exist three disjoint subsets of $V$, here denoted as $I$, $J$ and $K$, such that $I \not\perp_G J \mid K$ but $I \perp_p J \mid K$. However, if $I \not\perp_G J \mid K$ then $i \not\perp_G j \mid K$ for some $i \in I$ and $j \in J$. Furthermore, if $I \perp_p J \mid K$ then $i \perp_p j \mid K$ by symmetry and decomposition. By Lemma 7.2, there exists a real polynomial $S(i, j, K)$ in the nd parameters in the parameterization of the probability distributions in $N(G)$ such that, for every $q \in N(G)$, $i \perp_q j \mid K$ iff $S(i, j, K)$ vanishes for the nd parameter values coding $q$. Furthermore, $S(i, j, K)$ is non-trivial (Kauermann, 1996, Theorem 3.1). Let $sol(i, j, K)$ denote the set of solutions to the polynomial $S(i, j, K)$. Then, $sol(i, j, K)$ has zero Lebesgue measure wrt $\mathbb{R}^d$ because it consists of the solutions to a non-trivial real polynomial in real variables (the nd parameters) (Okamoto, 1973). Then, $sol = \bigcup_{\{I, J, K \subseteq V \text{ disjoint} \,:\, I \not\perp_G J \mid K\}} \bigcup_{\{i \in I, j \in J \,:\, i \not\perp_G j \mid K\}} sol(i, j, K)$ has zero Lebesgue measure wrt $\mathbb{R}^d$, because the finite union of sets of zero Lebesgue measure has zero Lebesgue measure too. Consequently, the subset of the nd parameter space for $N(G)$ that corresponds to the probability distributions in $N(G)$ that are not faithful to $G$ has zero Lebesgue measure wrt $\mathbb{R}^d$ because it is contained in $sol$.

In summary, it follows from Lemmas 7.1 and 7.3 that, in the measure-theoretic sense described, almost all the elements of the nd parameter space for $N(G)$ correspond to probability distributions in $N(G)$ that are faithful to $G$. Since this correspondence is clearly one-to-one, it follows that almost all the regular Gaussian distributions in $N(G)$ are faithful to $G$.
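The ingredients of Lemmas 7.1 to 7.3 can be made concrete with a small exact computation. The sketch below rests on assumptions of ours: the graph is a hypothetical four-node cycle $0 - 1 - 2 - 3 - 0$, the diagonal entries are set to $|V| = 4$ and the edge entries to $\pm 1$, so that each $\Sigma$ is strictly diagonally dominant and hence positive definite, and independence is tested through the minor $det(\Sigma_{iK,jK})$ from Lemma 7.2. The helper names are illustrative.

```python
from fractions import Fraction


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if not M:
        return Fraction(1)
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def submatrix(M, rows, cols):
    return [[M[r][c] for c in cols] for r in rows]


def is_positive_definite(M):
    """Sylvester's criterion: every leading principal minor is positive."""
    return all(det(submatrix(M, range(k), range(k))) > 0
               for k in range(1, len(M) + 1))


def independent(Sigma, i, j, K):
    """For a regular Gaussian: i and j are independent given K iff det(Sigma[iK, jK]) = 0."""
    K = list(K)
    return det(submatrix(Sigma, [i] + K, [j] + K)) == 0


def covariance(n, weights):
    """Diagonal entries n (larger than n - 1); off-diagonal entries are zero for non-edges."""
    S = [[Fraction(n) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    for (i, j), w in weights.items():
        S[i][j] = S[j][i] = Fraction(w)
    return S


# Two hypothetical members of N(G) for the cycle 0 - 1 - 2 - 3 - 0.
generic = covariance(4, {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1})
cancelling = covariance(4, {(0, 1): 1, (1, 2): 1, (2, 3): -1, (0, 3): 1})

print(is_positive_definite(generic), is_positive_definite(cancelling))
print(independent(generic, 0, 2, [1, 3]))     # the two paths do not cancel
print(independent(cancelling, 0, 2, [1, 3]))  # the two paths cancel exactly
```

The second matrix satisfies an independence between nodes 0 and 2 given $\{1, 3\}$ that the graph does not entail, so it is Markov wrt the cycle but not faithful to it. It is one point of the measure-zero set $sol$, and it arises from two paths cancelling each other, which is precisely the situation that the single-path requirement of the graphical criterion rules out.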
We think that this result can easily be extended to strictly positive discrete probability distributions with the help of the parameterizations proposed in (Lupparelli et al., 2009). We do not elaborate further on this issue in this paper though.

A word of caution is due at this point. It may be tempting to infer from the measure-theoretic results above that every regular Gaussian probability distribution $p$ one encounters in reality is almost surely faithful to its covariance graph $G$. This may lead one to conclude that our graphical criterion for reading from $G$ dependencies holding in $p$ is not needed, since $X \not\perp_G Y \mid Z$ almost surely implies $X \not\perp_p Y \mid Z$. This argument is valid if $p$ has been drawn from $N(G)$ at random. However, we believe that $p$ is more likely to have been carefully engineered (e.g. by natural evolution in the case of the gene networks mentioned in Section 1) than to have been drawn at random. Consequently, and despite the measure-theoretic results above, we think that one cannot safely assume that $p$ is almost surely faithful to $G$. Hence, the need for the graphical criterion proposed in this paper.

Acknowledgments. We thank the anonymous reviewers and the editor for their thorough review of this manuscript. We are particularly grateful to the second reviewer for suggesting that we use copulas in the proofs of Theorems 3.2 and 4.1. This work is funded by the Center for Industrial Information Technology (CENIIT) and a so-called career contract at Linköping University.
References.
Moulinath Banerjee and Thomas Richardson. On a Dualization of Graphical GaussianModels: A Correction Note.
Scandinavian Journal of Statistics , 30:817-820, 2003.Remco R. Bouckaert.
Bayesian Belief Networks: From Construction to Inference . PhDThesis, University of Utrecht, 1995.Atul J. Butte and Isaac S. Kohane. Mutual Information Relevance Networks: FunctionalGenomic Clustering Using Pairwise Entropy Measurements. In
Pacific Symposium onBiocomputing 5 , 418-429, 2000.Sanjay Chaudhuri, Mathias Drton and Thomas S. Richardson. Estimation of a CovarianceMatrix with Zeros.
Biometrika , 94:199-216, 2007.David R. Cox and Nanny Wermuth. Linear Dependencies Represented by Chain Graphs.
Statistical Science , 8:204-218, 1993.David R. Cox and Nanny Wermuth.
Multivariate Dependencies - Models, Analysis andInterpretation . Chapman & Hall, 1996.Mathias Drton and Thomas S. Richardson. A New Algorithm for Maximum LikelihoodEstimation in Gaussian Graphical Models for Marginal Independence. In
Proceedingsof the Nineteenth Conference on Uncertainy in Artificial Intelligence , 184-191, 2003.Mathias Drton and Thomas S. Richardson. Binary Models for Marginal Independence.
Journal of the Royal Statistical Society Series B , 70:287-309, 2008.Roger A. Horn and Charles R. Johnson.
Matrix Analysis . Cambridge University Press,1985.G¨oran Kauermann. On a Dualization of Graphical Gaussian Models.
Scandinavian Journalof Statistics , 23:106-116, 1996.Steffen L. Lauritzen.
Graphical Models . Oxford University Press, 1996.Monia Lupparelli, Giovanni M. Marchetti and Wicher P. Bergsma. Parameterizations andFitting of Bi-Directed Graph Models to Categorical Data.
Scandinavian Journal ofStatistics , 36:559-576, 2009.Dhafer Malouche and Bala Rajaratnam. Gaussian Covariance Faithful Markov Trees.arXiv:0912.2407v1 [math.PR].Christopher Meek. Strong Completeness and Faithfulness in Bayesian Networks. In
Pro-ceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence , 411-418,1995.Roger B. Nelsen.
An Introduction to Copulas . Springer, 2006.Masashi Okamoto. Distinctness of the Eigenvalues of a Quadratic Form in a MultivariateSample.
The Annals of Statistics , 1:763-765, 1973.Judea Pearl.
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
Judea Pearl and Nanny Wermuth. When Can Association Graphs Admit a Causal Interpretation? In Models and Data, Artificial Intelligence and Statistics IV, 205-214, 1994.
Jose M. Peña. Reading Dependencies from Polytree-Like Bayesian Networks. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, 303-309, 2007.
Jose M. Peña. Faithfulness in Chain Graphs: The Discrete Case. International Journal of Approximate Reasoning, 50:1306-1313, 2009.
Jose M. Peña. Reading Dependencies from Polytree-Like Bayesian Networks Revisited. In Proceedings of the Fifth European Workshop on Probabilistic Graphical Models, 225-232, 2010.
Jose M. Peña. Faithfulness in Chain Graphs: The Gaussian Case. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 588-599, 2011.
Jose M. Peña, Roland Nilsson, Johan Björkegren and Jesper Tegnér. Identifying the Relevant Nodes Without Learning the Model. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 367-374, 2006.
Jose M. Peña, Roland Nilsson, Johan Björkegren and Jesper Tegnér. An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity. Journal of Machine Learning Research, 10:1071-1094, 2009.
Thomas S. Richardson. Markov Properties for Acyclic Directed Mixed Graphs.
Scandinavian Journal of Statistics, 30:145-157, 2003.
Thomas S. Richardson and Peter Spirtes. Ancestral Graph Markov Models. The Annals of Statistics, 30:962-1030, 2002.
Kayvan Sadeghi and Steffen L. Lauritzen. Markov Properties for Loopless Mixed Graphs. arXiv:1109.5909v1 [stat.OT].
Peter Spirtes, Clark Glymour and Richard Scheines. Causation, Prediction, and Search. Springer-Verlag, 1993.
Milan Studený. Probabilistic Conditional Independence Structures. Springer, 2005.
Nanny Wermuth. On the Interpretation of Chain Graphs. In Bulletin of the International Statistical Institute, Proceedings 50th Session, 415-429, 1995.
Nanny Wermuth and David R. Cox. On Association Models Defined over Independence Graphs. Bernoulli, 4:477-495, 1998.
Nanny Wermuth, David R. Cox and Giovanni M. Marchetti. Covariance Chains. Bernoulli, 12:841-862, 2006.
Nanny Wermuth. Probability Distributions with Summary Graph Structure. Bernoulli, 17:845-879, 2011.
Nanny Wermuth. Sequences of Regressions and their Dependences. arXiv:1110.1986v2 [stat.ME].
R. Zentgraf. A Note on Lancaster’s Definition of Higher-Order Interactions.