Discrete Max-Linear Bayesian Networks
arXiv preprint [math.ST]
BENJAMIN HOLLERING AND SETH SULLIVANT

Abstract. Discrete max-linear Bayesian networks are directed graphical models specified by the same recursive structural equations as max-linear models but with discrete innovations. When all of the random variables in the model are binary, these models are isomorphic to the conjunctive Bayesian network (CBN) models of Beerenwinkel, Eriksson, and Sturmfels. Many of the techniques used to study CBN models can be extended to discrete max-linear models, and similar results can be obtained. In particular, we extend the fact that CBN models are toric varieties after a linear change of coordinates to all discrete max-linear models.
1. Introduction
Max-linear Bayesian networks are a special class of graphical models introduced in [6] to model extreme events. A max-linear Bayesian network X = (X_1, ..., X_n) is determined by a directed acyclic graph D = (V, E), edge weights c_{ij} ≥ 0, and independent positive random variables Z_1, ..., Z_n called innovations. The Z_i have support (0, ∞) and are required to have atom-free distributions. The random vector X is then required to satisfy the structural equations

X_i = ⋁_{j ∈ pa(i)} c_{ij} X_j ∨ Z_i,

where ∨ denotes maximum and pa(i) denotes the set of parents of vertex i in D. These models are able to model extreme events spreading through a network more accurately than standard discrete or Gaussian Bayesian networks [6, 10], and they also have interesting ties to tropical geometry [1].

Max-linear models also exhibit a remarkable property concerning conditional independence: they are usually not faithful to their underlying DAG, meaning that they often satisfy more conditional independence statements than those implied by the d-separation criterion [10]. Améndola, Klüppelberg, Lauritzen, and Tran recently gave a new criterion named ∗-separation which gives a complete set of conditional independence statements for max-linear models [1]. There has also been work to establish when the parameters of the model are identifiable [6, 7, 10]. In all of this work, the atom-free property of the innovations is key.

In this note we consider discrete max-linear Bayesian networks, so we assume the Z_i are all k-state discrete random variables. The Z_i are then atomic, so much of the above work on max-linear models does not apply; however, these models are closely related to the conjunctive Bayesian network (CBN) models discussed in [2, 3]. CBN models were originally introduced in [2] to model HIV drug resistance and cancer development via mutation in a genome. In CBN models, events accumulate as you move up a poset (or DAG), similar to the way extreme events spread throughout a max-linear Bayesian network. Part of our goal in this note is to show that CBN models are naturally included in the family of max-linear models.
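The recursive structural equations can be solved by a single pass through a topological order of D. The short sketch below is our own illustration, not taken from the paper; the toy DAG, the edge weights, and the fixed innovation values are all made-up choices:

```python
def max_linear(edges, c, Z):
    """Solve X_i = (max over parents j of c[(j, i)] * X_j) ∨ Z_i on a DAG.

    edges: list of pairs (j, i) meaning j -> i; c: edge weights c[(j, i)];
    Z: innovations {i: z_i}. Vertices are assumed to be labeled in
    topological order, so every X_j is already known when X_i is computed."""
    parents = {i: [] for i in Z}
    for (j, i) in edges:
        parents[i].append(j)
    X = {}
    for i in sorted(Z):
        X[i] = max([c[(j, i)] * X[j] for j in parents[i]] + [Z[i]])
    return X

# A hypothetical chain 1 -> 2 -> 3 with fixed innovations instead of
# random atom-free ones, to keep the computation deterministic.
X = max_linear([(1, 2), (2, 3)],
               {(1, 2): 0.5, (2, 3): 2.0},
               {1: 2.0, 2: 1.0, 3: 0.5})
```

In a real max-linear model the Z_i would be drawn from atom-free distributions on (0, ∞); the pass through the topological order is the same.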
Though we do not develop the details here, we also believe it would be interesting to find a relationship between the typical max-linear model and our discrete analogue, similar to the relationship found between the CBN model and its continuous analogue in [4].

In Section 2 we give a brief overview of the combinatorial objects we use throughout the note and some background on the CBN model. In Section 3 we define the discrete max-linear Bayesian network model and show that when all of the random variables involved are binary, it is equal to the CBN model after a natural change of coordinates. In Section 4 we describe the algebraic structure of the discrete max-linear model.

2. Preliminaries

In this section we provide some basic background on graphs and posets which we will use throughout this note. We also discuss conjunctive Bayesian network (CBN) models, which were first introduced in [2]. These models are directly related to the discrete max-linear models we discuss in this note.
2.1. Graphs, Posets, and Lattices.
Our notation for graphs follows that of [1], while our notation for posets and lattices follows that of [11]. The graphs in this note will primarily be directed graphs, which are given by a vertex set V = {1, ..., n} and an edge set E = {j → i : i, j ∈ V, i ≠ j}. A vertex j is a parent of i if j → i ∈ E, and we denote the set of parents of i by pa(i). A path from j to i is a sequence of distinct nodes [j = ℓ_0, ℓ_1, ..., ℓ_m = i] such that either ℓ_r → ℓ_{r+1} or ℓ_{r+1} → ℓ_r for each r. A directed path from j to i is a path where ℓ_r → ℓ_{r+1} for all r, and a directed cycle is a directed path where j = i. A vertex j is an ancestor of i if there exists a directed path from j to i, and we denote the set of ancestors of i by an(i). Most directed graphs in this note will be directed acyclic graphs (DAGs), which are directed graphs with no directed cycles.

We will also frequently use partially ordered sets, typically called posets, throughout this note. A poset, P, is a set equipped with a binary relation ≤ that is reflexive, antisymmetric, and transitive. A chain is a poset where any two elements are comparable, and we denote by k the chain whose elements are [k] = {1, ..., k} with the natural relation. An element j of P is said to cover i if j > i and there is no other element k such that j > k > i. A poset can be represented by an undirected graph called the Hasse diagram of the poset, where the vertices correspond to elements of P, edges represent cover relations, and the elements are ordered in the figure so that smaller elements are at the bottom. See Figure 1 for an example.

An order ideal of P is a subset I of P such that if i ∈ I and j ≤ i then j ∈ I. A dual order ideal of P is a subset I of P such that if i ∈ I and j ≥ i then j ∈ I. An order-preserving map is a map φ : P → Q between posets such that if j, i ∈ P satisfy j ≤ i then φ(j) ≤ φ(i).

Many of the posets we work with will be obtained by taking the transitive closure of a DAG. Let D be a DAG and define a relation on the vertices by j ≤ i if and only if there is a directed path from j to i. We refer to this poset as D^tr, the transitive closure poset of D.

A lattice, L, is a poset equipped with two binary operations, denoted ∨ and ∧, such that for any elements s, t ∈ L the least upper bound of s and t exists, which is s ∨ t, and the greatest lower bound of s and t exists, which is s ∧ t. The operation ∨ is typically referred to as the join while ∧ is called the meet. A classic example that will arise in this note is the lattice of order ideals of a finite poset. Let P be a finite poset and denote by J(P) the order ideals of P ordered by inclusion; then J(P) is a lattice with meet given by intersection and join given by union. Furthermore, J(P) is a distributive lattice, meaning the operations ∨ and ∧ distribute over each other. Throughout this note we will also use ∨ to denote the maximum of a subset of a totally ordered set and ∧ to denote the minimum, since these operations are the join and meet, respectively, for finite subsets of totally ordered sets.

2.2. Conjunctive Bayesian Networks.
In this section we provide some basic background on the CBN model, which was first introduced by Beerenwinkel, Eriksson, and Sturmfels in [2] as a mathematical model for mutations occurring in a genome. For further information we refer the reader to [2, 3]. In Section 3, we show that CBN models can be viewed as a subclass of D-MLBN models, which are our main focus in this note.

The CBN model begins with a poset P whose elements are called events, which are usually taken to be the elements of [n]. The state space of the model is the lattice of order ideals J(P), and elements g ∈ J(P) are called genotypes. It is often convenient to think of a state g both as a subset of the ground set [n] and as a 0-1 string, and we will do so throughout the note.

The parameterization of the CBN model can be explicitly written down in terms of the poset P, but it is often convenient to instead view it as a directed graphical model. We refer the reader to [12, Chapter 13] for additional information on parameterizations of graphical models. We first form a DAG with edges i → j if i < j is a cover relation in P, and associate a binary random variable X_i to each i ∈ P. Then the CBN model is a directed graphical model on the X_i with conditional probabilities given by

(1)  (P(X_i = b | X_pa(i) = a))_{a ∈ {0,1}^{pa(i)}, b ∈ {0,1}},

the matrix whose rows and columns are ordered lexicographically, whose last row (a = (1, ..., 1)) is (θ^{(i)}_0, θ^{(i)}_1), and whose remaining rows are (1, 0). Here θ^{(i)}_1 is simply the conditional probability that the event i occurs given that all of its parents have occurred, while θ^{(i)}_0 is the conditional probability that the event i does not occur given that all of its parents have occurred. This means that θ^{(i)}_0 + θ^{(i)}_1 = 1. We note that this is slightly different than the matrix of conditional probabilities shown in [3], but it is equivalent. The probability of observing an event g is the product of these conditional probabilities, so

p_g = P(X = g) = ∏_{i ∈ P} P(X_i = g_i | X_pa(i) = g_pa(i)).

Figure 1. The Hasse diagram of a poset P and its lattice of order ideals J(P), written as 0-1 strings instead of subsets of {1, 2, 3, 4, 5}.

Example 2.1.
Let P be the poset pictured on the left of Figure 1. The state space of the CBN model on P is the lattice of order ideals J(P) pictured on the right of Figure 1. In this representation, the 0-1 strings represent the corresponding subsets of the ground set of P. For instance, the element (1, 1, 1, 0, 1) of J(P) represents the order ideal {1, 2, 3, 5} of P. The parameterization of the CBN model on P is

p_00000 = θ^{(1)}_0 θ^{(2)}_0
p_10000 = θ^{(1)}_1 θ^{(2)}_0
p_01000 = θ^{(1)}_0 θ^{(2)}_1
p_11000 = θ^{(1)}_1 θ^{(2)}_1 θ^{(3)}_0
p_11100 = θ^{(1)}_1 θ^{(2)}_1 θ^{(3)}_1 θ^{(4)}_0 θ^{(5)}_0
p_11110 = θ^{(1)}_1 θ^{(2)}_1 θ^{(3)}_1 θ^{(4)}_1 θ^{(5)}_0
p_11101 = θ^{(1)}_1 θ^{(2)}_1 θ^{(3)}_1 θ^{(4)}_0 θ^{(5)}_1
p_11111 = θ^{(1)}_1 θ^{(2)}_1 θ^{(3)}_1 θ^{(4)}_1 θ^{(5)}_1.

Note that we suppress the commas and parentheses when writing p_g, which we will do throughout this note.

3. Discrete Max-Linear Bayesian Networks

In this section we introduce a discrete version of the max-linear models studied in [1, 6, 10]. When all of the random variables in the model are binary, there is a direct correspondence between discrete max-linear models and the conjunctive Bayesian networks discussed in [2, 3]. For this reason, we try to keep our notation consistent with that of [3] when possible.

Let D = (V, E) be a directed acyclic graph (DAG) and associate a discrete random variable Z_i with k states to each vertex i ∈ V. Then the discrete max-linear Bayesian network (D-MLBN) is the family of joint distributions of the random variables (X_i)_{i ∈ V} specified by

(2)  X_i = ⋁_{j ∈ pa(i)} X_j ∨ Z_i,  i ∈ V,

where pa(i) denotes the parents of vertex i in D. These are the same structural equations used to specify the max-linear Bayesian networks discussed in [1, 6, 10], except there are no coefficients and the random variables Z_1, ..., Z_n are now discrete instead of continuous and atom-free. Despite these alterations, this system of equations still has the same solution, which is

(3)  X_i = ⋁_{j ∈ an(i) ∪ {i}} Z_j,  i ∈ V,

where an(i) denotes the ancestors of vertex i in D.

Figure 2. A DAG D and the state space, G, of the 2-state D-MLBN model, pictured as the lattice of order-preserving maps from D to the chain 2. Also pictured is the lattice of order ideals J(D^tr) of the poset D^tr, written as 0-1 vectors.

The state space of the k-state discrete max-linear model is the set of order-preserving maps from the transitive closure of D to a chain of length k. More explicitly, let k be a chain of size k and let D^tr be the poset obtained by taking the transitive closure of D. Then g = (g_1, ..., g_n) is a possible state of the k-state D-MLBN if there exists an order-preserving map π : D^tr → k such that π(i) = g_i. This can be seen by directly examining the structural equations and their solution. Note that if i ≥ j in the partial order D^tr, then j is an ancestor of i, so there is a directed path [j = ℓ_0, ℓ_1, ..., ℓ_m = i], and for any r it is immediate that ℓ_r ∈ pa(ℓ_{r+1}), which immediately implies that X_{ℓ_{r+1}} ≥ X_{ℓ_r}. This gives a chain of inequalities from which we get X_j ≤ X_i. Denote this set of states by G(D, k) and note that it forms a distributive lattice with meet given by taking the coordinate-wise minimum and join given by taking the coordinate-wise maximum. When it is clear from context, we simply write G instead of G(D, k). This is analogous to the state space of the CBN model being the lattice of order ideals of a poset [3].

Example 3.1.
Let D be the DAG pictured in Figure 2. Then the structural equations of the D-MLBN model on D are

X_1 = Z_1,  X_2 = Z_2,  X_3 = X_1 ∨ Z_3,
X_4 = X_1 ∨ X_2 ∨ Z_4,  X_5 = X_3 ∨ X_4 ∨ Z_5,

which have the solution

X_1 = Z_1,  X_2 = Z_2,  X_3 = Z_1 ∨ Z_3,
X_4 = Z_1 ∨ Z_2 ∨ Z_4,  X_5 = Z_1 ∨ Z_2 ∨ Z_3 ∨ Z_4 ∨ Z_5.

If each of the Z_i has two states, so k = 2, then the state space of the model is the lattice G that is also pictured in Figure 2.

Similarly to the CBN model, the D-MLBN model can also be thought of as a directed graphical model on a DAG D with additional restrictions on the parameters. In the usual directed graphical model, the parameters of the model are the conditional probabilities P(X_i = x_i | X_pa(i) = x_pa(i)) [5]. In the D-MLBN model, we can compute these conditional probabilities in terms of the distributions of the Z_i. Let each Z_i have distribution θ^{(i)} = (θ^{(i)}_0, ..., θ^{(i)}_{k−1}) ∈ Δ_{k−1} and let g ∈ {0, ..., k−1}^n. For any i define M_i = ⋁_{j ∈ pa(i)} g_j; then we have

(4)  P(X_i = g_i | X_pa(i) = g_pa(i)) =
       0                          if g_i < M_i,
       ∑_{ℓ ≤ M_i} θ^{(i)}_ℓ     if g_i = M_i,
       θ^{(i)}_{g_i}              if g_i > M_i.
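Both the state space G(D, k) and the conditional probabilities of Equation (4) are easy to check computationally. The sketch below is our own illustration: the edge list is read off from the structural equations of Example 3.1, and the numerical distributions θ^{(i)} are hypothetical.

```python
from itertools import product

# Edges j -> i of the DAG D of Figure 2, read off from the
# structural equations in Example 3.1.
EDGES = [(1, 3), (1, 4), (2, 4), (3, 5), (4, 5)]

def cond_prob(theta_i, g_i, M_i):
    """Equation (4): P(X_i = g_i | X_pa(i) = g_pa(i)), where M_i is the
    maximum of the parent states and theta_i is the distribution of Z_i."""
    if g_i < M_i:
        return 0.0
    if g_i == M_i:
        return sum(theta_i[:M_i + 1])
    return theta_i[g_i]

def joint_prob(edges, theta, g):
    """p_g as the product of the conditional probabilities over i in V."""
    p = 1.0
    for i in range(1, len(g) + 1):
        M_i = max([g[j - 1] for (j, v) in edges if v == i], default=0)
        p *= cond_prob(theta[i], g[i - 1], M_i)
    return p

def state_space(edges, n, k):
    """G(D, k): the labelings g with g_j <= g_i for every edge j -> i
    (monotonicity along edges already enforces it on D^tr)."""
    return [g for g in product(range(k), repeat=n)
            if all(g[j - 1] <= g[i - 1] for (j, i) in edges)]

# Hypothetical 2-state innovation distributions (theta_0, theta_1).
theta = {1: [0.9, 0.1], 2: [0.8, 0.2], 3: [0.7, 0.3],
         4: [0.6, 0.4], 5: [0.5, 0.5]}
G = state_space(EDGES, 5, 2)
```

For this DAG the enumeration returns the nine states appearing in Example 3.2 below, every g outside G(D, 2) gets probability zero, and the probabilities sum to one over all 0-1 vectors.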
Using these conditional probabilities, we can then compute the probability p_g = P(X = g), which is

p_g = ∏_{i ∈ V} P(X_i = g_i | X_pa(i) = g_pa(i)).

Note that if g does not come from an order-preserving map, that is, g ∉ G(D, k), then p_g = 0. This means we can either think of the D-MLBN model as a model for the states G(D, k), or as a model for all g ∈ {0, ..., k−1}^n where the g ∉ G(D, k) have probability 0. Both of these perspectives can be useful when studying the algebraic structure of the model. The following example illustrates this parametric description of the model.

Example 3.2. We again let D and G be the DAG and lattice pictured in Figure 2. Then from the above discussion the parameterization of the D-MLBN model is

p_00000 = θ^{(1)}_0 θ^{(2)}_0 θ^{(3)}_0 θ^{(4)}_0 θ^{(5)}_0
p_00001 = θ^{(1)}_0 θ^{(2)}_0 θ^{(3)}_0 θ^{(4)}_0 θ^{(5)}_1
p_00011 = θ^{(1)}_0 θ^{(2)}_0 θ^{(3)}_0 θ^{(4)}_1
p_00101 = θ^{(1)}_0 θ^{(2)}_0 θ^{(3)}_1 θ^{(4)}_0
p_00111 = θ^{(1)}_0 θ^{(2)}_0 θ^{(3)}_1 θ^{(4)}_1
p_01011 = θ^{(1)}_0 θ^{(2)}_1 θ^{(3)}_0
p_01111 = θ^{(1)}_0 θ^{(2)}_1 θ^{(3)}_1
p_10111 = θ^{(1)}_1 θ^{(2)}_0
p_11111 = θ^{(1)}_1 θ^{(2)}_1.

Note that while this appears to be a monomial map, the relationship θ^{(i)}_0 + θ^{(i)}_1 = 1 means that the parameters are not algebraically independent, so this is not immediately a toric parameterization.

Theorem 3.3.
Let D = (V, E) be a DAG and let ρ be the map parameterizing the CBN model on D^tr. Let ψ be the map parameterizing the 2-state D-MLBN model on D. Then image(ρ) is equal to image(ψ) after a natural relabeling of coordinates. In particular, there exists a bijection φ : J(D^tr) → G(D, 2) such that

ρ_g(θ^{(1)}_0, θ^{(1)}_1, ..., θ^{(n)}_0, θ^{(n)}_1) = ψ_{φ(g)}(θ^{(1)}_1, θ^{(1)}_0, ..., θ^{(n)}_1, θ^{(n)}_0).

Proof. We first describe the bijection between the state spaces of the two models. Recall that the state space of the CBN model is the distributive lattice J(D^tr) of order ideals. There is a natural bijection between elements g ∈ J(D^tr) and order-preserving maps π : D^tr → {0, 1}. The general form of the following bijection can be found in [11, Proposition 3.5.1], but we describe the special case for order-preserving maps to {0, 1} here. For any order ideal g of D^tr, let π_g be the map defined by

π_g(i) = 0 if i ∈ g,  π_g(i) = 1 if i ∉ g,

and set φ(g) = π_g. It remains to show that ρ_g(θ^{(1)}_0, θ^{(1)}_1, ..., θ^{(n)}_0, θ^{(n)}_1) = ψ_{φ(g)}(θ^{(1)}_1, θ^{(1)}_0, ..., θ^{(n)}_1, θ^{(n)}_0). We do this by showing that the two models have the same conditional probabilities after interchanging θ^{(i)}_0 and θ^{(i)}_1.

Let φ(g) ∈ G(D, 2) be a state of the D-MLBN model, and for each i let M_i = ⋁_{j ∈ pa(i)} φ(g)_j. Observe that

M_i = 0 if φ(g)_j = 0 for all j ∈ pa(i), and M_i = 1 otherwise;
equivalently, M_i = 0 if g_j = 1 for all j ∈ pa(i), and M_i = 1 otherwise,

with the second description following from the definition of φ. We now examine the different possibilities for φ(g)_i and M_i and compute the conditional probabilities in each case.

Suppose φ(g)_i = M_i = 0. Then under the D-MLBN model, we have

P(X_i = φ(g)_i | X_pa(i) = φ(g)_pa(i)) = ∑_{ℓ ≤ 0} θ^{(i)}_ℓ = θ^{(i)}_0.

We know from the above formula for M_i that if M_i = 0, then g_j = 1 for all j ∈ pa(i). Since φ(g)_i = 0 we have g_i = 1, so under the CBN model the conditional probability of observing g_i given g_pa(i) is θ^{(i)}_1.

If φ(g)_i = 1 and M_i = 0, then under the D-MLBN model we have

P(X_i = φ(g)_i | X_pa(i) = φ(g)_pa(i)) = θ^{(i)}_1.

On the other hand, we now have g_i = 0, so under the CBN model the conditional probability of observing g_i given g_pa(i) is θ^{(i)}_0.

It is straightforward to check the remaining cases, so we omit them here; in these cases both models have the same conditional probabilities, which are either 0 or 1. Since each model is a directed graphical model, the probability of observing g (or φ(g)) is simply

ρ_g(θ) = ∏_{i ∈ V} P_CBN(X_i = g_i | X_pa(i) = g_pa(i)),
ψ_{φ(g)}(θ) = ∏_{i ∈ V} P_D-MLBN(X_i = φ(g)_i | X_pa(i) = φ(g)_pa(i)).

Since we know that the conditional probabilities are equal after interchanging the parameters θ^{(i)}_0 and θ^{(i)}_1 for each i, the above products are equal after interchanging the corresponding parameters, as claimed. □

We end this section with an example that illustrates the previous theorem.
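The correspondence of Theorem 3.3 can also be checked numerically. The sketch below is our own illustration: it uses the DAG of Figure 2 with edges read off from Example 3.1 and hypothetical parameter values, computes both parameterizations over all 0-1 vectors, and confirms that ρ_g(θ_0, θ_1) = ψ_{φ(g)}(θ_1, θ_0), where φ complements every coordinate.

```python
from itertools import product

EDGES = [(1, 3), (1, 4), (2, 4), (3, 5), (4, 5)]  # D, from Example 3.1
TC = EDGES + [(1, 5), (2, 5)]                     # transitive closure D^tr
N = 5

def cbn_prob(g, t0, t1):
    """CBN on D^tr: event i contributes theta_1 or theta_0 only when all
    of its parents have occurred; it cannot occur before its parents."""
    p = 1.0
    for i in range(1, N + 1):
        pa = [j for (j, v) in TC if v == i]
        if all(g[j - 1] == 1 for j in pa):
            p *= t1[i] if g[i - 1] == 1 else t0[i]
        elif g[i - 1] == 1:
            return 0.0
    return p

def dmlbn_prob(g, t0, t1):
    """2-state D-MLBN on D via Equation (4)."""
    p = 1.0
    for i in range(1, N + 1):
        M = max([g[j - 1] for (j, v) in EDGES if v == i], default=0)
        if g[i - 1] < M:
            return 0.0
        # g_i = M: sum_{l <= M} theta_l (= theta_0 if M = 0, else 1);
        # g_i > M: theta_1.
        p *= (t0[i] if M == 0 else 1.0) if g[i - 1] == M else t1[i]
    return p

def phi(g):
    """The bijection of Theorem 3.3: complement every coordinate."""
    return tuple(1 - x for x in g)

t0 = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5}    # hypothetical values
t1 = {i: 1 - t0[i] for i in t0}
# rho_g(theta_0, theta_1) = psi_{phi(g)}(theta_1, theta_0) for every g.
ok = all(abs(cbn_prob(g, t0, t1) - dmlbn_prob(phi(g), t1, t0)) < 1e-12
         for g in product((0, 1), repeat=N))
```

Note that the parameter dictionaries are passed to the D-MLBN side in swapped order, which is exactly the interchange of θ^{(i)}_0 and θ^{(i)}_1 in the theorem.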
Example 3.4.
Let D be the DAG pictured on the left of Figure 2. We have already seen that G = G(D, 2), pictured in the middle of Figure 2, is the state space of the D-MLBN model on D, while the lattice of order ideals J(D^tr) pictured on the right of Figure 2 is the state space of the CBN model on D^tr.

The map φ from Theorem 3.3 maps the element g = (1, 0, 1, 0, 0) ∈ J(D^tr) to the element φ(g) = (0, 1, 0, 1, 1) ∈ G. The probability of g under the CBN model is

ρ_g(θ^{(1)}_0, θ^{(1)}_1, ..., θ^{(5)}_0, θ^{(5)}_1) = θ^{(1)}_1 θ^{(2)}_0 θ^{(3)}_1,

while the probability of φ(g) under the D-MLBN model is

ψ_{φ(g)}(θ^{(1)}_0, θ^{(1)}_1, ..., θ^{(5)}_0, θ^{(5)}_1) = θ^{(1)}_0 θ^{(2)}_1 θ^{(3)}_0.

We can see that these two probabilities are equal after interchanging the parameters θ^{(i)}_0 and θ^{(i)}_1.

4. Algebraic Structure of the D-MLBN Model
In this section we describe the algebraic structure of the D-MLBN model. We do this by extending the techniques developed for CBN models in [3] to all D-MLBN models. The main tool here is Möbius inversion, which corresponds to a linear change of coordinates on the D-MLBN model. In these new coordinates, the ideal of polynomials that vanish on a D-MLBN belongs to a special class of toric ideals whose Gröbner bases were described by Hibi in [9].

Let D be a DAG, let ψ^{(k)}_D be the map parameterizing the k-state D-MLBN model on D, and let G be the state space of the model. Also let R[p_g] = R[p_g : g ∈ G]; then our goal is to find a Gröbner basis for the ideal

I^{(k)}_D = { f ∈ R[p_g] : f(a) = 0 for all a ∈ image(ψ^{(k)}_D) }.

Finding a Gröbner basis for the ideal I^{(k)}_D is a first step in obtaining an implicit description of the D-MLBN model on D.

Example 4.1. Again let D be the DAG pictured on the left in Figure 2. Then the ideal I^{(2)}_D is generated by the polynomials

p_00000 + p_00001 + p_00011 + p_00101 + p_00111 + p_01011 + p_01111 + p_10111 + p_11111 − 1,
p_00011 p_00101 − p_00000 p_00111 − p_00001 p_00111,
p_01011 p_10111 − p_00000 p_11111 − p_00001 p_11111 − p_00011 p_11111,
p_00111 p_01011 − p_00011 p_01111,
p_00101 p_01011 − p_00000 p_01111 − p_00001 p_01111,
p_01111 p_10111 − p_00101 p_11111 − p_00111 p_11111.

As we noted before, the state space of the D-MLBN model is a distributive lattice. Hibi showed in [9] that there is a toric ideal naturally associated to such a lattice. In [3] the authors show that the ideal of the CBN model is the toric ideal defined by Hibi, after a suitable change of coordinates. So it is natural to attempt to extend this result from the binary case to D-MLBN models with an arbitrary number of states. We describe the construction of Hibi
here, but for additional information we refer the reader to [8] or [9]. Let L be a distributive lattice and recall that there is a unique poset P (up to isomorphism) such that L = J(P) (see [11, Thm 3.4.1]). Let P have ground set [n]. Then the map

ϕ_L : R[q_g : g ∈ L] → R[t, x_1, ..., x_n],  q_g ↦ t ∏_{i ∈ g} x_i,

has kernel I_L = ker(ϕ_L) generated by

I_L = ⟨ q_g q_h − q_{g∧h} q_{g∨h} : g, h ∈ L are incomparable ⟩

[9]. Recall that elements g and h in a poset are incomparable if neither g ≤ h nor h ≤ g. The following example illustrates this construction.

Example 4.2. Let P be the poset pictured on the left in Figure 1, whose lattice of order ideals L = J(P) is pictured on the right. Then ϕ_L is given by

q_00000 = t
q_10000 = t x_1
q_01000 = t x_2
q_11000 = t x_1 x_2
q_11100 = t x_1 x_2 x_3
q_11110 = t x_1 x_2 x_3 x_4
q_11101 = t x_1 x_2 x_3 x_5
q_11111 = t x_1 x_2 x_3 x_4 x_5.

There are two pairs of incomparable elements in the lattice L, which are (1, 0, 0, 0, 0) and (0, 1, 0, 0, 0), as well as (1, 1, 1, 1, 0) and (1, 1, 1, 0, 1). The generators of I_L are the binomials

q_10000 q_01000 − q_00000 q_11000,  q_11110 q_11101 − q_11100 q_11111,

which correspond to these two pairs of incomparable elements. We are now ready to state our main result.

Theorem 4.3.
Let D = (V, E) be a DAG and let I^{(k)}_D be the vanishing ideal of the k-state D-MLBN model on D. Then after homogenizing the map ψ^{(k)}_D and applying the linear change of coordinates

q_g = ∑_{h ≤ g} p_h

on R[p_g], the ideal I^{(k)}_D is equal to the toric ideal associated to the distributive lattice J(D^tr × (k−1)), where (k−1) denotes the chain with k − 1 elements, with generating set

I^{(k)}_D = ⟨ q_g q_h − q_{g∧h} q_{g∨h} : g, h ∈ G are incomparable ⟩.

Proof.
First we note that for any g ∈ G = G(D, k) we have that

(5)  ∑_{h ≤ g} p_h = ∏_{i ∈ V} ∑_{ℓ ≤ g_i} θ^{(i)}_ℓ.

This leads us to consider a transform of the parameter space given by

(6)  α^{(i)}_j = ∑_{ℓ ≤ j} θ^{(i)}_ℓ.

Note that the matrix of this transformation is block diagonal and each block can be made into a lower triangular matrix with ones on the diagonal, so this is truly a linear change of coordinates. Combining Equations 5 and 6, we see that in the transformed coordinates the map ψ^{(k)}_D is given by

q_g = t ∏_{i ∈ V} α^{(i)}_{g_i}.

We have also introduced a new variable t which homogenizes the parameterization so that I^{(k)}_D will be a homogeneous ideal. This simply removes the trivial relation that all of the coordinates sum to one.

We now consider the parameterization of the Hibi ideal I_L associated to the lattice L = J(D^tr × {0, 1, ..., k−2}). By [11, Proposition 3.5.1], there is a bijection between order-preserving maps g ∈ G and order ideals of the poset D^tr × {0, 1, ..., k−2}. Under this bijection, a map g is sent to the order ideal

g̃ = { (i, r) ∈ D^tr × {0, 1, ..., k−2} : 0 ≤ r ≤ k − g_i − 2 }.

Then I_L is the kernel of the map

ϕ_L : R[q_g : g ∈ L] → R[t, x^{(i)}_r : i ∈ [n], r ∈ {0, ..., k−2}],  q_g ↦ t ∏_{(i,r) ∈ g̃} x^{(i)}_r.

At first, this parameterization might look quite different when compared to the parameterization ψ^{(k)}_D, but we can transform the parameter space of ψ^{(k)}_D again so that they agree. Consider the transform given by

α^{(i)}_j = ∏_{r=0}^{k−j−2} x^{(i)}_r,

and note that this is invertible, with inverse given by x^{(i)}_{k−j−2} = α^{(i)}_j / α^{(i)}_{j+1}. After applying this transform on the parameter space of ψ^{(k)}_D, the map is given by

q_g = t ∏_{i ∈ V} ∏_{r=0}^{k−g_i−2} x^{(i)}_r = t ∏_{(i,r) ∈ g̃} x^{(i)}_r,

where the last equality follows directly from the definition of g̃. Since the ideals I^{(k)}_D and I_L are now the kernel of the exact same map, they are equal, and the generating set stated above is exactly the generating set described by Hibi in [9]. □

Figure 3. A DAG D and the poset D^tr × {0, 1}, whose order ideals correspond to the order-preserving maps from D to {0, 1, 2}.

Remark.
Note that the coordinate transform that takes the p_g coordinates to the q_g coordinates corresponds to Möbius inversion on the lattice G. This is the same transform that is used in [3] for CBN models, but in that case the resulting toric ideal is the one associated to J(D^tr), since it is a 2-state model.

We conclude with two examples which illustrate the previous theorem.

Example 4.4.
Let D be the graph pictured in Figure 3 on the left. Consider the state g = (0, 1, 2) ∈ G of the 3-state D-MLBN model on D. The original probability of this state under the model is p_012 = θ^{(1)}_0 θ^{(2)}_1 θ^{(3)}_2. Then after our first coordinate transform,

q_012 = ∑_{h ≤ g} p_h = p_000 + p_001 + p_002 + p_010 + p_011 + p_012
      = θ^{(1)}_0 (θ^{(2)}_0 + θ^{(2)}_1)(θ^{(3)}_0 + θ^{(3)}_1 + θ^{(3)}_2)
      = α^{(1)}_0 α^{(2)}_1.

Note that we omit α^{(3)}_2 since the parameter corresponding to the state k − 1 is equal to 1. The second transform gives α^{(1)}_0 = x^{(1)}_0 x^{(1)}_1 and α^{(2)}_1 = x^{(2)}_0. After homogenizing with a new parameter t and applying this second transform, we have that ψ^{(3)}_D(q_012) = t x^{(1)}_0 x^{(1)}_1 x^{(2)}_0.

The state g also corresponds to the order ideal g̃ = {(1, 0), (1, 1), (2, 0)} ∈ L = J(D^tr × {0, 1}). This means the map ϕ_L takes q_012 to ϕ_L(q_012) = t x^{(1)}_0 x^{(1)}_1 x^{(2)}_0, and we can see that ϕ_L(q_g) = ψ^{(3)}_D(q_g), as was shown in the proof of Theorem 4.3.

Example 4.5.
Let D be the graph pictured in Figure 2. Then after homogenizing the parameterization ψ^{(2)}_D and applying the coordinate transform described in Theorem 4.3, the ideal I^{(2)}_D is generated by the polynomials

q_00011 q_00101 − q_00001 q_00111,
q_00101 q_01011 − q_00001 q_01111,
q_00111 q_01011 − q_00011 q_01111,
q_01011 q_10111 − q_00011 q_11111,
q_01111 q_10111 − q_00111 q_11111.

Note that the first monomial in each polynomial corresponds to a pair of incomparable elements g, h ∈ G, while the second corresponds to their meet and join, which are given by taking coordinate-wise minimums and maximums respectively.

Acknowledgments

Benjamin Hollering and Seth Sullivant were partially supported by the US National Science Foundation (DMS 1615660).

References

[1] Carlos Améndola, Claudia Klüppelberg, Steffen Lauritzen, and Ngoc Tran. Conditional independence in max-linear Bayesian networks. arXiv:2002.09233, 2020.
[2] Niko Beerenwinkel, Nicholas Eriksson, and Bernd Sturmfels. Evolution on distributive lattices. J. Theoret. Biol., 242(2):409–420, 2006.
[3] Niko Beerenwinkel, Nicholas Eriksson, and Bernd Sturmfels. Conjunctive Bayesian networks. Bernoulli, 13(4):893–909, 2007.
[4] Niko Beerenwinkel and Seth Sullivant. Markov models for accumulating mutations. Biometrika, 96(3):645–661, 2009.
[5] Luis David Garcia, Michael Stillman, and Bernd Sturmfels. Algebraic geometry of Bayesian networks. J. Symbolic Comput., 39(3-4):331–355, 2005.
[6] Nadine Gissibl and Claudia Klüppelberg. Max-linear models on directed acyclic graphs. Bernoulli, 24(4A):2693–2720, 2018.
[7] Nadine Gissibl, Claudia Klüppelberg, and Steffen Lauritzen. Identifiability and estimation of recursive max-linear models. Scandinavian Journal of Statistics, April 2020.
[8] Jürgen Herzog, Takayuki Hibi, and Hidefumi Ohsugi. Binomial Ideals, volume 279 of Graduate Texts in Mathematics. Springer, Cham, 2018.
[9] Takayuki Hibi. Distributive lattices, affine semigroup rings and algebras with straightening laws. In Commutative Algebra and Combinatorics (Kyoto, 1985), volume 11 of Adv. Stud. Pure Math., pages 93–109. North-Holland, Amsterdam, 1987.
[10] Claudia Klüppelberg and Steffen Lauritzen. Bayesian networks for max-linear models. In Network Science, pages 79–97. Springer, Cham, 2019.
[11] Richard P. Stanley. Enumerative Combinatorics. Volume 1, volume 49 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 2012.
[12] Seth Sullivant. Algebraic Statistics, volume 194 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2018.