Complexity of modification problems for best match graphs
David Schaller, Peter F. Stadler, and Marc Hellmuth
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107 Leipzig, Germany
Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria
Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
School of Computing, University of Leeds, EC Stoner Building, Leeds LS2 9JT, UK
[email protected]
Abstract
Best match graphs (BMGs) are vertex-colored directed graphs that were introduced to model the relationships of genes (vertices) from different species (colors) given an underlying evolutionary tree that is assumed to be unknown. In real-life applications, BMGs are estimated from sequence similarity data. Measurement noise and approximation errors therefore result in empirically determined graphs that in general violate properties of BMGs. The arc modification problems for BMGs therefore provide a means of correcting and improving the initial estimates of the best matches. We show here that the arc deletion, arc completion and arc editing problems for BMGs are NP-complete and that they can be formulated and solved as integer linear programs.
Keywords:
Best matches; Graph modification; NP-hardness; Integer linear program
Best match graphs (BMGs) appear in mathematical biology as a formal description of the evolutionary relationships within a gene family. Each vertex x represents a gene and is “colored” by the species σ(x) in which it resides. A directed arc connects a gene x with its closest relatives in each of the other species [10]. Empirically, best matches are routinely estimated by measuring and comparing the similarity of gene sequences. Measurement errors and systematic biases, however, introduce discrepancies between “most similar genes” extracted from data and the notion of best matches in the sense of closest evolutionary relatedness [10, 11]. While some systematic effects can be corrected directly [23], a residual level of error is unavoidable. It is therefore a question of considerable practical interest in computational biology whether the mathematical properties characterizing BMGs can be used to correct empirical estimates. Formally, this question amounts to a graph editing problem: Given a vertex-labeled directed graph (G, σ), what is the minimal number of arcs that need to be inserted or deleted to convert (G, σ) into a BMG (G*, σ)?

Best matches are, in particular, closely linked to the identification of orthologous genes, i.e., pairs of genes whose last common ancestor coincides with the divergence of two species [9]. Orthologous genes from different species are expected to have essentially the same biological functions. Thus considerable efforts have been expended to devise methods for orthology assessment, see e.g. [22, 3, 21] for reviews and applications. The orthology graph of a gene family (with the genes as vertices and undirected edges between orthologous genes) can be shown to be a subgraph of the reciprocal best match graph (RBMG), i.e., the symmetric part of the BMG [11]. This has sparked interest in a characterization of RBMGs [12] and the corresponding graph editing problems [15]. The deletion and the editing problems of 2-colored RBMGs are equivalent to Bicluster Deletion and
Bicluster Editing, respectively, a fact that was used to demonstrate NP-hardness for the general, ℓ-colored case. On the other hand, orthology graphs are cographs [16]. Cograph Editing or Cograph Deletion thus have been used to correct empirical approximations of RBMGs to orthology graphs in [17]. Both Cograph Editing and Cograph Deletion are NP-complete [19].

Figure 1: A (sub)graph induced by a bi-clique consisting of 3 black and 3 white vertices.

In [20], we showed that knowledge of the BMG makes it possible to identify the edges of the RBMG that cannot be part of the orthology graph and found that these edges in general do not form an optimal solution of either
Cograph Editing or Cograph Deletion. This observation suggests correcting the empirical similarity data at the outset by editing them to the nearest BMGs instead of operating on an empirical approximation of the RBMG. Given a BMG, the orthology graph can then be computed in polynomial time [20].

We therefore analyze the arc modification problems for ℓ-BMGs, that is, BMGs on ℓ colors. This contribution is organized as follows: After introducing some notation and reviewing some important properties of BMGs, Sec. 3 provides a characterization of 2-BMGs in terms of forbidden subgraphs. We then prove in Sec. 4 that 2-BMG Deletion and 2-BMG Editing are NP-complete by reduction from Exact 3-Cover, and that 2-BMG Completion is NP-complete by reduction from Chain Graph Completion. These results are used in Sec. 5 to establish NP-completeness for any fixed number ℓ ≥ 2 of colors. Finally, we formulate the ℓ-BMG modification problems as integer linear programs in Sec. 6.

In this contribution, we consider simple directed graphs (digraphs) G = (V, E) with vertex set V and arc set E ⊆ V × V \ {(v, v) | v ∈ V}. For a vertex x ∈ V, we say that (y, x) is an in-arc and (x, z) is an out-arc. The (weakly) connected components of G are the maximal connected subgraphs of the undirected graph underlying G. We call x a hub-vertex of a graph G = (V, E) if (x, v) ∈ E and (v, x) ∈ E holds for all vertices v ∈ V \ {x}. The subgraph induced by a subset W ⊆ V is denoted by G[W]. We write N(x) := {z ∈ V | (x, z) ∈ E} for the out-neighborhood and N⁻(x) := {z ∈ V | (z, x) ∈ E} for the in-neighborhood of x ∈ V. A graph is sink-free if it has no vertex with out-degree zero, i.e., if N(x) ≠ ∅ for all x ∈ V.

We write E △ F := (E \ F) ∪ (F \ E) for the symmetric difference of the sets E and F. Moreover, for a graph G = (V, E) and an arc set F, we define the graphs G + F := (V, E ∪ F), G − F := (V, E \ F) and G △ F := (V, E △ F). A vertex coloring of G is a surjective map σ: V → S, where S is a set of colors. A vertex coloring is proper if σ(x) ≠ σ(y) for all (x, y) ∈ E. We often write |S|-coloring to emphasize the number of colors in G. Moreover, we write σ(W) := {σ(v) | v ∈ W}. The restriction of σ to a subset W ⊆ V of vertices is denoted by σ|_W. The colored subgraph of G induced by W is therefore (G[W], σ|_W).

Observation 2.1.
Let x be a hub-vertex in a properly-colored graph (G, σ). Then x is the only vertex of color σ(x) in (G, σ).

Definition 2.2. A bi-clique of a colored digraph (G, σ) is a subset of vertices C ⊆ V(G) such that (i) |σ(C)| = 2 and (ii) (x, y) ∈ E(G[C]) if and only if σ(x) ≠ σ(y) for all x, y ∈ C. A colored digraph (G, σ) is a bi-cluster graph if all its connected components are bi-cliques.

In a bi-clique, all arcs between vertices of different color are present. Thus a bi-clique with n and m vertices in the two color classes has 2nm arcs, see Fig. 1 for the case n = m = 3. We emphasize that, in contrast to the definition used in [15], single-vertex graphs are not considered bi-cliques.

A phylogenetic tree T (on L) is an (undirected) rooted tree with root ρ_T, leaf set L ⊆ V(T) and inner vertices V⁰(T) := V(T) \ L such that each inner vertex of T (except possibly the root) is of degree at least three. Throughout this contribution, we assume that every tree is phylogenetic.

The ancestor order on V(T) is defined such that u ⪯_T v if v lies on the unique path from u to the root ρ_T, i.e., if v is an ancestor of u. We write u ≺_T v if u ⪯_T v and u ≠ v. If xy is an edge in T such that y ≺_T x, then x is the parent of y and y the child of x. We denote by child_T(x) the set of all children of x. For a non-empty subset A ⊆ V ∪ E we define lca_T(A), the last common ancestor of A, to be the unique ⪯_T-minimal vertex of T that is an ancestor of every u ∈ A. For simplicity we write lca_T(A) = lca_T(x_1, …, x_k) whenever we specify a vertex set A = {x_1, …, x_k} explicitly. Note that lca_T(x, y) and lca_T(x, z) are comparable for all x, y, z ∈ L w.r.t. ⪯_T.

A (rooted) triple is a tree on three leaves and with two inner vertices.
We write xy|z for the triple on the leaves x, y and z if the path from x to y does not intersect the path from z to the root, i.e., if lca_T(x, y) ≺_T lca_T(x, z) = lca_T(y, z). In this case we say that T displays xy|z. A set R of triples on L, i.e., a set of triples R such that ⋃_{T ∈ R} L(T) = L, is compatible if there is a tree with leaf set L that displays every triple in R. If R is compatible, then such a tree, the Aho tree Aho(R), can be constructed in polynomial time [2]. For a set L, a set of triples R is strictly dense if for all three distinct x, y, z ∈ L exactly one of the triples xy|z, xz|y and yz|x is contained in R. For later reference, we provide

Lemma 2.3. [17, Lemma 7] If R is a compatible set of triples on L, then there is a strictly dense compatible triple set R′ on L that contains R.

A tree T with leaf set L together with a function σ: L → S is a leaf-colored tree, denoted by (T, σ).

Definition 2.4.
Let (T, σ) be a leaf-colored tree. A leaf y ∈ L(T) is a best match of the leaf x ∈ L(T) if σ(x) ≠ σ(y) and lca_T(x, y) ⪯_T lca_T(x, y′) holds for all leaves y′ of color σ(y′) = σ(y).

The graph G(T, σ) = (V, E) with vertex set V = L(T), vertex coloring σ, and with arcs (x, y) ∈ E if and only if y is a best match of x w.r.t. (T, σ) is known as the (colored) best match graph (BMG) of (T, σ) [10]. We call an ℓ-colored BMG simply an ℓ-BMG. Since the last common ancestor of any two vertices of T always exists, and lca_T(x, y) and lca_T(x, z) are comparable, there is by definition at least one best match of x for every color s ∈ S \ {σ(x)}:

Observation 2.5. For every vertex x and every color s ≠ σ(x) in a BMG (G, σ) there is some vertex y ∈ N(x) with σ(y) = s.

In particular, therefore, BMGs are sink-free whenever they contain at least two colors. We note in passing that sink-free graphs also appear naturally e.g. in the context of graph semigroups [1] and graph orientation problems [6].
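Definition 2.4 can be turned into code quite directly. The following is a minimal sketch, not the authors' implementation: it assumes the tree is given as a hypothetical child-to-parent dictionary `parent` and the leaf coloring as a dictionary `color`. Because all vertices lca_T(x, y) lie on the path from x to the root, the ⪯_T-minimal one is simply the deepest one.

```python
def ancestors(parent, v):
    """Path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(parent, x, y):
    """Last common ancestor of x and y in the rooted tree given by `parent`."""
    anc_x = set(ancestors(parent, x))
    for v in ancestors(parent, y):
        if v in anc_x:
            return v
    raise ValueError("x and y do not belong to the same tree")


def depth(parent, v):
    """Number of edges between v and the root."""
    return len(ancestors(parent, v)) - 1


def best_match_graph(parent, color):
    """Arc set of G(T, sigma): (x, y) is an arc iff y is a best match of x."""
    leaves = list(color)
    arcs = set()
    for x in leaves:
        for s in set(color.values()) - {color[x]}:
            candidates = [y for y in leaves if color[y] == s]
            # Deepest lca(x, y) among all candidates y of color s.
            best = max(depth(parent, lca(parent, x, y)) for y in candidates)
            arcs |= {(x, y) for y in candidates
                     if depth(parent, lca(parent, x, y)) == best}
    return arcs


# Toy example (assumed for illustration): root "rho" with children "v" and "b2",
# inner vertex "v" with children "a1" and "b1".
parent = {"v": "rho", "b2": "rho", "a1": "v", "b1": "v"}
color = {"a1": "A", "b1": "B", "b2": "B"}
arcs = best_match_graph(parent, color)
print(sorted(arcs))  # [('a1', 'b1'), ('b1', 'a1'), ('b2', 'a1')]
```

In this toy tree, a1 has b1 as its unique best match of color B, while both b1 and b2 point to a1 because a1 is the only leaf of color A. Every vertex has an out-arc, in line with Observation 2.5.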
Definition 2.6. An arbitrary vertex-colored graph (G, σ) is a best match graph (BMG) if there exists a leaf-colored tree (T, σ) such that (G, σ) = G(T, σ). In this case, we say that (T, σ) explains (G, σ).

Whether two vertices x and y are best matches or not does not depend on the presence or absence of vertices z with σ(z) ∉ {σ(x), σ(y)}. More precisely, we have

Observation 2.7. [10, Obs. 1] Let (G, σ) be a BMG explained by T with leaf set L and let L′ := ⋃_{s ∈ S} L[s] be a subset of vertices with a restricted color set S ⊆ σ(L). Then the induced subgraph (G[L′], σ|_L′) is explained by the restriction T|_L′ of T to the leaf set L′, i.e., (G[L′], σ|_L′) = G(T|_L′, σ|_L′).

It was shown in [10] that BMGs can be characterized in terms of certain induced subgraphs on three vertices. In [20], the corresponding conditions were simplified further:
Definition 2.8. Let (G, σ) be a properly vertex-colored graph. We say that a triple xy|y′ is informative for (G, σ) if x, y and y′ are pairwise distinct vertices in G such that (i) σ(x) ≠ σ(y) = σ(y′) and (ii) (x, y) ∈ E(G) and (x, y′) ∉ E(G). The set of informative triples is denoted by R(G, σ).

As shown in [10], a properly 2-colored graph (G, σ) is a BMG if and only if (G, σ) = G(Aho(R(G, σ)), σ). More generally, BMGs can be characterized in terms of informative triples as follows:

Theorem 2.9. [10, Cor. 5] A connected colored digraph (G, σ) is a BMG if and only if (i) all induced subgraphs (G_st, σ_st) on two colors are BMGs and (ii) the set R(G, σ) is compatible.

In the following section, we will need the following, more technical results:

Lemma 2.10. [20, Lemma 2.8 and 2.9] Let (G, σ) be a BMG and xy|y′ an informative triple for G. Then, every tree (T, σ) that explains (G, σ) displays the triple xy|y′, i.e., lca_T(x, y) ≺_T lca_T(x, y′) = lca_T(y, y′).
Moreover, if the triples ab|b′ and cb′|b are informative for (G, σ), then every tree (T, σ) that explains (G, σ) contains two distinct children v_1, v_2 ∈ child_T(lca_T(a, c)) such that a, b ≺_T v_1 and b′, c ≺_T v_2.

Lemma 2.11. [10, Prop. 1] The disjoint union of vertex-disjoint BMGs (G_i, σ_i), 1 ≤ i ≤ k, is a BMG if and only if all color sets are the same, i.e., σ_i(V(G_i)) = σ_j(V(G_j)) for 1 ≤ i < j ≤ k.

A graph is thin if no two vertices have the same neighborhood.
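The informative triples of Definition 2.8 can be read off the arc set directly. The sketch below is only an illustration under assumed data structures: arcs are stored as a set of ordered pairs, the coloring as a dictionary, and a triple xy|y′ is encoded as the hypothetical tuple `(x, y, y2)`.

```python
def informative_triples(arcs, color):
    """Return all informative triples xy|y' of (G, sigma) as tuples (x, y, y')."""
    arcs = set(arcs)
    triples = set()
    for (x, y) in arcs:
        if color[x] == color[y]:
            continue  # cannot happen in a properly colored graph
        for y2 in color:
            # y' must be distinct from x and y, share the color of y,
            # and must NOT be an out-neighbor of x.
            if (y2 not in (x, y) and color[y2] == color[y]
                    and (x, y2) not in arcs):
                triples.add((x, y, y2))
    return triples


# Toy 2-colored digraph (assumed for illustration).
color = {"a1": "A", "b1": "B", "b2": "B"}
arcs = {("a1", "b1"), ("b1", "a1"), ("b2", "a1")}
triples = informative_triples(arcs, color)
print(triples)  # {('a1', 'b1', 'b2')}
```

Here the only informative triple is a1 b1 | b2: the arc (a1, b1) is present while (a1, b2) is absent, and b1 and b2 share a color. By Lemma 2.10, every tree explaining this graph has to display that triple.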
Definition 2.12. Two vertices x, y ∈ L are in relation ∼• if N(x) = N(y) and N⁻(x) = N⁻(y).

Clearly the thinness relation ∼• is an equivalence relation on V. For each ∼•-class α we have N(α) = N(x) and N⁻(α) = N⁻(x) for all x ∈ α.

Theorem 2.13. [10, Thm. 3 and 4] Let (G, σ) be a connected properly 2-colored digraph. Then, (G, σ) is a BMG if and only if the following conditions hold for any two ∼•-classes α and β of G:
(N0) N(α) ≠ ∅.
(N1) α ∩ N(β) = β ∩ N(α) = ∅ implies N(α) ∩ N(N(β)) = N(β) ∩ N(N(α)) = ∅.
(N2) N(N(N(α))) ⊆ N(α).
(N3) α ∩ N(N(β)) = β ∩ N(N(α)) = ∅ and N(α) ∩ N(β) ≠ ∅ implies N⁻(α) = N⁻(β) and N(α) ⊆ N(β) or N(β) ⊆ N(α).

We note that [10] tacitly assumed (N0), i.e., that (G, σ) is sink-free.
In this section, we derive a new characterization of 2-colored BMGs in terms of forbidden induced subgraphs. Our starting point is the observation that certain constellations of arcs on four or five vertices cannot occur.

Definition 3.1 (F1-, F2-, and F3-graphs).
(F1) A properly 2-colored graph on four distinct vertices V = {x_1, x_2, y_1, y_2} with coloring σ(x_1) = σ(x_2) ≠ σ(y_1) = σ(y_2) is an F1-graph if (x_1, y_1), (y_2, x_2), (y_1, x_2) ∈ E and (x_1, y_2), (y_2, x_1) ∉ E.
(F2) A properly 2-colored graph on four distinct vertices V = {x_1, x_2, y_1, y_2} with coloring σ(x_1) = σ(x_2) ≠ σ(y_1) = σ(y_2) is an F2-graph if (x_1, y_1), (y_1, x_2), (x_2, y_2) ∈ E and (x_1, y_2) ∉ E.
(F3) A properly 2-colored graph on five distinct vertices V = {x_1, x_2, y_1, y_2, y_3} with coloring σ(x_1) = σ(x_2) ≠ σ(y_1) = σ(y_2) = σ(y_3) is an F3-graph if (x_1, y_1), (x_2, y_2), (x_1, y_3), (x_2, y_3) ∈ E and (x_1, y_2), (x_2, y_1) ∉ E.

The “templates” for F1-, F2-, and F3-graphs are shown in Fig. 2. They define 8, 16, and 64 graphs by specifying the presence or absence of the 3, 4, and 6 optional (dashed) arcs, respectively, see Figs. 8 and 9 in the Appendix. The F1- and F2-graphs fall into a total of 16 isomorphism classes, four of which are both F1- and F2-graphs. All but one of the F3-graphs contain an F1- or an F2-graph as induced subgraph. The exception is the “template” of the F3-graphs without optional arcs. The 17 non-redundant forbidden subgraphs are collected in Fig. 3. We shall see below that they are sufficient to characterize 2-BMGs among the sink-free graphs.
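The conditions of Definition 3.1 only constrain a few arcs and non-arcs, so a brute-force search for forbidden induced subgraphs is easy to sketch. The code below is an illustration rather than an optimized algorithm: it assumes a properly 2-colored input given as an arc set and a color dictionary, and the function names `find_forbidden` and `is_two_bmg` are hypothetical. The second function combines the search with the sink-freeness test, following the characterization announced in the last sentence above.

```python
from itertools import permutations


def find_forbidden(arcs, color):
    """Return (kind, vertices) for some induced F1-, F2- or F3-graph, else None.

    Assumes that (G, sigma) is properly 2-colored.
    """
    E = set(arcs)
    V = list(color)
    for x1, x2 in permutations(V, 2):
        if color[x1] != color[x2]:
            continue
        # With two colors, all vertices of the other color may play a y-role.
        ys = [v for v in V if color[v] != color[x1]]
        for y1, y2 in permutations(ys, 2):
            if ({(x1, y1), (y2, x2), (y1, x2)} <= E
                    and (x1, y2) not in E and (y2, x1) not in E):
                return "F1", (x1, x2, y1, y2)
            if {(x1, y1), (y1, x2), (x2, y2)} <= E and (x1, y2) not in E:
                return "F2", (x1, x2, y1, y2)
            for y3 in ys:
                if y3 in (y1, y2):
                    continue
                if ({(x1, y1), (x2, y2), (x1, y3), (x2, y3)} <= E
                        and (x1, y2) not in E and (x2, y1) not in E):
                    return "F3", (x1, x2, y1, y2, y3)
    return None


def is_two_bmg(arcs, color):
    """Sink-free and free of induced F1-, F2-, F3-graphs."""
    E = set(arcs)
    sink_free = all(any((x, y) in E for y in color) for x in color)
    return sink_free and find_forbidden(E, color) is None


# The F3 template without optional arcs.
f3_color = {"x1": 0, "x2": 0, "y1": 1, "y2": 1, "y3": 1}
f3_arcs = {("x1", "y1"), ("x2", "y2"), ("x1", "y3"), ("x2", "y3")}
f3_result = find_forbidden(f3_arcs, f3_color)

# A small graph that is a BMG (explained by a tree with a1 and b1 as siblings).
bmg_color = {"a1": "A", "b1": "B", "b2": "B"}
bmg_arcs = {("a1", "b1"), ("b1", "a1"), ("b2", "a1")}
bmg_ok = is_two_bmg(bmg_arcs, bmg_color)
```

The search enumerates ordered vertex tuples and therefore takes O(|V|⁵) time in the worst case, which is adequate for small examples only.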
Lemma 3.2. If (G, σ) is a BMG, then it contains no induced F1-, F2-, or F3-graph.

Proof. Let (T, σ) be a tree that explains (G, σ).

First, assume that (G, σ) contains an induced F1-graph, i.e., there are four vertices x_1, x_2, y_1, y_2 satisfying (F1), and let u := lca_T(x_1, y_2). Then, (x_1, y_1), (y_2, x_2) ∈ E, (x_1, y_2), (y_2, x_1) ∉ E and Lemma 2.10 imply that T must display the informative triples x_1y_1|y_2 and y_2x_2|x_1. Hence, u must have two distinct children v_1 and v_2 such that x_1, y_1 ≺_T v_1 and x_2, y_2 ≺_T v_2. Therefore, lca_T(x_1, y_1) ⪯_T v_1 ≺_T u = lca_T(x_2, y_1) and σ(x_1) = σ(x_2) imply that (y_1, x_2) ∉ E(G); a contradiction.

Next, assume that (G, σ) contains an induced F2-graph, i.e., there are four vertices x_1, x_2, y_1, y_2 satisfying (F2). Then (x_1, y_1) ∈ E, (x_1, y_2) ∉ E and Lemma 2.10 imply that T displays the informative triple x_1y_1|y_2 and thus lca_T(x_1, y_1) ≺_T lca_T(x_1, y_2). Since (y_1, x_2) ∈ E and σ(x_1) = σ(x_2), we conclude that lca_T(x_2, y_1) ⪯_T lca_T(x_1, y_1) ≺_T lca_T(x_1, y_2) and therefore also lca_T(x_2, y_1) ≺_T lca_T(x_2, y_2) = lca_T(x_1, y_2). Together with σ(y_1) = σ(y_2), the latter contradicts (x_2, y_2) ∈ E.

Finally, assume that (G, σ) contains an induced F3-graph, i.e., there are five vertices x_1, x_2, y_1, y_2, y_3 satisfying (F3). By Lemma 2.10, (x_1, y_1) ∈ E and (x_1, y_2) ∉ E implies that T displays the triple x_1y_1|y_2, and (x_2, y_2) ∈ E together with (x_2, y_1) ∉ E implies that T displays the triple x_2y_2|y_1. Furthermore, lca_T(x_1, x_2) has distinct children v_1 and v_2 such that x_1, y_1 ≺_T v_1 and x_2, y_2 ≺_T v_2. Now since σ(y_1) = σ(y_2) = σ(y_3), the two arcs (x_1, y_3) and (x_2, y_3) imply that lca_T(x_1, y_3) ⪯_T lca_T(x_1, y_1) ⪯_T v_1 and lca_T(x_2, y_3) ⪯_T lca_T(x_2, y_2) ⪯_T v_2, respectively. Since v_1 and v_2 are incomparable w.r.t. ⪯_T, this is a contradiction.

Figure 2: Templates of the three families of forbidden induced subgraphs in BMGs (F1-graphs, F2-graphs, F3-graphs). Black arcs must exist, dashed gray arcs may or may not be present.

Figure 3: Forbidden induced subgraphs in BMGs. All F3-graphs with at least one optional arc have an induced F1- or F2-graph and thus are redundant.

Lemma 3.3.
Let (G, σ) be a properly 2-colored graph. Then (G, σ) satisfies (N1) if it does not contain an induced F1-graph, it satisfies (N2) if it does not contain an induced F2-graph, and it satisfies (N3) if it contains neither an induced F1-graph nor an induced F3-graph.

Proof. We employ contraposition and thus show that (G = (V, E), σ) contains a forbidden subgraph whenever (N1), (N2) or (N3) are violated.

Assume that (N1) is not satisfied. Thus, there are two ∼•-classes α and β with α ∩ N(β) = β ∩ N(α) = ∅ for which N(α) ∩ N(N(β)) ≠ ∅ or N(β) ∩ N(N(α)) ≠ ∅. We can w.l.o.g. assume that N(β) ∩ N(N(α)) ≠ ∅. Note that α ∩ N(β) = ∅ implies that (y, x) ∉ E for all x ∈ α, y ∈ β. Likewise (x, y) ∉ E for all x ∈ α, y ∈ β, since β ∩ N(α) = ∅. Let x_1 ∈ α, y_2 ∈ β and x_2 ∈ N(β) ∩ N(N(α)). It must hold (x_1, y_2), (y_2, x_1) ∉ E by the arguments above. Since x_2 ∈ N(β), we have (y_2, x_2) ∈ E. Moreover, σ(x_1) = σ(x_2) ≠ σ(y_2), since (G, σ) is properly colored. Clearly, x_2 ∈ N(N(α)) implies that N(α) ≠ ∅. Now, let y_1 ∈ N(α) be a vertex such that (y_1, x_2) ∈ E, which must exist as a consequence of x_2 ∈ N(N(α)). We have (x_1, y_1) ∈ E since y_1 ∈ N(α) and thus σ(y_1) = σ(y_2) ≠ σ(x_1) = σ(x_2). Finally, (x_1, y_1) ∈ E and (x_1, y_2) ∉ E imply y_1 ≠ y_2, and (y_2, x_2) ∈ E and (y_2, x_1) ∉ E imply x_1 ≠ x_2. In summary, (x_1, y_1), (y_2, x_2), (y_1, x_2) ∈ E and (x_1, y_2), (y_2, x_1) ∉ E, and thus (G, σ) contains an induced F1-graph.

Now assume that (N2) is not satisfied and thus N(N(N(α))) ⊈ N(α) for some ∼•-class α. Note that the latter implies N(N(N(α))) ≠ ∅. Hence, there is a vertex y_2 ∈ N(N(N(α))) such that y_2 ∉ N(α). Thus, there is a vertex x_1 ∈ α such that (x_1, y_2) ∉ E. By the definition of neighborhoods and since y_2 ∈ N(N(N(α))), we find vertices y_1 ∈ N(α) and x_2 ∈ N(N(α)) such that (x_1, y_1), (y_1, x_2), (x_2, y_2) ∈ E. Since (G, σ) is properly colored, we must have σ(x_1) = σ(x_2) ≠ σ(y_1) = σ(y_2). Moreover, (x_1, y_2) ∉ E together with (x_2, y_2) ∈ E and (x_1, y_1) ∈ E implies x_1 ≠ x_2 and y_1 ≠ y_2, respectively. We conclude that the subgraph induced by x_1, x_2, y_1, y_2 is an F2-graph.

Finally, assume that (N3) is not satisfied. Hence, there are two ∼•-classes α and β with α ∩ N(N(β)) = β ∩ N(N(α)) = ∅ and N(α) ∩ N(β) ≠ ∅, but (i) N⁻(α) ≠ N⁻(β), or (ii) neither N(α) ⊆ N(β) nor N(β) ⊆ N(α). Note that N(α) ∩ N(β) ≠ ∅ implies that there are vertices x_1 ∈ α and x_2 ∈ β with σ(x_1) = σ(x_2) since (G, σ) is properly 2-colored. In particular, there must be a vertex y_3 with (x_1, y_3), (x_2, y_3) ∈ E and thus σ(x_1) = σ(x_2) ≠ σ(y_3).

Now consider Case (i) and suppose that N⁻(α) ≠ N⁻(β). Thus we can assume w.l.o.g. that there is a y* with (y*, x_2) ∈ E but (y*, x_1) ∉ E. Note that (x_1, y*) ∉ E, since otherwise (x_1, y*), (y*, x_2) ∈ E would contradict β ∩ N(N(α)) = ∅. Thus, y* ≠ y_3 since (x_1, y*) ∉ E and (x_1, y_3) ∈ E. Furthermore, σ(y*) = σ(y_3) ≠ σ(x_1) = σ(x_2), since (G, σ) is properly 2-colored. In summary, (y*, x_2), (x_2, y_3), (x_1, y_3) ∈ E and (y*, x_1), (x_1, y*) ∉ E, which implies that (G, σ) contains an induced F1-graph (with the roles of the two colors exchanged).

Now consider Case (ii) and assume that neither N(α) ⊆ N(β) nor N(β) ⊆ N(α) holds. Clearly, this implies N(α) ≠ ∅ and N(β) ≠ ∅. Hence there must be two distinct vertices y_1 ∈ N(α) \ N(β) and y_2 ∈ N(β) \ N(α) and, therefore, (x_1, y_1), (x_2, y_2) ∈ E and (x_1, y_2), (x_2, y_1) ∉ E. It follows that y_1 ≠ y_3 and y_2 ≠ y_3 and σ(y_1) = σ(y_2) = σ(y_3) ≠ σ(x_1) = σ(x_2). This and (x_1, y_1), (x_2, y_2), (x_1, y_3), (x_2, y_3) ∈ E together with (x_1, y_2), (x_2, y_1) ∉ E implies that (G, σ) contains an induced F3-graph.

Based on the latter findings we obtain here a new characterization of 2-colored BMGs that is not restricted to connected graphs.
Theorem 3.4. A properly 2-colored graph is a BMG if and only if it is sink-free and does not contain an induced F1-, F2-, or F3-graph.

Proof. Suppose that (G, σ) is a 2-colored BMG and let 𝒞 be the set of its connected components. By Lemma 3.2, (G, σ) does not contain an induced F1-, F2- or F3-graph. Moreover, by Lemma 2.11, (G[C], σ|_C) must be a 2-colored BMG for all C ∈ 𝒞. Hence, we can apply Thm. 2.13 to conclude that each (G[C], σ|_C) satisfies (N0)-(N3). Since every x ∈ V is contained in some ∼•-class, (N0) is equivalent to N(x) ≠ ∅ for all x ∈ V, i.e., (G, σ) is sink-free.

Now suppose that (G, σ) is properly 2-colored and sink-free, and that it does not contain an induced F1-, F2- or F3-graph. By Lemma 3.3, (G, σ) satisfies (N1)-(N3). Thus, in particular, each connected component of (G, σ) is sink-free and satisfies (N1)-(N3). Note that N(x) ≠ ∅ implies that the connected components of (G, σ) contain at least one arc and, by assumption, they are properly 2-colored. Moreover, this implies that (N0) is satisfied for every connected component of (G, σ). Hence, Thm. 2.13 implies that every connected component of (G, σ) is a 2-colored BMG. By Lemma 2.11, (G, σ) is also a 2-colored BMG.

Figure 4: A tree (T, σ) whose BMG G(T, σ) contains bi-cliques X and Y_1, …, Y_n. The thick gray arrows indicate that all arcs in that direction exist between the respective sets.

In real-life applications, we have to expect that graphs estimated from empirical best match data will contain errors. Therefore, we consider the problem of correcting erroneous and/or missing arcs. Formally, we consider the following graph modification problems for properly colored digraphs.
Problem 4.1 (ℓ-BMG Deletion).
Input: A properly ℓ-colored digraph (G = (V, E), σ) and an integer k.
Question: Is there a subset F ⊆ E such that |F| ≤ k and (G − F, σ) is an ℓ-BMG?

It is worth noting that ℓ-BMG Deletion does not always have a feasible solution. In particular, if (G, σ) contains a sink, no solution exists for any ℓ. In contrast, it is always possible to obtain a BMG from a properly colored digraph (G, σ) if arc insertions are allowed. To see this, observe that the graph (G′, σ) with V(G′) = V(G) that contains all arcs between vertices of different colors is a BMG, since it is explained by the tree with leaf set V(G′) in which all leaves are directly attached to the root. This suggests that the following two problems are more relevant for practical applications:

Problem 4.2 (ℓ-BMG Editing).
Input: A properly ℓ-colored digraph (G = (V, E), σ) and an integer k.
Question: Is there a subset F ⊆ V × V \ {(v, v) | v ∈ V} such that |F| ≤ k and (G △ F, σ) is an ℓ-BMG?

Problem 4.3 (ℓ-BMG Completion).
Input: A properly ℓ-colored digraph (G = (V, E), σ) and an integer k.
Question: Is there a subset F ⊆ V × V \ ({(v, v) | v ∈ V} ∪ E) such that |F| ≤ k and (G + F, σ) is an ℓ-BMG?

In this section, we consider decision problems related to modifying 2-colored digraphs. The general case with an arbitrary number ℓ ≥ 2 of colors is treated in Sec. 5. For ℓ = 2, we will show that both 2-BMG Deletion and 2-BMG Editing are NP-complete by reduction from the Exact 3-Cover problem (X3C), one of Karp's famous 21 NP-complete problems [18].
Problem 4.4 (Exact 3-Cover (X3C)).
Input: A set 𝒮 with |𝒮| = 3t elements and a collection 𝒞 of 3-element subsets of 𝒮.
Question: Does 𝒞 contain an exact cover for 𝒮, i.e., a subcollection 𝒞′ ⊆ 𝒞 such that every element of 𝒮 occurs in exactly one member of 𝒞′?

An exact 3-cover 𝒞′ of 𝒮 with |𝒮| = 3t is necessarily of size |𝒞′| = t and satisfies ⋃_{C ∈ 𝒞′} C = 𝒮.

Theorem 4.1. [18] X3C is NP-complete.
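For very small instances, X3C can be decided by exhaustive search, which may help when checking the reduction below by hand. The following sketch is an assumption-laden illustration (the function name `exact_3_cover` is hypothetical) and takes exponential time, as one would expect for an NP-complete problem. It uses the remark above that a cover must consist of exactly t triples: t triples whose union has 3t elements are necessarily pairwise disjoint.

```python
from itertools import combinations


def exact_3_cover(elements, triples):
    """Return an exact 3-cover as a list of triples, or None if none exists.

    Assumes every member of `triples` has exactly three distinct elements.
    """
    elements = set(elements)
    t, remainder = divmod(len(elements), 3)
    if remainder:
        return None
    for candidate in combinations(triples, t):
        covered = set().union(*candidate)
        # t triples covering all 3t elements cannot overlap.
        if covered == elements:
            return list(candidate)
    return None


# Toy instance with t = 2 (assumed for illustration).
S = {1, 2, 3, 4, 5, 6}
C = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]
solution = exact_3_cover(S, C)
no_solution = exact_3_cover(S, [(1, 2, 3), (2, 3, 4)])
```

In the toy instance, the triples (1, 2, 3) and (4, 5, 6) form the exact cover, while dropping (4, 5, 6) from the collection leaves no solution.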
We start with a simple construction of a subclass of BMGs from disconnected 2-colored bi-cluster graphs:

Lemma 4.2. Let (G, σ) be a 2-colored bi-cluster graph with at least two connected components, let 𝒞 be the set of connected components of (G, σ), and let (G′, σ) be the graph obtained from (G, σ) by adding all arcs (x, y) with x ∈ X ∈ 𝒞 and y ∈ ⋃_{Y ∈ 𝒞 \ {X}} Y for which σ(x) ≠ σ(y). Then (G′, σ) is a BMG.

Proof. To see that (G′, σ) is a BMG it suffices to show that there is a tree (T, σ) that explains (G′, σ). To this end, consider the tree (T, σ) as shown in Fig. 4 and its BMG G(T, σ). Observe first that, for all x, y ∈ X, it holds lca_T(x, y) = ρ = lca_T(x, y′) = lca_T(x′, y) for all x′, y′ ∈ L(T). Hence, X is a bi-clique and there are arcs from all vertices in X to all vertices of distinct color in Y_i ∈ 𝒞 \ {X}. Moreover, for all x, y ∈ Y_i ∈ 𝒞 \ {X} it holds that lca_T(x, y) = v_i ⪯_T lca_T(x, y′) and v_i ⪯_T lca_T(x′, y) for all x′, y′ ∈ L(T). Hence, Y_i is a bi-clique for all Y_i ∈ 𝒞 \ {X}. Finally, for all x, y ∈ Y_i ∈ 𝒞 \ {X} and all x′, y′ ∈ L(T) \ Y_i it holds lca_T(x, y) = v_i ≺_T lca_T(x, y′) = lca_T(x′, y) = ρ, which implies that there are no arcs from vertices in Y_i to vertices in X and no arcs between distinct Y_i, Y_j ∈ 𝒞 \ {X}. In summary, G(T, σ) = (G′, σ) and hence, (G′, σ) is a BMG.

We are now in the position to prove NP-completeness of 2-BMG Editing. The strategy of the NP-hardness proof is very similar to the one used in [7] and [19].

Theorem 4.3. 2-BMG Editing is NP-complete.

Proof. Since 2-BMGs can be recognized in polynomial time [10, cf. Lemma 18], the 2-BMG Editing problem is clearly contained in NP. To show NP-hardness, we use a reduction from X3C.

Let 𝒮 with n = |𝒮| = 3t and 𝒞 = {C_1, …, C_m} be an instance of X3C. Clearly, if m = t the X3C problem becomes trivial and thus, we assume w.l.o.g. that m > t. The latter implies that every solution 𝒞′ of X3C satisfies 𝒞′ ⊊ 𝒞. Moreover, we assume w.l.o.g. that C_i ≠ C_j, 1 ≤ i < j ≤ m. We construct an instance (G = (V, E), σ, k), where (G, σ) is colored with the two colors black and white, of the 2-BMG Editing problem as follows: First, we construct a bi-clique S consisting of a black vertex s_b and a white vertex s_w for every s ∈ 𝒮. Thus the subgraph induced by S has 6t vertices and r := 18t² arcs in total. Let q := 3 × [6r(m − t) + r − 18t]. For each of the m subsets C_i in 𝒞, we introduce two bi-cliques X_i and Y_i, where X_i consists of r black and r white new vertices, and Y_i consists of q black and q white new vertices. In addition to the arcs provided by the bi-cliques constructed in this manner, we add the following arcs:
– (x, y) for every x ∈ X_i and y ∈ Y_i with σ(x) ≠ σ(y) (note (y, x) ∉ E),
– (x, s_b) for every white vertex x ∈ X_i and every element s ∈ C_i, and
– (x, s_w) for every black vertex x ∈ X_i and every element s ∈ C_i.
This construction is illustrated in Fig. 5. Clearly, (G, σ) is properly colored, and the reduction can be computed in polynomial time.

Figure 5: Illustration of the reduction from Exact 3-Cover. The thick gray arrows indicate that all arcs from that set to another set/vertex exist. The illustration emphasizes the analogy to [7] and [19].
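The sizes used in the reduction follow from simple counting, and it can be reassuring to evaluate them for a concrete instance. The sketch below only computes the parameters; it does not build the graph. It assumes the values r = 18t² (the number of arcs of a bi-clique with 3t black and 3t white vertices, 2 · 3t · 3t), an edit budget of 6r(m − t) + r − 18t, and q equal to three times that budget. The function name and the dictionary keys are hypothetical.

```python
def reduction_parameters(t, m):
    """Sizes of the 2-BMG Editing instance built from an X3C instance.

    t: number of triples in an exact cover (the ground set has 3t elements)
    m: number of triples in the collection, with m > t
    """
    if m <= t:
        raise ValueError("the construction assumes m > t")
    r = 18 * t * t                      # arcs of the bi-clique on S
    k = 6 * r * (m - t) + r - 18 * t    # allowed number of arc modifications
    q = 3 * k                           # black (and white) vertices per Y_i
    return {
        "r": r,
        "k": k,
        "q": q,
        # 6t vertices in S plus 2r + 2q vertices for each pair X_i, Y_i
        "vertices": 6 * t + m * (2 * r + 2 * q),
        # each X_i sends 3r + 3r arcs to the six vertices of its triple
        "arcs_X_to_S": 6 * r * m,
    }


params = reduction_parameters(t=2, m=3)
print(params["r"], params["k"], params["q"])  # 72 468 1404
```

Since q = 3k > 2k, at most 2k vertices can be touched by k arc modifications, so every Y_i keeps at least one untouched vertex of each color. All quantities are polynomial in t and m.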
We set k := 6r(m − t) + r − 18t and show that there is a t-element subset 𝒞′ of 𝒞 that serves as a solution of X3C if and only if 2-BMG Editing with input (G, σ, k) has a yes-answer. We emphasize that the coloring σ remains unchanged in the proof below.

First suppose that X3C with input 𝒮 and 𝒞 has a yes-answer. Thus, there is a t-element subset 𝒞′ of 𝒞 such that ⋃_{C ∈ 𝒞′} C = 𝒮. We construct a set F and add, for all C_i ∈ 𝒞 \ 𝒞′, the arcs (x, s_b) and (x, s_w) for all s ∈ C_i and for every white, resp., black vertex x ∈ X_i. Since |C_i| = 3 for every C_i ∈ 𝒞 and |𝒞 \ 𝒞′| = m − t, the set F contains exactly 6r(m − t) arcs so far. Now, we add to F all arcs (s_b, s′_w) and (s_w, s′_b) whenever the corresponding elements s and s′ belong to distinct elements of 𝒞′, i.e., there is no C ∈ 𝒞′ with {s, s′} ⊂ C. Therefore, the subgraph of G − F induced by S is the disjoint union of t bi-cliques, each consisting of exactly 3 black vertices, 3 white vertices, and 18 arcs. Hence, F contains, in addition to the 6r(m − t) arcs, a further r − 18t arcs. Thus |F| = k. This completes the construction of F.

Since F contains only arcs but no non-arcs of G, we have G △ F = G − F. It remains to show that G △ F is a BMG. To this end observe that G △ F has precisely m connected components that are either induced by X_i ∪ Y_i (in case C_i ∈ 𝒞 \ 𝒞′) or X_i ∪ Y_i ∪ S′ where S′ is a bi-clique containing the six vertices corresponding to the elements in C_i ∈ 𝒞′. In particular, each of these components corresponds to a subgraph as specified in Lemma 4.2 and thus, they are BMGs. Moreover, all of these subgraphs contain at least one black and one white vertex. Hence, Lemma 2.11 implies that (G △ F, σ) is a BMG.

Now, suppose that 2-BMG Editing with input (G, σ, k) has a yes-answer. Thus, there is a set F with |F| ≤ k such that (G △ F, σ) is a BMG. We will prove that we have to delete an arc set similar to the one constructed above. First note that the number of vertices affected by F, i.e., vertices incident to inserted/deleted arcs, is at most 2k. Since 2k < q = |{y ∈ Y_i | σ(y) = black}| = |{y ∈ Y_i | σ(y) = white}| for every 1 ≤ i ≤ m, we have at least one black vertex b_i ∈ Y_i and at least one white vertex w_i ∈ Y_i that are unaffected by F. We continue by proving

Claim 4.3.1.
Every vertex $s \in S$ has in-arcs from at most one $X_i$ in $G \triangle F$.
Proof: Assume w.l.o.g. that $s$ is black and, for contradiction, that there are two distinct vertices $x_1 \in X_i$ and $x_2 \in X_j$ with $i \ne j$ and $(x_1, s), (x_2, s) \in E \triangle F$. Clearly, both $x_1$ and $x_2$ are white. As argued above, there are two (distinct) black vertices $b_1 \in Y_i$ and $b_2 \in Y_j$ that are not affected by $F$. Thus, $(x_1, b_1)$ and $(x_2, b_2)$ remain arcs in $G \triangle F$, whereas $(x_1, b_2)$ and $(x_2, b_1)$ are not arcs in $G \triangle F$, since they are not arcs in $G$. In summary, we have five distinct vertices $x_1, x_2, b_1, b_2, s$ with $\sigma(x_1) = \sigma(x_2) \ne \sigma(b_1) = \sigma(b_2) = \sigma(s)$, arcs $(x_1, b_1), (x_2, b_2), (x_1, s), (x_2, s)$ and non-arcs $(x_1, b_2), (x_2, b_1)$. Thus $(G \triangle F, \sigma)$ contains an induced F3-graph. By Lemma 3.2, $(G \triangle F, \sigma)$ is not a BMG; a contradiction. $\diamond$
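The forbidden configuration used in this proof can also be checked mechanically. The following minimal sketch searches a vertex-colored digraph by brute force for five vertices that satisfy the arc and non-arc conditions of an induced F3-graph as listed above. The function name `find_induced_f3` and the list/dictionary representation of the graph are assumptions made for illustration only; they do not appear in the paper.

```python
from itertools import combinations, permutations


def find_induced_f3(vertices, arcs, color):
    """Return (x1, x2, y1, y2, y3) inducing an F3-graph, or None.

    Conditions checked (illustrative brute force):
      color[x1] == color[x2] != color[y1] == color[y2] == color[y3],
      arcs     (x1,y1), (x2,y2), (x1,y3), (x2,y3),
      non-arcs (x1,y2), (x2,y1).
    """
    arcs = set(arcs)
    for x1, x2 in combinations(vertices, 2):
        if color[x1] != color[x2]:
            continue
        # candidates for y1, y2, y3 must carry a different color than x1, x2
        others = [v for v in vertices if color[v] != color[x1]]
        for y1, y2, y3 in permutations(others, 3):
            if not (color[y1] == color[y2] == color[y3]):
                continue
            required = {(x1, y1), (x2, y2), (x1, y3), (x2, y3)}
            if required <= arcs and (x1, y2) not in arcs and (x2, y1) not in arcs:
                return (x1, x2, y1, y2, y3)
    return None


# The five vertices from the proof of Claim 4.3.1 (names are illustrative).
color = {"x1": "white", "x2": "white", "b1": "black", "b2": "black", "s": "black"}
arcs = [("x1", "b1"), ("x2", "b2"), ("x1", "s"), ("x2", "s")]
witness = find_induced_f3(list(color), arcs, color)
```

The search is exponential only in the fixed subgraph size, so it runs in polynomial time in the number of vertices, which suffices for small sanity checks.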
By Claim 4.3.1, every vertex in $S$ has in-arcs from at most one $X_i$. Note that each $X_i$ has $r$ black and $r$ white vertices. Since each element in $S$ is either white or black, each single element in $S$ has at most $r$ in-arcs from the sets $X_i$. Since $|S| = 2n$, we obtain at most $2rn = 2r(3t) = 6rt$ such arcs in $G \triangle F$. In $G$, there are in total $6rm$ arcs from the vertices in all $X_i$ to the vertices in $S$. Hence, by Claim 4.3.1, $F$ contains at least $6rm - 6rt = 6r(m - t)$ deletions. It remains to specify the other at most $r - t$ arc modifications. To this end, we first show Claim 4.3.2.
Every vertex $s \in S$ has in-arcs from precisely one $X_i$ in $G \triangle F$.
Proof: Assume that there is a vertex $s \in S$ that has no in-arc from any $X_i$. Hence, to the aforementioned $6r(m - t)$ deletions we must add $r$ further deletions. However, at most $r - t$ further edits are allowed; a contradiction. $\diamond$
So far, $F$ contains only arc deletions. For the next arguments, we need the following two statements: Claim 4.3.3.
The modification set $F$ does not insert any arcs between $X_i$ and $X_j$ with $i \ne j$.
Proof: Assume for contradiction that $G \triangle F$ contains an arc $(x_1, x_2)$ with $x_1 \in X_i$, $x_2 \in X_j$ and $i \ne j$. W.l.o.g. assume that $x_1$ is white and $x_2$ is black. As argued above, there are black, resp., white vertices $b, w \in Y_j$ that are unaffected by $F$. Therefore, $(x_2, w)$ and $(b, w)$ remain arcs in $G \triangle F$, whereas $(x_1, b)$ and $(b, x_1)$ are not arcs in $G \triangle F$ since they are not arcs in $G$. In summary, $(x_1, x_2), (b, w), (x_2, w)$ are arcs in $G \triangle F$ while $(x_1, b), (b, x_1)$ are not arcs in $G \triangle F$. Since moreover $\sigma(x_1) = \sigma(w) \ne \sigma(b) = \sigma(x_2)$, $(G \triangle F, \sigma)$ contains an induced F1-graph. By Lemma 3.2, $(G \triangle F, \sigma)$ is not a BMG; a contradiction. $\diamond$
Claim 4.3.4.
Let $s_1, s_2 \in S$ be vertices with in-arcs $(x_1, s_1)$, resp., $(x_2, s_2)$ in $G \triangle F$ for some $x_1 \in X_i$ and $x_2 \in X_j$ with $i \ne j$. Then $(s_1, s_2)$ and $(s_2, s_1)$ cannot be arcs in $G \triangle F$.
Proof: Assume w.l.o.g. that $(s_1, s_2)$ is an arc in $G \triangle F$ and that $s_1$ is black. It follows that $x_1$ and $s_2$ are white and $x_2$ is black. By construction of $G$ and by Claim 4.3.3, we clearly have $(x_1, x_2), (x_2, x_1) \notin E \triangle F$. In summary, we have four distinct vertices $x_1, x_2, s_1, s_2$ with $\sigma(x_1) = \sigma(s_2) \ne \sigma(s_1) = \sigma(x_2)$, arcs $(x_1, s_1), (x_2, s_2), (s_1, s_2)$ and non-arcs $(x_1, x_2), (x_2, x_1)$ in $G \triangle F$. Thus $(G \triangle F, \sigma)$ contains an induced F1-graph. By Lemma 3.2, $(G \triangle F, \sigma)$ is not a BMG; a contradiction. $\diamond$
In summary, $G \triangle F$ has the following property: Every $s \in S$ has in-arcs from exactly one $X_i$, and there are no arcs between $s_1, s_2 \in S$ that have in-arcs from two different sets $X_i$ and $X_j$. Since $|C_i| = 3$ for every $C_i \in \mathcal{C}$, $(G \triangle F)[S]$ contains connected components of size at most 6, i.e., the black and white vertex for each of the three elements in $C_i$. Hence, the maximum number of arcs in $(G \triangle F)[S]$ is obtained when each of its connected components contains exactly these 6 vertices and they form a bi-clique. In this case, $(G \triangle F)[S]$ contains $18t$ arcs. We conclude that $F$ contains at least another $r - t$ arc deletions within $S$. Together with the at least $6r(m - t)$ deletions between the $X_i$ and the elements of $S$, we have at least $6r(m - t) + r - t = k \ge |F|$ arc deletions in $F$. Since $|F| \le k$ by assumption, we obtain $|F| = k$.
As argued above, the subgraph induced by $S$ is a disjoint union of $t$ bi-cliques of 3 white and 3 black vertices each. Since all vertices of such a bi-clique have in-arcs from the same $X_i$ and these in-arcs are also in $G$, we readily obtain the desired partition $\mathcal{C}' \subset \mathcal{C}$ of $S$.
In other words, the $C_i$ corresponding to the $X_i$ having out-arcs to vertices in $S$ in the edited graph $G \triangle F$ induce an exact cover of $S$.
The set $F$ constructed in the proof of Thm. 4.3 contains only arc deletions. This immediately implies
Corollary 4.4. 2-BMG Deletion is NP-complete.
In order to tackle the complexity of the 2-BMG Completion problem, we follow a different approach and employ a reduction from the
Chain Graph Completion problem. To this end, we need some additional notation. An undirected graph $U$ is bipartite if its vertex set can be partitioned into two non-empty disjoint sets $P$ and $Q$ such that $V(U) = P \dot\cup Q$ and every edge has one endpoint in $P$ and the other endpoint in $Q$. We write $U = (P \dot\cup Q, \widetilde{E})$ to emphasize that $\widetilde{E}$ is a set of undirected edges and that $U$ is bipartite. Furthermore, we also write $N(x)$ for the neighborhood of a vertex $x$ in an undirected graph. Thus $U$ is bipartite if and only if $x \in P$ implies $N(x) \subseteq Q$ and $x \in Q$ implies $N(x) \subseteq P$.
Definition 4.5 (cf. Natanzon 2001; Yannakakis 1981). An undirected, bipartite graph $U = (P \dot\cup Q, \widetilde{E})$ is a chain graph if there is an order $\lessdot$ on $P$ such that $u \lessdot v$ implies $N(u) \subseteq N(v)$.
The
Chain Graph Completion problem consists in finding a minimum-sized set of additionaledges that converts an arbitrary undirected, bipartite graph into a chain graph. More formally, itsdecision version can be stated as follows:
Problem 4.5 (Chain Graph Completion (CGC)).
Input: An undirected, bipartite graph $U = (P \dot\cup Q, \widetilde{E})$ and an integer $k$.
Question: Is there a subset $\widetilde{F} \subseteq \{\{p, q\} \mid (p, q) \in P \times Q\} \setminus \widetilde{E}$ such that $|\widetilde{F}| \le k$ and $U' := (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph?
It is shown in [24] that
CGC is NP-complete. Following [24], we say that two edges $\{u, v\}$ and $\{x, y\}$ in an undirected graph $U$ are independent if $u, v, x, y$ are pairwise distinct and the subgraph $U[\{u, v, x, y\}]$ contains no additional edges. We will need the following characterization of chain graphs:
Lemma 4.6 ([24, Lemma 1]). An undirected, bipartite graph $U = (P \dot\cup Q, \widetilde{E})$ is a chain graph if and only if it does not contain a pair of independent edges.
Theorem 4.7. 2-BMG Completion is NP-complete.
Proof. Since BMGs can be recognized in polynomial time [10], 2-BMG Completion is clearly contained in NP. To show NP-hardness, we use a reduction from
CGC. Let $(U = (P \dot\cup Q, \widetilde{E}), k)$ be an instance of CGC with vertex sets $P = \{p_1, \dots, p_{|P|}\}$ and $Q = \{q_1, \dots, q_{|Q|}\}$. To construct an instance $(G = (V, E), \sigma, k)$ of the 2-BMG Completion problem, we set $V = P \dot\cup Q \dot\cup R \dot\cup \{b\} \dot\cup \{w\}$, where $R = \{r_1, \dots, r_{|Q|}\}$ is a copy of $Q$. The vertices are colored $\sigma(p_i) = \sigma(r_j) = \sigma(b) = \text{black}$ and $\sigma(q_i) = \sigma(w) = \text{white}$. The arc set $E$ contains $(q_i, r_i)$ and $(r_i, q_i)$ for $1 \le i \le |Q|$, $(p_i, w)$ for $1 \le i \le |P|$, $(w, b)$ and $(b, w)$, and $(p, q)$ for every $\{p, q\} \in \widetilde{E}$ with $p \in P$ and $q \in Q$. This construction is illustrated in Fig. 6. Clearly, $(G, \sigma)$ is properly colored, and the reduction can be computed in polynomial time. Moreover, it is easy to verify that (
$G, \sigma$) is sink-free by construction, and thus, any graph $(G', \sigma)$ obtained from $(G, \sigma)$ by adding arcs is also sink-free. As above, we emphasize that the coloring $\sigma$ remains unchanged in the completion process.
A pair $(F, \widetilde{F})$ with $F \subseteq P \times Q$ and edge set $\widetilde{F} = \{\{p, q\} \mid (p, q) \in F\}$ will be called a completion pair for the bipartite graph $U = (P \dot\cup Q, \widetilde{E})$ and the corresponding 2-colored digraph $(G = (V, E), \sigma)$.

Figure 6: Illustration of the reduction from CGC. A pair of independent edges in $U$ and the corresponding induced F3-graph in $(G, \sigma)$ are highlighted.
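The construction of the 2-BMG Completion instance described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `cgc_to_2bmg_completion`, the tuple encoding `("r", q)` of the copies in $R$, and the reserved names `"b"` and `"w"` are hypothetical choices and not part of the paper.

```python
def cgc_to_2bmg_completion(P, Q, edges, k):
    """Build (V, E, sigma, k) from a CGC instance (U = (P, Q, edges), k).

    `edges` holds pairs (p, q) with p in P and q in Q.  The names "b", "w"
    and the tuples ("r", q) are assumed not to clash with names in P or Q.
    """
    R = [("r", q) for q in Q]          # one copy r_i for every q_i
    b, w = "b", "w"
    V = list(P) + list(Q) + R + [b, w]

    # P, R and b are black; Q and w are white
    sigma = {v: "black" for v in list(P) + R + [b]}
    sigma.update({v: "white" for v in list(Q) + [w]})

    E = {(b, w), (w, b)}
    for q in Q:
        E |= {(q, ("r", q)), (("r", q), q)}   # arcs (q_i, r_i) and (r_i, q_i)
    E |= {(p, w) for p in P}                   # arcs (p_i, w)
    E |= {(p, q) for (p, q) in edges}          # one arc per edge of U
    return V, E, sigma, k


V, E, sigma, k = cgc_to_2bmg_completion(
    ["p1", "p2"], ["q1", "q2"], [("p1", "q1"), ("p2", "q2")], 1
)
```

On this toy input the resulting digraph is properly colored and sink-free, in line with the two properties noted in the text.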
Claim 4.7.1. If $(F, \widetilde{F})$ is a completion pair, then $|F| = |\widetilde{F}|$, $(p, q) \in F$ if and only if $\{p, q\} \in \widetilde{F}$, and $(p, q) \in F \cup E$ if and only if $\{p, q\} \in \widetilde{F} \cup \widetilde{E}$.
Proof: First note that, by construction, $F$ contains only arcs from vertices in $P$ to vertices in $Q$. This together with the definition $\widetilde{F} = \{\{p, q\} \mid (p, q) \in F\}$ clearly implies $(p, q) \in F$ if and only if $\{p, q\} \in \widetilde{F}$ and thus $|F| = |\widetilde{F}|$. By construction of our reduction we have $(p, q) \in E$ if and only if $\{p, q\} \in \widetilde{E}$ and thus also $(p, q) \in E \cup F$ if and only if $\{p, q\} \in \widetilde{E} \cup \widetilde{F}$. $\diamond$
Before we continue, observe that, for every pair of independent edges $\{p_1, q_1\}, \{p_2, q_2\} \in \widetilde{E}$, the graph $(G, \sigma)$ contains an induced F3-graph $(G[p_1, p_2, q_1, q_2, w], \sigma)$. Together with Lemmas 3.2 and 4.6 this implies that $(G, \sigma)$ cannot be a BMG if $U$ is not a chain graph. Eliminating these induced F3-graphs is closely connected to chain graph completion. More precisely, we will show: Claim 4.7.2.
Let $(F, \widetilde{F})$ be a completion pair. If $(G + F, \sigma)$ is a BMG, then $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph.
Proof: Suppose that $(G + F, \sigma)$ is a BMG and assume, for contradiction, that $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is not a chain graph. The latter and Lemma 4.6 imply that $U'$ has two independent edges $\{p_1, q_1\}, \{p_2, q_2\} \in \widetilde{E} \cup \widetilde{F}$. Thus $\{p_1, q_2\}, \{p_2, q_1\} \notin \widetilde{E} \cup \widetilde{F}$. The latter arguments and Claim 4.7.1 imply that $(p_1, q_1), (p_2, q_2) \in E \cup F$ and $(p_1, q_2), (p_2, q_1) \notin E \cup F$. Since moreover $(p_1, w), (p_2, w) \in E$ and $\sigma(p_1) = \sigma(p_2) \ne \sigma(q_1) = \sigma(q_2) = \sigma(w)$, it follows that the five distinct vertices $p_1, p_2, q_1, q_2, w$ induce an F3-graph in $(G + F, \sigma)$. By Lemma 3.2, $(G + F, \sigma)$ cannot be a BMG; a contradiction. $\diamond$
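The two views of chain graphs used in this argument, nested neighborhoods (Def. 4.5) and the absence of independent edges (Lemma 4.6), can be compared on small examples. The sketch below is illustrative only; the function names and the encoding of edges as pairs `(p, q)` with `p` in $P$ and `q` in $Q$ are assumptions.

```python
from itertools import combinations


def find_independent_edges(edges):
    """Return a pair of independent edges of a bipartite graph, or None.

    Edges are pairs (p, q) with p in P and q in Q, so two edges
    {p1, q1}, {p2, q2} are independent iff p1 != p2, q1 != q2 and
    neither {p1, q2} nor {p2, q1} is an edge.
    """
    edge_set = set(edges)
    for (p1, q1), (p2, q2) in combinations(sorted(edge_set), 2):
        if (p1 != p2 and q1 != q2
                and (p1, q2) not in edge_set
                and (p2, q1) not in edge_set):
            return (p1, q1), (p2, q2)
    return None


def is_chain_graph(P, edges):
    """Check Def. 4.5 directly: the neighborhoods N(p), p in P, are nested."""
    nbrs = {p: set() for p in P}
    for p, q in edges:
        nbrs[p].add(q)
    # if the neighborhoods are nested, sorting by size yields the order on P
    ordered = sorted(P, key=lambda p: len(nbrs[p]))
    return all(nbrs[u] <= nbrs[v] for u, v in zip(ordered, ordered[1:]))


edges = [("p1", "q1"), ("p2", "q2")]      # two independent edges
completed = edges + [("p2", "q1")]        # one added edge makes a chain graph
```

Adding the single edge `("p2", "q1")` corresponds to adding the arc $(p_2, q_1)$ in the digraph, which removes the induced F3-graph on $p_1, p_2, q_1, q_2, w$.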
The converse is also true:
Claim 4.7.3.
Let $(F, \widetilde{F})$ be a completion pair for $U = (P \dot\cup Q, \widetilde{E})$, and suppose $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph. Then $(G + F, \sigma)$ is a BMG.
Proof: By Thm. 3.4, $(G + F, \sigma)$ is a 2-colored BMG if and only if it is sink-free and does not contain an induced F1-, F2-, or F3-graph. Since (
$G, \sigma$) is sink-free, this is also true for $(G + F, \sigma)$. Thus it suffices to show that $(G + F, \sigma)$ does not contain an induced F1-, F2-, or F3-graph.
Suppose that $(G + F, \sigma)[u, u', v, v']$ is an induced F1-graph. Let $H$ be a subgraph of $(G + F, \sigma)[u, u', v, v']$ that is isomorphic to the essential F1-graph, that is, the F1-graph as specified in Fig. 2 that contains only the solid-lined arcs and none of the dashed arcs while all other non-arcs remain non-arcs. In this case, there is an isomorphism $\varphi$ from $H$ to the essential F1-graph with vertex-labeling as in Fig. 2. Hence, $\varphi(u)$ corresponds to one of the vertices $x_1$, $x_2$, $y_1$ or $y_2$. To simplify the presentation we will say that, in this case, "$u$ plays the role of $\varphi(u)$ in an F1-graph". The latter definition naturally extends to F
2- and F3-graphs. Note that, for F1-, F2- and F3-graphs, we do not necessarily have $\sigma(u) = \sigma(\varphi(u))$. Nevertheless, for $a, b \in \{u, u', v, v'\}$ with $\sigma(a) \ne \sigma(b)$ it always holds, by construction, that $\sigma(\varphi(a)) \ne \sigma(\varphi(b))$.
In the following, an in- or out-neighbor of a vertex is just called neighbor. A flank vertex in an F1-, F2-, resp., F3-graph is a vertex that has only a single neighbor in the essential F1-, F2-, resp., F3-graph. To be more precise, when referring to Fig. 2, the flank vertices in an F1-graph and F2-graph are $x_1$ and $y_2$, while the flank vertices in an F3-graph are $y_1$ and $y_2$.
Since $(F, \widetilde{F})$ is a completion pair, by definition, $F$ adds only arcs from $P$ to $Q$. Hence, each of the vertices in $R \cup \{b\}$ has a single neighbor in $(G + F, \sigma)$ irrespective of the choice of $F$. Therefore, if $u \in R \cup \{b\}$ is contained in an induced F1-, F2-, or F3-graph in $(G, \sigma)$ or $(G + F, \sigma)$, it must be a flank vertex. Observe first that $b$ can only play the role of $y_2$ in the F1- or F2-graph, since otherwise, the fact that $w$ is the single neighbor of $b$ in $(G, \sigma)$ or $(G + F, \sigma)$ implies that $w$ must play the role of $y_1$ in the F1- or F2-graph, which is not possible since $b$ is the single out-neighbor of $w$ and $F$ does not affect $w$. By similar arguments, none of the vertices in $R \cup \{b\}$ can play the role of $x_1$ in an F1- or F2-graph, or the role of $y_1$ or $y_2$ in an F3-graph in $(G, \sigma)$ or $(G + F, \sigma)$. The vertex $w$ has only in-arcs from the elements in $P$ and from $b$. Likewise, the vertices $q_i \in Q$ have only in-arcs from $P$ and from their corresponding vertex $r_i \in R$. Therefore, and since all elements in $P$ have only out-neighbors, it is easy to verify that none of the vertices in $R \cup \{b\}$ can play the role of $y_2$ in an F1- or F2-graph. Thus none of the vertices in $R \cup \{b\}$ is part of an induced F1-, F2-, or F3-graph.
Thus it suffices to investigate the subgraph $(G', \sigma)$ of $(G + F, \sigma)$ induced by $\{w\} \cup P \cup Q$ for the presence of induced F1-, F2-, and F3-graphs.
In $G'$, none of the vertices in $\{w\} \cup Q$ have out-neighbors since $F \subseteq P \times Q$ does not affect $w$ and does not contain arcs from $q_i \in Q$ to any other vertex. Thus, none of the vertices in $\{w\} \cup Q$ can play the role of $x_1$, $y_1$ or $y_2$ in an F1-graph, the role of $x_1$, $y_1$ or $x_2$ in an F2-graph, or the role of $x_1$ or $x_2$ in an F3-graph. Since the vertices in $\{w\} \cup Q$ have only in-arcs from $P$, and $P$ has no in-arcs in $G'$, none of the vertices in $\{w\} \cup Q$ can play the role of $x_2$ in an F1-graph or the role of $y_2$ in an F2-graph. Thus none of the vertices in $\{w\} \cup Q$ is part of an induced F1- or F2-graph. Hence, any induced F1- or F2-graph must be contained in $G'[P]$. However, all vertices of $P$ are colored black, and hence $(G'[P], \sigma_{|P})$ cannot contain an induced F1- or F2-graph.
Suppose $(G', \sigma)$ contains an induced F3-graph. Then there are five pairwise distinct vertices $x_1, x_2, y_1, y_2, y_3 \in \{w\} \cup P \cup Q$ with coloring $\sigma(x_1) = \sigma(x_2) \ne \sigma(y_1) = \sigma(y_2) = \sigma(y_3)$ satisfying $(x_1, y_1), (x_2, y_2), (x_1, y_3), (x_2, y_3) \in E \cup F$ and $(x_1, y_2), (x_2, y_1) \notin E \cup F$. Since $P$ has no in-arcs in $(G', \sigma)$, it must hold that $y_1, y_2, y_3 \notin P$. Since $\sigma(\{w\} \cup Q) \ne \sigma(P)$ and $(G', \sigma)$ is properly 2-colored, we have $x_1, x_2 \in P$. Since $w$ has in-arcs from all vertices in $P$ and $(x_1, y_2), (x_2, y_1) \notin E \cup F$, vertex $w$ can neither play the role of $y_1$ nor of $y_2$ in an F3-subgraph. Thus, $y_1, y_2 \in Q$. Claim 4.7.1 therefore implies $\{x_1, y_1\}, \{x_2, y_2\} \in \widetilde{E} \cup \widetilde{F}$ and $\{x_1, y_2\}, \{x_2, y_1\} \notin \widetilde{E} \cup \widetilde{F}$. Hence, $U'$ contains a pair of independent edges. By Lemma 4.6, it follows that $U'$ is not a chain graph; a contradiction. $\diamond$
Together, Claims 4.7.2 and 4.7.3 imply that $(G + F, \sigma)$ is a BMG if and only if $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph; see Fig. 7 for an illustrative example.
Claim 4.7.4.
If $F$ is a minimum-sized arc completion set such that $(G + F, \sigma)$ is a BMG, then $F \subseteq P \times Q$.
Proof: Let $F$ be an arbitrary minimum-sized arc completion set, i.e., $(G + F, \sigma)$ is a BMG, and put $F' := F \cap (P \times Q)$ and let $(F', \widetilde{F'})$ be the corresponding completion pair. If $F' = F$, there is nothing to show. Otherwise, we have $|F'| < |F|$ and minimality of $|F|$ implies that $(G + F', \sigma)$ is not a BMG. By contraposition of Claim 4.7.3, we infer that $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F'})$ is not a chain graph. Hence, Lemma 4.6 implies that $U'$ contains a pair of independent edges $\{p_1, q_1\}, \{p_2, q_2\} \in \widetilde{E} \cup \widetilde{F'}$, and thus $\{p_1, q_2\}, \{p_2, q_1\} \notin \widetilde{E} \cup \widetilde{F'}$. By Claim 4.7.1, it follows that $(p_1, q_1), (p_2, q_2) \in E \cup F'$ and $(p_1, q_2), (p_2, q_1) \notin E \cup F'$. Since $F' \subset F$, we have $(p_1, q_1), (p_2, q_2) \in E \cup F$. Furthermore, from $(p_1, q_2), (p_2, q_1) \in P \times Q$ and $F' = F \cap (P \times Q)$, we conclude that $(p_1, q_2), (p_2, q_1) \notin E \cup F$. By construction of our reduction and since we only insert arcs, we have $(p_1, w), (p_2, w) \in E \cup F$. Together with the coloring $\sigma(p_1) = \sigma(p_2) \ne \sigma(q_1) = \sigma(q_2) = \sigma(w)$, the latter arguments imply that $(G + F, \sigma)$ contains an induced F3-graph. By Lemma 3.2, this contradicts that $(G + F, \sigma)$ is a BMG. $\diamond$
Now, let $(F, \widetilde{F})$ be a completion pair such that $|\widetilde{F}| \le k$ and $\widetilde{F}$ is a minimum-sized edge completion set for $U$. Thus $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph. Hence, Claim 4.7.3 implies that $(G + F, \sigma)$ is a BMG. Since $|F| = |\widetilde{F}| \le k$, it follows that 2-BMG Completion with input $(G, \sigma, k)$ has a yes-answer if CGC with input $(U = (P \dot\cup Q, \widetilde{E}), k)$ has a yes-answer.

Figure 7: An example solution for CGC, resp., 2-BMG Completion as constructed in the proof of Thm. 4.7. A tree $(T, \sigma)$ that explains the resulting BMG is shown on the right. Here, we have $k = 4$ edge, resp., arc additions (indicated by dashed gray lines) to obtain a chain graph, resp., 2-BMG. The indices of the vertices in $P = \{p_1, \dots, p_{|P|}\}$ are chosen w.r.t. the order $\lessdot$ on $P$, i.e., $i < j$ if and only if $p_i \lessdot p_j$ and thus $N(p_i) \subseteq N(p_j)$. In this example, we have $N(p_1) \cap Q = \emptyset$. Moreover, one of the vertices in $Q$ has no neighbor in $P$.

Finally, let $F$ be a minimum-sized arc completion set for $(G, \sigma)$, i.e., $(G + F, \sigma)$ is a BMG, and assume $|F| \le k$. This and Claim 4.7.4 imply $F \subseteq P \times Q$. For the corresponding completion pair $(F, \widetilde{F})$ we have $|\widetilde{F}| = |F| \le k$. Moreover, since $(G + F, \sigma)$ is a BMG, Claim 4.7.2 implies that $U' = (P \dot\cup Q, \widetilde{E} \cup \widetilde{F})$ is a chain graph. Therefore, CGC with input $(U = (P \dot\cup Q, \widetilde{E}), k)$ has a yes-answer if 2-BMG Completion with input $(G, \sigma, k)$ has a yes-answer. This completes the proof.

$\ell$-BMG modification problems
We now turn to the graph modification problems for an arbitrary number $\ell$ of colors. The proof of the next theorem follows the same strategy of adding hub-vertices as in [15].
Theorem 5.1. $\ell$-BMG Deletion, $\ell$-BMG Completion, and $\ell$-BMG Editing are NP-complete for all $\ell \ge 2$.
Proof. BMGs can be recognized in polynomial time (cf. [10, Sec. 5]) and thus, all three problems are contained in the class NP. Let $(G = (V, E), \sigma)$ be a properly colored digraph with $\ell$ colors. Thm. 4.3, Cor. 4.4 and Thm. 4.7 state NP-completeness for the case of $\ell = 2$ colors. Thus assume $\ell \ge 3$. We refer to $\ell$-BMG Deletion, $\ell$-BMG Completion, and $\ell$-BMG Editing simply as $\ell$-BMG Modification. Correspondingly, we write $(G \circledast F, \sigma)$ and distinguish the three problems by the modification operation $\circledast \in \{-, +, \triangle\}$, where $\circledast = -$, $\circledast = +$ and $\circledast = \triangle$ specifies that $F$ is a deletion, completion, or edit set, respectively.
We use a reduction from 2-BMG Modification. To this end, let $(G_2 = (V_2, E_2), \sigma_2, k)$ be an instance of one of the latter three problems. To obtain a properly colored graph $(G_\ell = (V_\ell, E_\ell), \sigma_\ell)$ with $\ell$ colors, we add to $G_2$ a set $V_H$ of $\ell - 2$ new vertices with pairwise distinct colors that do not occur in $(G_2, \sigma_2)$. Moreover, we add arcs such that every $h \in V_H$ becomes a hub-vertex. Note that $V_\ell = V_2 \dot\cup V_H$, $G_\ell[V_2] = G_2$, and $(\sigma_\ell)_{|V_2} = \sigma_2$. Furthermore, $V_2$ is a subset of $V_\ell$ satisfying the condition in Obs. 2.7, i.e., $V_2 = \bigcup_{s \in S} V_\ell[s]$ for the color set $S$ in $(G_2, \sigma_2)$. Clearly, the reduction can be performed in polynomial time. We proceed by showing that an instance $(G_2, \sigma_2, k)$ of the respective 2-BMG Modification problem has a yes-answer if and only if the corresponding instance $(G_\ell, \sigma_\ell, k)$ of $\ell$-BMG Modification has a yes-answer.
Suppose that 2-BMG Modification with input $(G_2, \sigma_2, k)$ has a yes-answer. Then there is an arc set $F \subseteq V_2 \times V_2 \setminus \{(v, v) \mid v \in V_2\}$ with $|F| \le k$ such that $(G_2 \circledast F, \sigma_2)$ is a BMG. Let $(T_2, \sigma_2)$ be a tree explaining $(G_2 \circledast F, \sigma_2)$.
From this tree, we construct a tree $(T_\ell, \sigma_\ell)$ by adding the vertices in $V_H$ as leaves of the root $\rho$ and coloring these leaves as in $(G_\ell, \sigma_\ell)$. By construction, we have $L(T_\ell) = V_\ell = V_2 \cup V_H$ and $T_2 = (T_\ell)_{|V_2}$, where $(T_\ell)_{|V_2}$ is the restriction of $T_\ell$ to the leaf set $V_2$. The latter arguments together with Obs. 2.7 imply that $(G(T_\ell, \sigma_\ell)[V_2], (\sigma_\ell)_{|V_2}) = G((T_\ell)_{|V_2}, (\sigma_\ell)_{|V_2}) = G(T_2, \sigma_2) = (G_2 \circledast F, \sigma_2)$.
Let $h \in V_H$ be arbitrary. Since $h$ is the only vertex of its color, $(x, h)$ is an arc in $G(T_\ell, \sigma_\ell)$ for every $x \in V_\ell \setminus \{h\}$. Since $h$ is a child of the root, we have moreover $\mathrm{lca}_{T_\ell}(x, h) = \rho$, and thus, $(h, x)$ is an arc in $G(T_\ell, \sigma_\ell)$ for every $x \in V_\ell \setminus \{h\}$. The latter two arguments imply that $h$ is a hub-vertex in $G(T_\ell, \sigma_\ell)$. Since $F$ is not incident to any vertex in $V_\ell \setminus V_2 = V_H$ and each vertex $h \in V_H$ is a hub-vertex in $(G_\ell, \sigma_\ell)$ and in $G(T_\ell, \sigma_\ell)$, we conclude that $G(T_\ell, \sigma_\ell) = (G_\ell \circledast F, \sigma_\ell)$. Hence, $(G_\ell \circledast F, \sigma_\ell)$ is a BMG and the corresponding $\ell$-BMG Modification problem with input $(G_\ell, \sigma_\ell, k)$ has a yes-answer.
For the converse, suppose that $\ell$-BMG Modification with input $(G_\ell, \sigma_\ell, k)$ has a yes-answer. Thus, there is an arc set $F \subseteq V_\ell \times V_\ell \setminus \{(v, v) \mid v \in V_\ell\}$ with $|F| \le k$ such that $(G_\ell \circledast F, \sigma_\ell)$ is a BMG. Let $(T_\ell, \sigma_\ell)$ be a tree explaining $(G_\ell \circledast F, \sigma_\ell)$. Let $F' \subseteq F$ be the subset of arc modifications $(x, y)$ for which $x, y \in V_2$. Thus, it holds that $|F'| \le |F| \le k$. By construction, $(G_\ell \circledast F)[V_2] = G_\ell[V_2] \circledast F'$. Moreover, by Obs.
2.7, we have $(G(T_\ell, \sigma_\ell)[V_2], (\sigma_\ell)_{|V_2}) = G((T_\ell)_{|V_2}, (\sigma_\ell)_{|V_2})$. In summary, we obtain $(G_2 \circledast F', \sigma_2) = (G_\ell[V_2] \circledast F', \sigma_2) = ((G_\ell \circledast F)[V_2], (\sigma_\ell)_{|V_2}) = (G(T_\ell, \sigma_\ell)[V_2], (\sigma_\ell)_{|V_2}) = G((T_\ell)_{|V_2}, (\sigma_\ell)_{|V_2})$. Thus, $(G_2 \circledast F', \sigma_2)$ is a BMG. Together with $|F'| \le k$, this implies that 2-BMG Modification with input $(G_2, \sigma_2, k)$ has a yes-answer.

$\ell$-BMG modification problems

Hard graph editing problems can often be solved with integer linear programming (ILP) on practically relevant instances. It is of interest, therefore, to consider an ILP formulation of the BMG deletion, completion and editing problems considered above. As input, we are given an $\ell$-colored digraph $(G = (V, E), \sigma)$. We encode its arcs by the binary constants
$$E_{xy} = 1 \text{ if and only if } (x, y) \in E$$
for all pairs $(x, y) \in V \times V$, $x \ne y$. The vertex coloring $\sigma$ is represented by the binary constants
$$\varsigma_{y,s} = 1 \text{ if and only if } \sigma(y) = s.$$
The arc set of the modified graph $(G^*, \sigma)$ is encoded by binary variables $\epsilon_{xy}$, that is, $\epsilon_{xy} = 1$ if and only if $(x, y)$ is an arc in the modified graph $G^*$. The aim is to minimize the number of edit operations, and thus, the symmetric difference between the respective arc sets. This is represented by the objective function
$$\min \sum_{(x,y) \in V \times V} (1 - \epsilon_{xy}) E_{xy} + \sum_{(x,y) \in V \times V} (1 - E_{xy}) \epsilon_{xy}.$$ (1)
Note that this objective function can also be used for the BMG completion and BMG deletion problems. To ensure that only arcs between vertices of distinct colors exist, we add the constraints
$$\epsilon_{xy} = 0 \text{ for all } (x, y) \in V \times V \text{ with } \sigma(x) = \sigma(y).$$ (2)
For the BMG completion problem, the arc set $E$ must be contained in the modified arc set. Hence, we add
$$E_{xy} \le \epsilon_{xy} \text{ for all } (x, y) \in V \times V.$$
(3)
In this case, Equ. (3) ensures that $\epsilon_{xy} = 1$ if $E_{xy} = 1$ and thus, $(x, y)$ remains an arc in the modified graph. In contrast, for the BMG deletion problem, it is not allowed to add arcs and thus, we use
$$\epsilon_{xy} \le E_{xy} \text{ for all } (x, y) \in V \times V.$$ (4)
In this case, Equ. (4) ensures that $\epsilon_{xy} = 0$ if $E_{xy} = 0$ and thus, $(x, y)$ does not become an arc in the modified graph. For the BMG editing problem, we need neither Constraint (3) nor (4). However, for all three problems we need the following.
To obtain an $\ell$-colored BMG $(G^*, \sigma)$, we must ensure that each connected component $C$ of $(G^*, \sigma)$ is an $\ell$-colored graph and a BMG (cf. Lemma 2.11). In particular, therefore, by Obs. 2.5 each vertex has at least one out-neighbor of every other color. This property translates to the constraint
$$\sum_{y \ne x} \epsilon_{xy} \cdot \varsigma_{y,s} > 0$$ (5)
for all $x \in V$ and all $s \ne \sigma(x)$. It automatically ensures that all connected components are $\ell$-colored.
By Thm. 2.9, a connected component $(G[C], \sigma_{|C})$ of $(G^*, \sigma)$ is a BMG if and only if all induced 2-colored subgraphs in $(G[C], \sigma_{|C})$ are BMGs and the set $R(G, \sigma)$ of informative triples is compatible. In order to ensure that all induced 2-colored subgraphs in $(G^*, \sigma)$ are BMGs we use Thm. 3.4. Equ. (5) already guarantees that all 2-colored induced subgraphs are sink-free. Hence it suffices to add constraints that exclude induced F1-, F2-, and F3-graphs. For every ordered four-tuple $(x_1, x_2, y_1, y_2) \in V^4$ with pairwise distinct $x_1, x_2, y_1, y_2$ and $\sigma(x_1) = \sigma(x_2) \ne \sigma(y_1) = \sigma(y_2)$, we require
(F1) $$\epsilon_{x_1 y_1} + \epsilon_{y_1 x_2} + \epsilon_{y_2 x_2} + (1 - \epsilon_{x_1 y_2}) + (1 - \epsilon_{y_2 x_1}) < 5$$ (6)
(F2) $$\epsilon_{x_1 y_1} + \epsilon_{y_1 x_2} + \epsilon_{x_2 y_2} + (1 - \epsilon_{x_1 y_2}) < 4.$$
(7)
In addition, for every ordered five-tuple $(x_1, x_2, y_1, y_2, y_3) \in V^5$ with pairwise distinct $x_1, x_2, y_1, y_2, y_3$ and $\sigma(x_1) = \sigma(x_2) \ne \sigma(y_1) = \sigma(y_2) = \sigma(y_3)$, we enforce
(F3) $$\epsilon_{x_1 y_1} + \epsilon_{x_2 y_2} + \epsilon_{x_1 y_3} + \epsilon_{x_2 y_3} + (1 - \epsilon_{x_1 y_2}) + (1 - \epsilon_{x_2 y_1}) < 6.$$ (8)
The constraints so far ensure that all induced 2-colored subgraphs in $(G^*, \sigma)$ are BMGs. This implies that also all induced 2-colored subgraphs in every connected component $(G[C], \sigma_{|C})$ of $(G^*, \sigma)$ are BMGs since, otherwise, Thm. 2.9 applied to $(G[C], \sigma_{|C})$ would imply that $(G[C], \sigma_{|C})$ is not a BMG and thus, by Lemma 2.11, $(G^*, \sigma)$ would not be a BMG; a contradiction.
Thm. 2.9 also requires that the set of informative triples $R(G, \sigma)$ is compatible. To implement this constraint, we follow the approach of [17]. Note that no distinction is made between two triples $ba|c$ and $ab|c$. In order to avoid superfluous variables and symmetry conditions connecting them, we assume that the first two indices in triple variables are ordered. Thus there are three triple variables $t_{ab|c}$, $t_{ac|b}$ and $t_{bc|a}$ for any three distinct $a, b, c \in V$ and we add constraints such that $t_{ab|c} = 1$ if $ab|c$ is an informative triple (cf. Def. 2.8 and Lemma 2.10). Hence, we add
$$\epsilon_{xy} + (1 - \epsilon_{xy'}) - t_{xy|y'} \le 1$$ (9)
for all ordered $(x, y, y') \in V^3$ with three pairwise distinct vertices $x, y, y'$ and $\sigma(x) \ne \sigma(y) = \sigma(y')$. Equ. (9) ensures that if $(x, y)$ is an arc ($\epsilon_{xy} = 1$) and $(x, y')$ is not an arc ($\epsilon_{xy'} = 0$) in the edited graph, then $t_{xy|y'} = 1$. However, this constraint allows some degree of freedom for the choice of the binary value $t_{xy|y'}$, that is, we may put $t_{xy|y'} = 1$ also in case $(x, y)$ is not an arc or $(x, y')$ is an arc.
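The triples forced by Equ. (9) are exactly those $xy|y'$ for which $(x, y)$ is an arc, $(x, y')$ is not an arc, and $\sigma(x) \ne \sigma(y) = \sigma(y')$. A minimal sketch that enumerates them for a fixed digraph is given below; the function name `informative_triples` and the encoding of a triple $xy|y'$ as the pair `(frozenset({x, y}), y')` are assumptions made for illustration.

```python
def informative_triples(vertices, arcs, color):
    """Enumerate the triples xy|y' forced by a constraint of the form (9).

    A triple is reported if (x, y) is an arc, (x, y') is not an arc and
    color[x] != color[y] == color[y'].  Since ab|c and ba|c denote the same
    triple, it is stored as (frozenset({x, y}), y').
    """
    arcs = set(arcs)
    triples = set()
    for x in vertices:
        for y in vertices:
            if (x, y) not in arcs:
                continue
            for y2 in vertices:
                if y2 in (x, y):
                    continue
                same_color = color[y2] == color[y] != color[x]
                if same_color and (x, y2) not in arcs:
                    triples.add((frozenset((x, y)), y2))
    return triples


# Toy example: x has the arc (x, y) but no arc to z, so xy|z is forced.
color = {"x": "white", "y": "black", "z": "black"}
forced = informative_triples(["x", "y", "z"], [("x", "y")], color)
```

In an ILP model, each reported triple corresponds to a variable $t_{xy|y'}$ that Equ. (9) fixes to 1; all other triple variables remain free at this stage.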
However, by Lemma 2.3, for every compatible set of triples $R$ on $V$ there is a strictly dense compatible set of triples $R'$ with $R \subseteq R'$. We therefore add the constraint
$$t_{ab|c} + t_{ac|b} + t_{bc|a} = 1 \text{ for all } \{a, b, c\} \in \binom{V}{3}$$ (10)
that ensures that precisely one of the binary variables representing one of the three possible triples on three leaves is set to 1. The final set $R'$ of triples obtained in this manner contains all informative triples but could be larger than $R(G, \sigma)$. Moreover, the trees that display $R'$ need not necessarily explain the final BMG $(G^*, \sigma)$. However, this is not needed, since we just want to ensure that the set of informative triples is compatible. To ensure compatibility of the triple set, we employ Thm. 1, Lemma 4 and ILP 5 from [17], which are based on so-called 2-order inference rules, and add
$$2 t_{ab|c} + 2 t_{ad|b} - t_{bd|c} - t_{ad|c} \le 2 \text{ for all } \{a, b, c, d\} \in \binom{V}{4}.$$ (11)
The most expensive part of the constraint system is the set of $O(|V|^5)$ conditions required to exclude induced F3-graphs.

We have shown here that arc modification problems for BMGs are NP-complete. This is not necessarily an obstacle for using BMG editing in practical workflows. After all, the computational problems in phylogenetics all involve several NP-complete steps, including
Multiple Sequence Alignment [8] and the Maximum Parsimony Tree [13] or Maximum Likelihood Tree problems [5]. Nevertheless, highly efficient and accurate heuristics have been devised for these problems, often adjusted to the peculiarities of real-life data, so that computational phylogenetics has become a routine task in bioinformatics. As a starting point for tackling BMG editing in practice, we introduced an ILP formulation that should be workable at least for moderate-size instances.
We note in passing that 2-BMG Deletion and 2-BMG Completion can be shown to be fixed-parameter tractable (with the number $k$ of edits as parameter) provided that the input graph is sink-free. To see this, observe that sink-free 2-colored graphs are BMGs if and only if they do not contain induced F1-, F2-, and F3-subgraphs (cf. Thm. 3.4). The FPT result follows directly from the observation that all such subgraphs are of fixed size and only a fixed number of arc deletions (resp., additions) are possible. In the case of 2-BMG Deletion, only those arc deletions are allowed that do not produce sinks in $G$. Clearly, graphs remain sink-free under arc addition. It remains unclear whether 2-BMG Editing is also FPT for sink-free graphs. One difficulty is that arc deletions may result in a sink-vertex that then needs to be resolved by subsequent arc additions. It also remains an open question for future research whether the BMG modification problems for (not necessarily sink-free) $\ell$-colored graphs are also FPT. We suspect that this is not the case for $\ell \ge 3$, where the characterization also requires consistency of the set of informative triples. Since removal of a triple from $R(G, \sigma)$ requires the insertion or deletion of an arc, it seems difficult to narrow down the editing candidates to a constant-size set. Indeed, Maximum Triple Inconsistency is not FPT when parametrized by the number $k$ of triples to be excluded [4]. On the other hand, the special case of Dense Maximum Triple Inconsistency is FPT [14]. The set of informative triples $R(G, \sigma)$, however, is usually far from being dense.
For larger-scale practical applications we expect that heuristic algorithms will need to be developed. An interesting starting point is the observation that in many examples some of the (non-)arcs in forbidden subgraphs cannot be modified. This phenomenon of unambiguously identifiable (non-)arcs will be the topic of ongoing work.
Acknowledgments
We thank Nicolas Wieseke for stimulating discussions. This work was funded in part by the German Research Foundation (DFG).

All forbidden subgraphs in 2-colored BMGs

Figure 8: All F1-graphs and F2-graphs. Isomorphism classes are indicated by the boxes, and labeled according to Fig. 3.

Figure 9:
All F3-graphs. Isomorphism classes are indicated by the boxes. Those graphs that contain at least one F1- or F2-graph as an induced subgraph are marked with “F1”, resp. “F2”.

References

[1] Gene Abrams and Jessica K. Sklar. The graph menagerie: Abstract algebra and the mad veterinarian. Math. Mag., 83:168–179, 2010.
[2] A. V. Aho, Y. Sagiv, T. G. Szymanski, and J. D. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J Comput, 10:405–421, 1981.
[3] Adrian M Altenhoff, Brigitte Boeckmann, Salvador Capella-Gutierrez, Daniel A Dalquen, Todd DeLuca, Kristoffer Forslund, Jaime Huerta-Cepas, Benjamin Linard, Cécile Pereira, Leszek P Pryszcz, Fabian Schreiber, Alan Sousa da Silva, Damian Szklarczyk, Clément-Marie Train, Peer Bork, Odile Lecompte, Christian von Mering, Ioannis Xenarios, Kimmen Sjölander, Lars Juhl Jensen, Maria J Martin, Matthieu Muffato, Quest for Orthologs consortium, Toni Gabaldón, Suzanna E Lewis, Paul D Thomas, Erik Sonnhammer, and Christophe Dessimoz. Standardized benchmarking in the quest for orthologs. Nature Methods, 13:425–430, 2016.
[4] Jaroslaw Byrka, Sylvain Guillemot, and Jesper Jansson. New results on optimizing rooted triplets consistency. Discr. Appl. Math., 158:1136–1147, 2010.
[5] Benny Chor and Tamir Tuller. Finding a maximum likelihood tree is hard. J. ACM, 53:722–744, 2006.
[6] Henry Cohn, Robin Pemantle, and James G. Propp. Generating a random sink-free orientation in quadratic time. Electr. J. Comb., 9:R10, 2002.
[7] E. S. El-Mallah and C. J. Colbourn. The complexity of some edge deletion problems. IEEE Trans. Circuits Syst., 35:354–362, 1988.
[8] Isaac Elias. Settling the intractability of multiple alignment. J. Comput. Biol., 13:1323–1339, 2006.
[9] W M Fitch. Distinguishing homologous from analogous proteins. Syst Zool, 19:99–113, 1970.
[10] Manuela Geiß, Edgar Chávez, Marcos González Laffitte, Alitzel López Sánchez, Bärbel M. R. Stadler, Dulce I. Valdivia, Marc Hellmuth, Maribel Hernández Rosales, and Peter F. Stadler. Best match graphs. J. Math. Biol., 78:2015–2057, 2019.
[11] Manuela Geiß, Marcos E. González Laffitte, Alitzel López Sánchez, Dulce I. Valdivia, Marc Hellmuth, Maribel Hernández Rosales, and Peter F. Stadler. Best match graphs and reconciliation of gene trees with species trees. J. Math. Biol., 80:1459–1495, 2020.
[12] Manuela Geiß, Peter F. Stadler, and Marc Hellmuth. Reciprocal best match graphs. J. Math. Biol., 80:865–953, 2020.
[13] R. L. Graham and L. R. Foulds. Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time. Math. Biosci., 60:133–142, 1982.
[14] Sylvain Guillemot and Matthias Mnich. Kernel and fast algorithm for dense triplet inconsistency. Theor. Comp. Sci., 494:134–143, 2013.
[15] Marc Hellmuth, Manuela Geiß, and Peter F. Stadler. Complexity of modification problems for reciprocal best match graphs. Theor. Comp. Sci., 809:384–393, 2020.
[16] Marc Hellmuth, Maribel Hernandez-Rosales, Katharina T. Huber, Vincent Moulton, Peter F. Stadler, and Nicolas Wieseke. Orthology relations, symbolic ultrametrics, and cographs. J. Math. Biol., 66:399–420, 2013.
[17] Marc Hellmuth, Nicolas Wieseke, Marcus Lechner, Hans-Peter Lenhof, Martin Middendorf, and Peter F. Stadler. Phylogenomics with paralogs. Proc Natl Acad Sci USA, 112:2058–2063, 2015.
[18] Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller, James W. Thatcher, and Jean D. Bohlinger, editors, Complexity of Computer Computations: Proceedings of a symposium on the Complexity of Computer Computations, pages 85–103. Springer, Boston, MA, 1972.
[19] Yunlong Liu, Jianxin Wang, Jiong Guo, and Jianer Chen. Complexity and parameterized algorithms for Cograph Editing. Theor. Comp. Sci., 461:45–54, 2012.
[20] David Schaller, Manuela Geiß, Peter F. Stadler, and Marc Hellmuth. Complete characterization of incorrect orthology assignments in best match graphs. 2020. arXiv: 2006.02249.
[21] João C. Setubal and Peter F. Stadler. Gene phylogenies and orthologous groups. In João C. Setubal, Peter F. Stadler, and Jens Stoye, editors, Comparative Genomics, volume 1704, pages 1–28. Springer, Heidelberg, 2018.
[22] Erik Sonnhammer, Toni Gabaldón, Alan Wilter Sousa da Silva, Maria Martin, Marc Robinson-Rechavi, Brigitte Boeckmann, Paul Thomas, Christophe Dessimoz, and Quest for Orthologs Consortium. Big data and other challenges in the quest for orthologs. Bioinformatics, 30:2993–2998, 2014.
[23] Peter F. Stadler, Manuela Geiß, David Schaller, Alitzel López Sánchez, Marcos González Laffitte, Dulce I. Valdivia, Marc Hellmuth, and Maribel Hernández Rosales. From pairs of most similar sequences to phylogenetic best matches. Alg. Mol. Biol., 15:5, 2020.
[24] Mihalis Yannakakis. Computing the Minimum Fill-In is NP-Complete. SIAM J. Algebraic Discr. Methods, 2:77–79, 1981.