Graph Motif Problems Parameterized by Dual
Guillaume Fertin and Christian Komusiewicz

Laboratoire des Sciences du Numérique de Nantes, UMR CNRS 6004, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes Cedex 3, France
Fachbereich Mathematik und Informatik, Philipps-Universität Marburg, Germany

August 13, 2019
Abstract
Let G = (V, E) be a vertex-colored graph, where C is the set of colors used to color V. The Graph Motif (or GM) problem takes as input G, a multiset M of colors built from C, and asks whether there is a subset S ⊆ V such that (i) G[S] is connected and (ii) the multiset of colors obtained from S equals M. The Colorful Graph Motif (or CGM) problem is the special case of GM in which M is a set, and the List-Colored Graph Motif (or LGM) problem is the extension of GM in which each vertex v of V may choose its color from a list L(v) ⊆ C of colors.

We study the three problems GM, CGM, and LGM, parameterized by the dual parameter ℓ := |V| − |M|. For general graphs, we show that, assuming the strong exponential time hypothesis, CGM has no $(2-\epsilon)^{\ell} \cdot |V|^{O(1)}$-time algorithm, which implies that a previous algorithm, running in $O(2^{\ell} \cdot |E|)$ time, is optimal [Betzler et al., IEEE/ACM TCBB 2011]. We also prove that LGM is W[1]-hard with respect to ℓ even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that GM can be solved in $O(3^{\ell} \cdot |V|)$ time but admits no polynomial-size problem kernel, while CGM can be solved in $O(\sqrt{2}^{\ell} + |V|)$ time and admits a polynomial-size problem kernel.

The
Subgraph Isomorphism problem is the following pattern matching problem in graphs: given a (typically large) host graph G and a (small) query graph H, return one (or all) occurrence(s) of H in G, where the term occurrence denotes here a subset S of V(G) such that G[S], the subgraph of G induced by S, is isomorphic to H. This type of graph mining problem has different applications, notably in biology [25]. Subgraph Isomorphism is a structural graph pattern matching problem, […] H and G.

∗ A preliminary version of this work appeared in Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM '16), volume 54 of LIPIcs, pages 7:1–7:12. This version contains all missing proofs, an improved running time for Theorem 4 and a new tractability result (Theorem 5). CK was partially supported by the DFG, project "Multivariate algorithmics for graph and string problems in bioinformatics" (KO 3669/4-1).

In some biological contexts, however, additional information is provided to the vertices of the graphs, for example their biological function. This can be modeled by labeling each vertex of the graph, for example by giving it one or several colors, each corresponding to an identified function. In the presence of such functional annotation, the structure of a given induced subgraph may be of less importance than the functions it corresponds to. Thus, a new set of functional graph pattern matching problems has emerged, starting with the Graph Motif problem [20], which was introduced in the context of the analysis of metabolic networks. In
Graph Motif, the query is a multiset M of colors that represents the functions of interest, and we search for an occurrence of M in the host graph, where the previous demand of being isomorphic to the query is replaced by a connectivity demand.

Graph Motif (GM)
Input: A multiset M built on a set C of colors, an undirected graph G = (V, E), and a coloring χ : V → C.
Question: Is there a set S ⊆ V such that G[S] is connected and there is a one-to-one mapping f : S → M such that f(v) = χ(v) for all v ∈ S?

Many variants of GM have been introduced and studied. In particular, List-Colored Graph Motif (or
LGM) is a generalization of GM that is used to identify, in a given protein interaction network, protein complexes that are similar to a given protein complex from a different species [7]. In LGM, the graph G is associated with a list-coloring $L : V \to 2^{C}$, that is, each vertex v is associated with a set L(v) of colors, and the question is whether there is a set S ⊆ V such that (i) G[S] is connected and (ii) there is a one-to-one mapping f from S to M that satisfies f(v) ∈ L(v) for all v ∈ S. The special case of GM in which M is a set is called Colorful Graph Motif (or CGM). Many optimization problems related to GM have received interest, including some that are related to tandem mass spectrometry and where the input graph is directed and edge-weighted [24]. All these problem variants have given rise to an abundant literature.

CGM, GM, and LGM are NP-hard even in very restricted cases [20, 12, 6]. Consequently, many of the above-mentioned studies have focused on (dis)proving fixed-parameter tractability of the problems (see e.g. [26] for an informal survey on the topic). In such cases, very often the parameter k := |M| = |S| is considered.

In this paper, we study the parameterized complexity of GM, CGM, and
LGM, but we differ from the usual viewpoint by focusing on the dual parameter ℓ := |V| − |S|, that is, ℓ is the number of vertices to be deleted from G to obtain a solution. Although the choice of ℓ may be disputable because a priori it may be too large to expect a good behavior in practice, there are several arguments for choosing such a parameter: First, after some initial data reduction, the input may be divided into smaller connected components, where ℓ is not much larger than k. Second, the algorithms for parameter k rely on algebraic techniques or dynamic programming, and in both cases, the worst-case running time is equivalent to the actual running time. In contrast, for example for CGM, the algorithm for parameter ℓ is a search tree algorithm [2], and search tree algorithms can be substantially accelerated via pruning rules. Finally, there are subgraph mining problems where the dual parameter ℓ is usually bigger than the parameter k but leads to the current-best algorithm (in terms of performance on real-world instances), see e.g. [18]. Hence, parameterization by ℓ may be useful even if ℓ is bigger than k, and thus deserves to be studied.

Table 1: Overview of new and previous results with respect to the dual parameter ℓ := n − k, where n := |V|, k := |M|, m := |E| and $\Delta := \max_{v \in V} |L(v)|$ denotes the maximum list size in G. The lower bound result for CGM assumes the strong exponential time hypothesis (SETH) [19].

Problem      | General graphs                                 | Trees
-------------+------------------------------------------------+----------------------------------
LGM          | W[1]-hard [2]                                  | ?
LGM, ∆ = 2   | W[1]-hard (Cor. 1)                             | ?
GM           | W[1]-hard [2]                                  | $O(3^{\ell} \cdot n)$ (Thm. 4)
             |                                                | no poly. kernel (Thm. 6)
CGM          | $O(2^{\ell} \cdot m)$ [2]                      | $O(\sqrt{2}^{\ell} + n)$ (Thm. 8)
             | no $(2-\epsilon)^{\ell} \cdot n^{O(1)}$ (Thm. 1) | $(2\ell + 1)$-vertex kernel (Thm. 7)
             | no poly. kernel (Thm. 2)                       |

Related work and our contribution. GM is NP-hard, even when M is composed of two colors [12]. Concerning the parameterized complexity for parameter k := |M|, the current-best randomized algorithm has a running time of $2^{k} \cdot n^{O(1)}$ [3, 23] where n := |V|, and there is evidence that this cannot be improved to a running time of $(2-\epsilon)^{k} \cdot n^{O(1)}$ [3]. The current-best running time for a deterministic algorithm is $5.[\ldots]^{k} \cdot n^{O(1)}$ [22]. GM on trees can be solved in $n^{O(c)}$ time where c is the number of colors in M [12], but is W[1]-hard with respect to c [12]. Other parameters, essentially related to the structure of the input graph G, have been studied by Ganian [17], Bonnet and Sikora [6], and Das et al. [9]. For example, Graph Motif is fixed-parameter tractable when parameterized by the size of a vertex cover of the input graph [17, 6]. Finally, concerning parameter ℓ, GM has been shown to be W[1]-hard, even when M is composed of two colors [2].

Since CGM is a special case of GM, any above-mentioned positive result for GM also holds for CGM. In addition,
CGM is NP-hard even for trees of maximum degree 3 [12], and does not admit a polynomial-size problem kernel with respect to k even if G has diameter two or if G is a comb graph (a special type of tree with maximum degree 3) [1]. Finally, CGM can be solved in $O(2^{\ell} \cdot m)$ time [2], where m := |E|. The LGM problem is an extension of GM and thus any negative result for GM propagates to LGM. Moreover, LGM is fixed-parameter tractable with respect to k, and the current-best algorithm runs in $2^{k} \cdot n^{O(1)}$ time [23]. Concerning parameter ℓ, LGM has been shown to be W[1]-hard even when M is a set [2].

As mentioned above, we study GM, LGM and CGM with respect to the dual parameter ℓ := n − k. Since many results in general graphs turn out to be negative, we also study the special case where the input graph G is a tree. Our results are summarized in Table 1. In a nutshell, we strengthen previous hardness results for the general case and show that the $O(2^{\ell} \cdot m)$-time algorithm for CGM is essentially optimal. Then, we show that for GM on trees and for some special cases of LGM on trees, a fixed-parameter algorithm can be achieved. Finally, we show that for CGM on trees, a polynomial-size problem kernel and better running times than for general graphs can be achieved.
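To make the problem definition and the dual parameter concrete, the following Python sketch checks whether a vertex set is an occurrence of M and searches for one by trying every set of ℓ = |V| − |M| vertices to delete. It is only an illustrative brute force, not one of the algorithms discussed in this paper; the function names and the input encoding (adjacency dictionary, color dictionary, `Counter` for M) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def is_connected(adj, S):
    """Return True iff the subgraph induced by the vertex set S is connected."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in S and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S


def is_occurrence(adj, color, motif, S):
    """S is an occurrence of M iff its color multiset equals M and G[S] is connected."""
    return Counter(color[v] for v in S) == motif and is_connected(adj, S)


def graph_motif_by_deletion(adj, color, motif):
    """Brute force over all ways to delete l = |V| - |M| vertices (hypothetical helper)."""
    vertices = list(adj)
    ell = len(vertices) - sum(motif.values())
    if ell < 0:
        return None
    for deleted in combinations(vertices, ell):
        S = set(vertices) - set(deleted)
        if is_occurrence(adj, color, motif, S):
            return S
    return None


# A path 0 - 1 - 2 colored red, green, red.
adj = {0: [1], 1: [0, 2], 2: [1]}
color = {0: "red", 1: "green", 2: "red"}
print(graph_motif_by_deletion(adj, color, Counter(red=1, green=1)))
print(graph_motif_by_deletion(adj, color, Counter(red=2)))
```

The search tries on the order of $n^{\ell}$ deletion sets, so it only serves as a reference for small instances.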
Preliminaries.
For an integer n, we use [n] := {1, . . . , n} to denote the set of the integers from 1 through n. Throughout the paper, the input graph for our three problems is G = (V, E), and we let n := |V| (resp. m := |E|) denote its number of vertices (resp. edges). For a vertex set S ⊆ V, we let G[S] := (S, {{u, v} ∈ E | u, v ∈ S}) denote the subgraph induced by S. The set S of vertices sought for in the three problems is called an occurrence of M. If G is vertex-colored, we call a vertex set S colorful if |S| = |M| and all vertices in S have pairwise different colors. A vertex v is called unique if it is assigned a color c that is assigned to no other vertex in V. For a multiset M and an element c of M, we use M(c) to denote the multiplicity of c in M.

To analyze the structure of the coloring constraints for instances of LGM, we consider the following auxiliary graph.
Definition 1.
Let (M, G, L) be an instance of LGM. The vertex-color graph H of (M, G, L) is the bipartite graph with vertex set V ∪ C and edge set {{v, c} | v ∈ V, c ∈ L(v)}.

Observe that GM instances are LGM instances where in the vertex-color graph H each vertex from V has degree one. In other words, H is a disjoint union of stars whose non-leaf is a vertex from C. Moreover, an LGM instance where H is a disjoint union of bicliques can be easily replaced by an equivalent GM instance: For each biclique K in H, replace the color set K ∩ C by one color with multiplicity $\sum_{c \in K \cap C} M(c)$ in M and assign this color to all vertices in K ∩ V.

We briefly recall the relevant notions of parameterized algorithmics [8, 11]. Parameterized algorithmics aims at analyzing the impact of structural input properties on the difficulty of computational problems. Formally, a parameterized problem L is a subset of $\Sigma^{*} \times \mathbb{N}$ where the first component is the input instance and the second component is the parameter. A parameterized problem L is fixed-parameter tractable if every input instance (I, k) can be solved in $f(k) \cdot |I|^{O(1)}$ time where f is a computable function depending only on k. A reduction to a problem kernel, or kernelization, is an algorithm that takes as input an instance (I, k) of a parameterized problem and produces in polynomial time an instance (I′, k′) such that
• (I, k) is a yes-instance if and only if (I′, k′) is a yes-instance and
• |I′| ≤ g(k) where g is a computable function depending only on k.
The instance (I′, k′) is called problem kernel and g is called the size of the problem kernel. If g is a polynomial function, then the problem admits a polynomial-size problem kernelization.
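Returning to the observation on bicliques in the vertex-color graph, the replacement of an LGM instance by a GM instance can be sketched in Python as follows. The sketch assumes that every list L(v) is nonempty and that M is given as a `Counter`; in that setting, H is a disjoint union of bicliques exactly when any two lists are either equal or disjoint. The function name and the choice to represent each merged color by the `frozenset` of the colors it replaces are assumptions of this example.

```python
from collections import Counter


def merge_bicliques(lists, motif):
    """Turn an LGM instance whose vertex-color graph is a disjoint union of
    bicliques into a GM instance; return None if some component is no biclique.

    lists: dict vertex -> nonempty set of allowed colors (the list-coloring L)
    motif: Counter color -> multiplicity (the multiset M)
    """
    merged_colors = set()
    coloring, new_motif = {}, Counter()
    for v, allowed in lists.items():
        key = frozenset(allowed)
        # Two lists that overlap without being equal rule out a biclique.
        if any(key != other and key & other for other in merged_colors):
            return None
        if key not in merged_colors:
            merged_colors.add(key)
            # The merged color receives the summed multiplicities of K ∩ C.
            new_motif[key] = sum(motif[c] for c in key)
        coloring[v] = key
    # Colors of M that occur in no list keep their multiplicity, so the
    # instance stays a no-instance whenever such a color is required.
    covered = set().union(*merged_colors)
    for c, mult in motif.items():
        if c not in covered:
            new_motif[frozenset([c])] = mult
    return coloring, new_motif


lists = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"c"}}
print(merge_bicliques(lists, Counter(a=1, b=1, c=1)))
```

Vertices 1 and 2 end up with the same merged color of multiplicity two, while vertex 3 keeps a color of multiplicity one.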
The class W[1] is a basic class of presumed fixed-parameter intractability [8, 11], that is, if a problem is W[1]-hard for parameter k, then we assume that it cannot be solved in $f(k) \cdot n^{O(1)}$ time [8, 11]. The strong exponential time hypothesis (SETH) assumes that, for any ε > 0, CNF-SAT cannot be solved in time $(2-\epsilon)^{n} \cdot |\Phi|^{O(1)}$ where Φ is the input formula and n is the number of variables [19].

This work is structured as follows. In Section 2, we present lower bounds for LGM and CGM on general graphs. These negative results motivate our study of the case when G is a tree; our results for GM on trees and CGM on trees will be presented in Section 3 and Section 4, respectively. We conclude with an outlook of future work in Section 5.
CGM can be solved in $O(2^{\ell} \cdot m)$ time [2]. We show that this running time bound is essentially optimal.

Theorem 1. Colorful Graph Motif cannot be solved in $(2-\epsilon)^{\ell} \cdot n^{O(1)}$ time unless the strong exponential time hypothesis (SETH) fails.

Proof. We present a polynomial-time reduction from CNF-SAT:

Input: A boolean formula Φ in conjunctive normal form with clauses $C_1, \ldots, C_q$ over variable set $X = \{x_1, \ldots, x_r\}$.
Question: Is there an assignment β : X → {true, false} that satisfies Φ?

The reduction works as follows. First, for each variable $x_i \in X$ introduce two variable vertices $v_i^t$ and $v_i^f$ and color each of the two vertices with color $\chi_i^X$. The idea is that (with the final occurrence) we must select exactly one vertex for this color. This selection will correspond to a truth assignment to X. Now, introduce for each clause $C_i$ a clause vertex $u_i$, color $u_i$ with a unique color $\chi_i^C$ and make $u_i$ adjacent to vertex $v_j^t$ if $x_j$ occurs nonnegated in $C_i$ and to vertex $v_j^f$ if $x_j$ occurs negated in $C_i$. Finally, introduce one further vertex $v^*$ with a unique color $\chi^*$, make $v^*$ adjacent to all variable vertices and let M be the set containing each of the introduced colors exactly once. Note that there are exactly |X| colors that appear twice in G and that all other colors appear exactly once. Hence, ℓ = |X|. We next show the correctness of the reduction. Let I = (M, G, χ) denote the constructed instance of CGM.

First, assume that Φ is satisfiable and let β be a satisfying assignment of X. For the CGM instance I consider the vertex set S ⊆ V that contains all clause vertices, vertex $v^*$, and for each variable $x_i$ the vertex $v_i^t$ if β sets $x_i$ to true and the vertex $v_i^f$ otherwise. Clearly, |S| = |M| and no two vertices of S have the same color. To show that I is a yes-instance of CGM it remains to show that G[S] is connected. First, the subgraph induced by the variable vertices in S plus $v^*$ is a star and thus it is connected. Second, since β is a satisfying assignment, each clause vertex in S has at least one neighbor in S (which is by construction a variable vertex). Hence, G[S] is connected.

Conversely, assume that I is a yes-instance of CGM, and let S be a colorful vertex set with |S| = |M| such that G[S] is connected. Since S is colorful, the variable vertices in S correspond to a truth assignment β of X. This assignment satisfies Φ: Indeed, since G[S] is connected, there is a path in G[S] between each clause vertex $u_i$ and $v^*$, and thus there is a neighbor of $u_i$ that is in S. If this neighbor is $v_j^t$ (resp. $v_j^f$), then by construction, β assigns true (resp. false) to $x_j$ and thus $C_i$ is satisfied.

Thus, the two instances are equivalent. Now observe that since ℓ = |X| = r and n = 2r + q + 1, any $(2-\epsilon)^{\ell} \cdot n^{O(1)}$-time algorithm implies a $(2-\epsilon)^{r} \cdot (r+q)^{O(1)}$-time algorithm for CNF-SAT. This directly contradicts the SETH.

The above reduction also makes the existence of a polynomial-size problem kernel for parameter ℓ unlikely. This is implied by the following two facts. First, CNF-SAT parameterized by the number of variables does not admit a polynomial-size problem kernel unless NP ⊆ coNP/poly [10]. Second, the reduction presented in the proof of Theorem 1 is a polynomial parameter transformation [5] from CNF-SAT parameterized by the number of variables to CGM parameterized by ℓ.
More precisely, given an input CNF-SAT formula Φ on variable set X, the reduction produces an instance I = (M, G, χ) of CGM with ℓ = |X|. Now, any polynomial-size problem kernelization applied to I produces in polynomial time an equivalent CGM instance I′ of size $\ell^{O(1)} = |X|^{O(1)}$. Since CNF-SAT is NP-hard, we can now transform this CGM instance in polynomial time into an equivalent CNF-SAT instance that has size $\ell^{O(1)} = |X|^{O(1)}$. Hence, a polynomial-size problem kernel for CGM parameterized by ℓ implies a polynomial-size problem kernel for CNF-SAT parameterized by |X|. This implies NP ⊆ coNP/poly [10] (which in turn implies a collapse of the polynomial hierarchy).

Theorem 2. Colorful Graph Motif parameterized by ℓ does not admit a polynomial-size problem kernel unless NP ⊆ coNP/poly.
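The reduction from CNF-SAT used in the proof of Theorem 1 is simple enough to be sketched in Python. The sketch below builds the CGM instance and, for small formulas only, checks it with an exhaustive search over one vertex per color. The encoding of clauses as lists of signed integers (DIMACS style), the vertex and color names, and both function names are assumptions of this example rather than notation from the paper.

```python
from itertools import product


def cnf_to_cgm(clauses, r):
    """Build the CGM instance for a CNF formula over variables 1..r.

    clauses: list of clauses, each a list of nonzero ints (+j is x_j, -j its negation).
    Returns (adj, color, motif) with motif given as a set of colors.
    """
    adj, color = {"v*": set()}, {"v*": "chi*"}
    for j in range(1, r + 1):
        for sign in ("t", "f"):
            v = ("v", j, sign)
            adj[v] = {"v*"}            # v* is adjacent to all variable vertices
            adj["v*"].add(v)
            color[v] = ("X", j)        # both variable vertices share one color
    for i, clause in enumerate(clauses):
        u = ("u", i)
        adj[u], color[u] = set(), ("C", i)   # unique color per clause vertex
        for lit in clause:
            v = ("v", abs(lit), "t" if lit > 0 else "f")
            adj[u].add(v)
            adj[v].add(u)
    return adj, color, set(color.values())


def colorful_occurrence(adj, color, motif):
    """Exhaustive search: pick one vertex per color of M and test connectivity."""
    classes = {}
    for v, c in color.items():
        if c in motif:
            classes.setdefault(c, []).append(v)
    if len(classes) < len(motif):
        return None
    for choice in product(*classes.values()):
        S = set(choice)
        seen, stack = {choice[0]}, [choice[0]]
        while stack:
            x = stack.pop()
            for w in adj[x]:
                if w in S and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen == S:
            return S
    return None


# (x1 or not x2) and (x2) is satisfiable.
adj, color, motif = cnf_to_cgm([[1, -2], [2]], r=2)
print(len(adj) - len(motif))                       # the dual parameter l
print(colorful_occurrence(adj, color, motif) is not None)
```

In the constructed instance the dual parameter equals the number of variables, and the exhaustive search effectively enumerates the $2^{r}$ truth assignments.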
We have thus resolved the parameterized complexity of CGM parameterized by ℓ on general graphs and now turn to the more general LGM problem, which is W[1]-hard with respect to ℓ [2]. Here, it would be desirable to obtain fixed-parameter algorithms for parameter ℓ at least for some restricted inputs. In other words, we would like to further exploit the structure of real-world instances to obtain tractability results. A very natural approach here is to consider the size and structure of the list-colorings L(v) as additional parameter. Unfortunately, the problem remains W[1]-hard even for the following very restricted case of list-colorings. Recall that the vertex-color graph is the bipartite graph with vertex set V ∪ C in which v ∈ V and c ∈ C are adjacent if and only if c ∈ L(v).

Theorem 3.
List-Colored Graph Motif is W[1]-hard with respect to ℓ even if the vertex-color graph is a disjoint union of paths.

Proof. We reduce from the Multicolored Independent Set problem:

Input: An undirected graph H = (W, F) and a vertex-labeling λ : W → {1, . . . , k}.
Question: Is there a set S ⊆ W such that |S| = k, the vertices in S have pairwise different labels and H[S] has no edges?

Multicolored Independent Set has been shown to be W[1]-hard when parameterized by k [13]. We call the colors of the Multicolored Independent Set instance labels to avoid confusion with the colors of the
List-Colored Graph Motif instance. Assume without loss of generality that each label class in H contains the same number x of vertices (this can be achieved by padding smaller classes with additional vertices) and that there is an arbitrary but fixed ordering of the vertices of H.

The reduction works as follows. We first describe the input graph G for LGM. We let $V = V_1 \cup V_2 \cup \{v^*\}$, where $V_1 = W$ and $V_2 = \{v_e \mid e \in F\}$. Now construct the edge set E of G as follows. First, make vertex $v^*$ adjacent to all vertices of $V_1$. Then, for each edge {u, w} of H make vertex $v_{\{u,w\}}$ adjacent to u and w. This completes the construction of G. Now let us describe the coloring of the vertices. We start with the colors given to $V_1 = W$. For each label i from λ do the following: create x − 1 colors $c^i_1, \ldots, c^i_{x-1}$. Now, with respect to the above-mentioned ordering, color the first vertex of label class i with color $c^i_1$, color the jth vertex, 2 ≤ j ≤ x − 1, with the list $\{c^i_{j-1}, c^i_j\}$, and finally color the xth vertex with color $c^i_{x-1}$. Next, color each vertex from $V_2 \cup \{v^*\}$ with a unique color. Let L denote the list-coloring of V(G) that we just described. We define the motif M as the set containing each color present in L. Clearly, the reduction works in polynomial time. Note that |V| = kx + |F| + 1 and |M| = k(x − 1) + |F| + 1 and thus ℓ = |V| − |M| = k. To prove our claim, it thus remains to show the correctness of the reduction:

(H, k) is a yes-instance of Multicolored Independent Set ⇔ (M, G, L) is a yes-instance of LGM.

(⇒) Let S be a size-k independent set with pairwise different vertex labels in H. Consider the set Y := V \ S in G. First, note that G[Y] is connected: vertex $v^*$ is adjacent to all vertices in $Y \cap V_1$ and each vertex $v_{\{u,w\}}$ of $Y \cap V_2 = V_2$ has at least one neighbor in $Y \cap V_1$, because at most one of the endpoints of {u, w} is in the independent set S.

It remains to show that we can assign colors to the vertices such that the union of the vertex colors is M. All vertices with unique colors are contained in Y and their coloring is clear. All other vertices are in $V_1$. Now consider label class i of $V_1$. Exactly one vertex u of label class i is contained in S. Let j be the number such that u is the jth vertex of the label class i. Then, color the qth vertex of label class i with color $c^i_q$ if q < j and with color $c^i_{q-1}$ if q > j. Clearly, this coloring assigns each of the x − 1 colors of label class i to exactly one vertex of this class. Hence, the multiset of colors assigned to Y is equal to M.

(⇐) Let Y denote an occurrence of M in G. First, observe that there are only x − 1 colors for the x vertices of each label class. Hence, Y contains exactly x − 1 vertices of each label class. Let S denote the set containing, for each label class, the only vertex not contained in Y. Clearly, |S| = k and the elements of S have pairwise different labels in H. Furthermore, S is an independent set in H: since G[Y] is connected, there is for each edge vertex $v_{\{u,w\}}$ at least one of its neighbors in Y. Hence, at most one of the endpoints of each edge {u, w} is in S.

We immediately obtain the following.

Corollary 1.
List-Colored Graph Motif is W[1]-hard with respect to ℓ even if |L(v)| ≤ 2 for every vertex v in G.

Motivated by these negative results on general graphs, we now study the special case where the input graph is a tree. For LGM, we were not able to resolve the parameterized complexity with respect to ℓ for this case. Hence, we focus on the more restricted GM problem. We show that GM is fixed-parameter tractable with respect to ℓ if the input graph is a tree. Recall that for general graphs, GM is W[1]-hard for ℓ even if the motif M contains only two colors [2]. Hence, our result shows that the tree structure significantly helps when parameterizing by ℓ. We then show that the fixed-parameter algorithm for GM on trees extends to some special cases of LGM in which the vertex-color graph is also a tree. Finally, we show that a polynomial-size kernel for GM on trees parameterized by ℓ is unlikely.

Call a color of M abundant if it occurs more often in G than in M. The abundant colors are exactly the ones that have to be "deleted" to obtain a solution S. Let $c_1, \ldots, c_j$ denote the abundant colors of M, and let $\ell_i$ denote the difference between the number of vertices in V that have color $c_i$ and the multiplicity $M(c_i)$ of $c_i$ in M. This implies in particular that $\sum_{1 \le i \le j} \ell_i = \ell$.

The algorithm is a dynamic programming algorithm that works on a rooted representation T of G. We obtain T by choosing an arbitrary vertex r ∈ V and rooting G at r. As usual for dynamic programming on trees, the idea is to combine partial solutions of subtrees. Our algorithm is somewhat similar to a previous dynamic programming algorithm for GM on graphs of bounded treewidth [12] but the analysis and concrete table setup is different.

In the following, let $T_v$ denote the subtree of T rooted at vertex v. For each subtree, we let $\mathrm{occ}(T_v, c)$ denote the number of vertices in $T_v$ that have color c. If a solution contains vertices from $T_v$ and further vertices, then it must contain v and all vertices with nonabundant colors in $T_v$. Hence, in the dynamic programming it is sufficient to consider subtrees described in the following definition.

Definition 2. We call a connected subtree T′ of $T_v$ safe if T′ contains v and if every vertex of $T_v$ that is colored by a nonabundant color is contained in T′.
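The notions of abundant colors and safe subtrees can be illustrated with a short Python sketch. It computes the values $\ell_i$ and tests Definition 2 for a candidate vertex set. The representation of the rooted tree as a dictionary of child lists, the use of a `Counter` for M, and the function names are assumptions of this example.

```python
from collections import Counter


def abundant_colors(color, motif):
    """Map each abundant color c_i to l_i = (#vertices of color c_i) - M(c_i)."""
    total = Counter(color.values())
    return {c: total[c] - motif[c] for c in total if total[c] > motif[c]}


def is_safe(children, color, motif, v, subtree):
    """Check whether the vertex set `subtree` is a safe subtree of T_v.

    children: dict vertex -> list of children in the rooted tree T
    motif:    Counter color -> multiplicity
    """
    excess = abundant_colors(color, motif)
    # Collect the vertices of T_v and remember each vertex's parent.
    t_v, parent, stack = [], {}, [v]
    while stack:
        x = stack.pop()
        t_v.append(x)
        for w in children.get(x, []):
            parent[w] = x
            stack.append(w)
    if v not in subtree or not set(subtree) <= set(t_v):
        return False
    # A subset of T_v containing v is connected iff it is closed under parents.
    if any(parent[w] not in subtree for w in subtree if w != v):
        return False
    # Every vertex of T_v with a nonabundant color must be included.
    return all(x in subtree for x in t_v if color[x] not in excess)


# Root 0 with children 1 and 2; vertex 1 has child 3.
children = {0: [1, 2], 1: [3]}
color = {0: "r", 1: "g", 2: "r", 3: "b"}
motif = Counter(r=1, g=1, b=1)
print(abundant_colors(color, motif))
print(is_safe(children, color, motif, 0, {0, 1, 3}))
```

In the example, only color r is abundant with excess one, which matches ℓ = 4 − 3 = 1, and the set {0, 1, 3} is a safe subtree of the whole tree.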
We fill a family of dynamic programming tables $D_v$, one table for each v ∈ V. The entries of $D_v$ are defined as follows:

$$D_v[\lambda_1, \ldots, \lambda_j] = \begin{cases} 1 & \text{if } T_v \text{ has a safe subtree containing, for each } c_i,\ 1 \le i \le j, \text{ exactly } \mathrm{occ}(T_v, c_i) - \lambda_i \text{ vertices of color } c_i, \\ 0 & \text{otherwise.} \end{cases}$$

Assume for now that the table has been filled out completely. Then, it can be easily determined whether G has an occurrence S of M.

If S is an occurrence of M, then let v denote the root of T[S]. Clearly, T[S] is a safe subtree of $T_v$. Moreover, every vertex with a nonabundant color is contained in T[S] and for all vertices with an abundant color $c_i$, the tree T[S] contains $\mathrm{occ}(T_v, c_i) - \lambda_i$ vertices with color $c_i$ for some $\lambda_i \ge 0$. Hence, there is an entry $D_v[\lambda_1, \ldots, \lambda_j]$ whose value is 1 and where $\mathrm{occ}(T_v, c_i) - \lambda_i$ is the multiplicity of $c_i$ in M for each $c_i$.

Conversely, if there is some entry $D_v[\lambda_1, \ldots, \lambda_j]$ with value 1 such that $T_v$ contains all vertices with nonabundant colors and for each $c_i$, 1 ≤ i ≤ j, $\mathrm{occ}(T_v, c_i) - \lambda_i$ is exactly the multiplicity of $c_i$ in M, then there is at least one safe subtree of $T_v$ whose vertex set is an occurrence of M.

Hence, one may solve GM by filling table D, and then checking for each vertex v whether one of the entries of $D_v[\lambda_1, \ldots, \lambda_j]$ with value 1 implies the existence of a solution. For the running time bound, the main observation that we exploit is that if a safe rooted subtree of $T_v$ contains all the vertices of $T_v$ that are in a solution S, then it contains at least $\mathrm{occ}(T_v, c_i) - \ell_i$ vertices with color $c_i$. Consequently, the relevant range of values for $\lambda_i$ is $[0, \ell_i]$ and thus bounded in the parameter value ℓ.

We now describe how to fill in table D. To initialize D, consider each leaf v of the tree T. By the definition of D, an entry can have value 1 only if there is a corresponding safe tree, which needs to contain v. Thus,

$$D_v[\lambda_1, \ldots, \lambda_j] = 1 \iff \lambda_1 = \ldots = \lambda_j = 0.$$
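The table definition and the leaf initialization above, together with the child-by-child combination used in this section, can be sketched in Python as follows. Instead of 0/1 arrays, the sketch stores for every vertex v the set of vectors $(\lambda_1, \ldots, \lambda_j)$ whose entry would be 1, restricted to $\lambda_i \le \ell_i$. This is a minimal illustration under these assumptions and makes no attempt to match the running time analysis; the function name and input encoding (adjacency dictionary of a tree, color dictionary, `Counter` for M) are hypothetical.

```python
from collections import Counter


def gm_on_tree(adj, color, motif):
    """Decide Graph Motif on a tree via tables of feasible deletion vectors."""
    total = Counter(color.values())
    if any(total[c] < m for c, m in motif.items()):
        return False                        # some color of M is too rare in G
    abundant = [c for c in total if total[c] > motif[c]]
    index = {c: i for i, c in enumerate(abundant)}
    budget = [total[c] - motif[c] for c in abundant]   # the values l_i
    target = [motif[c] for c in abundant]
    n_fixed = sum(1 for v in color if color[v] not in index)
    zero = (0,) * len(abundant)

    # Root the tree at an arbitrary vertex; `order` lists parents before children.
    root = next(iter(adj))
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)

    occ, fixed, D = {}, {}, {}
    for v in reversed(order):               # children are handled before parents
        occ_v = list(zero)
        if color[v] in index:
            occ_v[index[color[v]]] = 1
        fixed_v = int(color[v] not in index)
        table = {zero}                      # safe subtree consisting of v only
        for u in adj[v]:
            if u == parent[v]:
                continue
            options = set(D[u])             # the safe subtree extends into T_u
            if fixed[u] == 0:
                options.add(occ[u])         # or all of T_u is deleted
            table = {tuple(a + b for a, b in zip(x, y))
                     for x in table for y in options
                     if all(a + b <= cap for a, b, cap in zip(x, y, budget))}
            occ_v = [a + b for a, b in zip(occ_v, occ[u])]
            fixed_v += fixed[u]
        occ[v], fixed[v], D[v] = tuple(occ_v), fixed_v, table
        # v roots an occurrence if T_v holds all nonabundant vertices and some
        # vector leaves exactly M(c_i) vertices of every abundant color c_i.
        if fixed_v == n_fixed and any(
                all(o - lam == t for o, lam, t in zip(occ[v], x, target))
                for x in table):
            return True
    return False


# Path 0 - 1 - 2 - 3 colored r, g, b, r.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
color = {0: "r", 1: "g", 2: "b", 3: "r"}
print(gm_on_tree(adj, color, Counter(r=1, b=1)))
print(gm_on_tree(adj, color, Counter(r=2)))
```

Pruning every vector that exceeds the budget $\ell_i$ keeps the tables small, in line with the observation that the relevant range of each $\lambda_i$ is $[0, \ell_i]$.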
Now, to compute the entries of D for a nonleaf vertex v, we combine the entries of the children of v. To this end, fix an arbitrary ordering of the children of v and denote them by $u_1, \ldots, u_{\deg(v)}$. Now, let $T_v^i$ denote the subtree rooted at v containing the vertices of each $T_{u_q}$, 1 ≤ q ≤ i, and no vertices from each $T_{u_q}$, q > i. For increasing i, we compute solutions for $T_v^i$, eventually computing the solutions for $T_v^{\deg(v)} = T_v$. To compute these solutions, we define an auxiliary table $D_v^i$. The table entries are defined just as for D, that is,

$$D_v^i[\lambda_1, \ldots, \lambda_j] = \begin{cases} 1 & \text{if } T_v^i \text{ has a safe subtree containing, for each } c_p,\ 1 \le p \le j, \text{ exactly } \mathrm{occ}(T_v^i, c_p) - \lambda_p \text{ vertices of color } c_p, \\ 0 & \text{otherwise.} \end{cases}$$

Observe that, since $T_v^{\deg(v)} = T_v$, we have $D_v^{\deg(v)} = D_v$ and thus by computing $D_v^{\deg(v)}$ we also compute $D_v$. Now, $D_v^1$ can be computed in a straightforward fashion from the entries of $D_{u_1}$:

$$D_v^1[\lambda_1, \ldots, \lambda_j] = \begin{cases} 1 & \text{if } D_{u_1}[\lambda_1, \ldots, \lambda_j] = 1, \\ 1 & \text{if } \mathrm{occ}(T_{u_1}, c_p) = \lambda_p \text{ for all } p \text{ and } T_{u_1} \text{ contains only abundant colors}, \\ 0 & \text{otherwise.} \end{cases}$$

The first case corresponds to the case that the safe subtree T′ of $T_v^1$ contains at least one vertex of $T_{u_1}$, the second case corresponds to the case that T′ contains only v.

To compute $D_v^i$ for i > 1, we combine entries of $D_v^{i-1}$ with $D_{u_i}$:

$$D_v^i[\lambda_1, \ldots, \lambda_j] = \begin{cases} 1 & \text{if } T_{u_i} \text{ contains only abundant colors and } D_v^{i-1}[\lambda_1 - \mathrm{occ}(T_{u_i}, c_1), \ldots, \lambda_j - \mathrm{occ}(T_{u_i}, c_j)] = 1, \\ 1 & \text{if there is } (\lambda'_1, \ldots, \lambda'_j) \text{ such that } D_v^{i-1}[\lambda'_1, \ldots, \lambda'_j] = D_{u_i}[\lambda_1 - \lambda'_1, \ldots, \lambda_j - \lambda'_j] = 1, \\ 0 & \text{otherwise.} \end{cases}$$

In the first case, no vertex of $T_{u_i}$ is part of the safe subtree, in the second case the safe subtree contains some vertices of $T_{u_i}$ and some vertices of $T_v^{i-1}$.

This completes the description of the dynamic programming recurrences. The correctness follows from the fact that the recurrence considers all possible cases to "distribute" the vertex deletions. It remains to bound the running time.

Theorem 4.
Graph Motif can be solved in $O(3^{\ell} \cdot n)$ time if G is a tree.

Proof. As described above, the relevant values of each $\lambda_i$ are in $[0, \ell_i]$. Thus, for each subtable $D_v^i$ and $D_v$, the number of entries to compute is $\prod_{1 \le i \le j} (\ell_i + 1)$. The dominating term in the overall running time is the computation of the second term in the recurrence for $D_v^i[\lambda_1, \ldots, \lambda_j]$ where we consider all $(\lambda'_1, \ldots, \lambda'_j)$ such that $D_v^{i-1}[\lambda'_1, \ldots, \lambda'_j] = D_{u_i}[\lambda_1 - \lambda'_1, \ldots, \lambda_j - \lambda'_j] = 1$. The number of possible choices can be computed as follows. For each $\lambda_i$, the values of $\lambda'_i$ range between 0 and $\lambda_i$. Overall this gives

$$\sum_{t=0}^{\ell_i} (t + 1) = \sum_{t=1}^{\ell_i + 1} t = (\ell_i + 2) \cdot (\ell_i + 1)/2$$

possible combinations of $\lambda_i$ and $\lambda'_i$, and the total number of choices is the product of these numbers over all i. We now bound this product in ℓ. Thus, we aim to find the vector $(\ell_1, \ldots, \ell_j)$ that maximizes $\prod_{1 \le i \le j} (\ell_i + 2) \cdot (\ell_i + 1)/2$ under the constraint $\sum_{1 \le i \le j} \ell_i = \ell$. We claim that this is the vector with $\ell_1 = \ldots = \ell_j = 1$.

To this end, consider a vector $(\ell_1, \ldots, \ell_j)$ whose maximum entry is at least 2. Without loss of generality, assume thus $\ell_j > 1$. Now consider $(\ell_1, \ldots, \ell_{j-1}, \ell_j - 1, 1)$ and observe that $(\sum_{1 \le i < j} \ell_i) + (\ell_j - 1) + 1 = \ell$, that is, the new vector also satisfies the summation constraint. Moreover,

$$\prod_{1 \le i \le j} \frac{(\ell_i + 2) \cdot (\ell_i + 1)}{2} < \Big( \prod_{1 \le i < j} \frac{(\ell_i + 2) \cdot (\ell_i + 1)}{2} \Big) \cdot \frac{(\ell_j + 1) \cdot \ell_j}{2} \cdot 3,$$

because $\ell_j + 2 < 3 \ell_j$ for $\ell_j > 1$. Since the new vector has more entries with value 1, we conclude that the maximum value is reached when all entries assume value 1. Consequently, the worst case number of recurrences that need to be evaluated for filling a subtable $D_v^i$ or $D_v$ is $O(3^{\ell})$. The overall number of subtables to fill is $O(\sum_{v \in V} \deg(v)) = O(n)$. This implies the overall running time bound.

The fixed-parameter tractability of GM on trees can be extended to give fixed-parameter tractability for LGM when the input graph G is a tree and the vertex-color graph H is a forest with bounded degree.

The first step in our algorithm is to apply the following two data reduction rules which are obviously correct.

Rule 1. If there is a color vertex v in H such that the degree of v in H is smaller than the multiplicity of v in M, then return "no".

Rule 2. If there is a color vertex v in H such that the degree of v in H equals the multiplicity of v in M, then set L(u) = {v} for all neighbors u of v in H.

With these reduction rules at hand, we can show that the following special case of LGM is fixed-parameter tractable with respect to ℓ.

Lemma 1. LGM can be solved in $O(3^{\ell} \cdot n)$ time if G is a tree and the vertex-color graph H is a forest in which for every color vertex c, the difference between the degree of c in H and the multiplicity of c in M is at most one.

Proof. We describe a reduction of this special case of LGM on trees to GM on trees. In the following, we assume without loss of generality that every color in the instance has multiplicity at least one in M. First, apply Reduction Rules 1 and 2 exhaustively. This can be done in O(n) time by computing the difference between M(c) and $\deg_H(c)$ once for all c ∈ C and updating this value whenever we delete an edge.
Afterwards, for every color vertex c in M, we have $M(c) \le \deg_H(c)$, due to Reduction Rule 1, and $M(c) \ge \deg_H(c) - 1$ by the assumption of the lemma. Moreover, if $M(c) = \deg_H(c)$, then, due to Reduction Rule 2, the connected component of c in H consists of c and $\deg_H(c)$ leaf neighbors of c. By the above, the vertex-color graph H contains the two types of connected components considered below. For both of them, we show that the constraints of L can be replaced by simple coloring constraints. Consider a connected component H′ of H.

Case 1: H′ is a star with a central color vertex c such that $M(c) = \deg_H(c)$. Replace c by $\deg_H(c)$ new colors and assign a different color to each neighbor of c in H. This is equivalent as all neighbors of c in H are contained in any occurrence of M.

Case 2: H′ is a tree in which each color vertex c fulfills $M(c) = \deg_H(c) - 1$. Let C′ denote the set of color vertices in H′ and V′ denote the set of vertices of H′ that do not correspond to colors. Replace C′ by one new color $c^*$ and set the multiplicity of $c^*$ to |V′| − 1. To show correctness of this replacement, we show that for every v ∈ V′, there is an assignment f′ : V′ \ {v} → C′ such that f′(u) ∈ L(u) for each u ∈ V′ \ {v} and each color c ∈ C′ is assigned exactly $\deg_H(c) - 1$ times by f′. To see the existence of this assignment consider the version of H′ that is rooted at v. For each color vertex c in H′, let $V'_c$ denote the children of c in this rooted tree. For each vertex $u \in V'_c$ set f′(u) := c. With this assignment, there are exactly $\deg_H(c) - 1$ vertices assigned to each color c and every vertex u ∈ V′ \ {v} is assigned to some color c of C′, namely to its predecessor in H′.

Applying these replacements exhaustively then results in an equivalent instance of GM on trees which can be solved in the claimed running time due to Theorem 4.

We now show how to use the running time bound of Lemma 1 to obtain a fixed-parameter algorithm for the dual parameter ℓ for the special case of LGM when the vertex-color graph is a tree and each color has a bounded multiplicity in M. Thus, let $M_C := \max_{c \in C} M(c)$ denote the largest multiplicity in M. We will achieve the algorithm by branching on colors c where the difference between $\deg_H(c)$ and M(c) is at least two. We call such a color vertex 2-abundant in the following. The first step of the algorithm is to apply Reduction Rules 1 and 2 exhaustively.

Branching Rule 1. If the vertex-color graph H contains a connected component H′ with at least one 2-abundant color vertex, then do the following. Root H′ arbitrarily.
• Choose some 2-abundant vertex c of H′ such that the subtree of H′ rooted at c has no further 2-abundant vertex.
• Choose a set $V_c$ of M(c) + 1 arbitrary children of c.
• For each $u \in V_c$ branch into the case that c is removed from L(u).

Proof of correctness. First, observe that such a 2-abundant color vertex c always exists and that it can be found in linear time by a bottom-up traversal of the rooted tree. Second, observe that since $\deg_H(c) \ge M(c) + 2$, the vertex c has at least M(c) + 1 children. Since the color c is used only M(c) times, in every occurrence of M some child $u \in V_c$ of c in H′ does not receive the color c. Thus, if the original instance is a yes-instance, then the branch in which we remove c from L(u) is a yes-instance.
Conversely, any occurrence of $M$ in an instance created by the branching rule is an occurrence of $M$ in the original instance.

If Branching Rule 1 does not apply, then we can solve the instance in $O(3^{\ell} \cdot n)$ time by Lemma 1. It thus remains to ensure that the rule cannot be applied too often. To this end, we apply one further reduction rule. To formulate the rule we need the following definition. We call a connected component $H'$ of the vertex-color graph $H$ costly if $H'$ either consists of just one vertex $v \in V$ or $H'$ is a tree such that all color vertices $c$ in $H'$ have multiplicity exactly $\deg_H(c) - 1$ in $M$.

Reduction Rule 3. If $H$ contains at least $\ell + 1$ costly components, then return “no”.

Proof of correctness. For each costly component at least one vertex is not contained in any occurrence of $M$. This is obvious for those components consisting only of one vertex $v$ from $V$. For the other costly components, this follows from Case 2 in the proof of Lemma 1.

Now it remains to observe that in each instance created by an application of Branching Rule 1, the number of costly components is increased by exactly one. Hence, after at most $\ell + 1$ branching steps, Reduction Rule 3 directly reports that we have a “no” instance. Since we branch into $M(c) + 1 \le M_C + 1$ cases in each application of Branching Rule 1, we thus create $O((M_C + 1)^{\ell + 1})$ instances that either adhere to the conditions of Lemma 1 or are rejected due to Reduction Rules 1 or 3 and can thus be solved in $O(3^{\ell} \cdot n)$ time. Altogether, we obtain the following running time.

Theorem 5. If $G$ is a tree, and the vertex-color graph $H$ is a forest, then LGM can be solved in $O((M_C + 1)^{\ell + 1} \cdot 3^{\ell} \cdot n)$ time.

When $M$ is a set, the largest multiplicity is one, giving the following running time.

Corollary 2. If $G$ is a tree, $H$ is a forest, and $M$ is a set, then LGM can be solved in $O(6^{\ell} \cdot n)$ time.

By observing that Branching Rule 1 branches into at most $\deg_H(c) - 1$ cases, we also obtain a running time bound in terms of the maximum degree of the color vertices in $H$.

Corollary 3.
If $G$ is a tree, and $H$ is a tree whose color vertices have degree at most $\Delta_C$, then LGM can be solved in $O((\Delta_C - 1)^{\ell + 1} \cdot 3^{\ell} \cdot n)$ time.

A Kernelization Lower Bound

We now show that GM does not admit a polynomial-size problem kernel with respect to $\ell$, even if $G$ is a tree. The proof is based on a cross-composition [4] from the Multicolored Clique problem.

Multicolored Clique
Input: An undirected graph $H = (W, F)$ and a vertex-labeling $\lambda : W \to \{1, \ldots, k\}$.
Question: Is there a vertex set $S \subseteq W$ such that $|S| = k$, the vertices in $S$ have pairwise different labels and $H[S]$ is a clique?

Multicolored Clique has been shown to be W[1]-hard parameterized by $k$ [13]. We refer to the colors of the Multicolored Clique instance as labels to avoid confusion with the colors of the GM instance. Informally, cross-compositions are reductions that combine many instances of one problem into one instance of another problem. The existence of a cross-composition from an NP-hard problem to a parameterized problem $Q$ implies that $Q$ does not admit a polynomial-size problem kernel (unless NP ⊆ coNP/poly) [4].

Definition 3 ([4]). Let $L \subseteq \Sigma^*$ be a language, let $R$ be a polynomial equivalence relation on $\Sigma^*$, and let $Q \subseteq \Sigma^* \times \mathbb{N}$ be a parameterized problem. An or-cross-composition of $L$ into $Q$ (with respect to $R$) is an algorithm that, given $t$ instances $x_1, x_2, \ldots, x_t \in \Sigma^*$ of $L$ belonging to the same equivalence class of $R$, takes time polynomial in $\sum_{i=1}^{t} |x_i| + k$ and outputs an instance $(y, k) \in \Sigma^* \times \mathbb{N}$ of $Q$ such that
• the parameter value $k$ is polynomially bounded in $\max_{i=1}^{t} |x_i| + \log t$, and
• the instance $(y, k)$ is a yes-instance for $Q$ if and only if at least one instance $x_i$ is a yes-instance for $L$.

We present an or-cross-composition of Multicolored Clique into GM on trees parameterized by $\ell$. As the polynomial equivalence relation $R$, we simply assume that all the Multicolored Clique instances have the same number of vertices $n$.
The main trick is to encode vertex identities in the graph of the Multicolored Clique instance by numbers of colored vertices in the GM instance; this approach was also followed in previous works on GM [12, 6].

Given $t$ instances $(H_1 = (W_1, F_1), \lambda_1), (H_2 = (W_2, F_2), \lambda_2), \ldots, (H_t = (W_t, F_t), \lambda_t)$ of Multicolored Clique such that $|W_i| = n$ for all $i \in [t]$, we reduce to an instance of GM where the input graph is a tree as follows. Herein, we assume without loss of generality that $t = 2^s$ for some integer $s$.

The first construction step is to add one vertex $r$ that connects the different parts of the instance and which will be contained in every occurrence of the motif. The vertex $r$ thus receives a unique color that may not be deleted. To this vertex $r$ we attach subtrees corresponding to edges of the input instances. Deleting vertices of such a subtree then corresponds to selecting the endpoints of the corresponding edge.

Instance selection gadget. The technical difficulty in the construction is to ensure that the solution of GM deletes only vertices in subtrees corresponding to edges of the same graph. To achieve this, we introduce $k \cdot (k - 1) \cdot \log t$ instance selection colors $\iota[p, q, \tau]$ where $p \in [k]$, $q \in [k] \setminus \{p\}$, and $\tau \in [\log t]$, and demand that the solution deletes exactly one vertex of each instance selection color. To ensure that exactly one instance is selected, we use two further colors $\iota^+$ and $\iota^-$.

For each Multicolored Clique instance $(H_i, \lambda_i)$, attach an instance selection path $P_i$ to $r$ that is constructed based on the number $i$. Let $b(i)$ denote the binary expansion of $i$ and let $b_\tau(i)$, $\tau \in [\log t]$, denote the $\tau$th digit of $b(i)$. Construct a path $P_i$ containing first a vertex with color $\iota^+$, then in arbitrary order exactly one vertex of each color in the color set $I_i := \{\iota[p, q, \tau] : b_\tau(i) = 1\}$, and then a vertex with color $\iota^-$.
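The color sets used by the instance selection gadget can be sketched as follows. This is only an illustration under stated assumptions: instances are indexed from 0 to $t - 1$, digit $\tau = 1$ is taken to be the least significant bit, colors $\iota[p, q, \tau]$ are represented as triples, and all function names are hypothetical.

```python
from collections import Counter


def all_selection_colors(k, s):
    """All k*(k-1)*s triples (p, q, tau) standing for the colors iota[p, q, tau]."""
    return {(p, q, tau)
            for p in range(1, k + 1)
            for q in range(1, k + 1) if q != p
            for tau in range(1, s + 1)}


def selection_set(i, k, s):
    """I_i: the colors iota[p, q, tau] whose tau-th binary digit of i is 1.
    Assumption: i is 0-based and digit tau = 1 is the least significant bit."""
    return {(p, q, tau) for (p, q, tau) in all_selection_colors(k, s)
            if (i >> (tau - 1)) & 1}


def selection_path(i, k, s):
    """Color sequence of P_i, listed from the neighbor of r outwards."""
    return ["iota+"] + sorted(selection_set(i, k, s)) + ["iota-"]


def complement_set(i, k, s):
    """Colors of the paths that 'complement' P_i: those NOT in I_i."""
    return all_selection_colors(k, s) - selection_set(i, k, s)


def deleted_exactly_once(i, j, k, s):
    """Delete P_i and, for every label pair, one complementing path built
    for instance j; report whether every iota[p, q, tau] is then deleted
    exactly once."""
    deleted = Counter(selection_set(i, k, s)) + Counter(complement_set(j, k, s))
    return all(deleted[c] == 1 for c in all_selection_colors(k, s))
```

With these definitions, `deleted_exactly_once(i, j, k, s)` holds precisely when `i == j`: if the binary expansions differ at some digit, the corresponding colors are deleted either twice or not at all.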
Attach the path $P_i$ to $r$ by making the vertex with color $\iota^+$ a neighbor of $r$.

The idea of the construction is that exactly one instance selection path $P_i$ is completely deleted and that this will force any solution to delete paths that “complement” $P_i$ (that is, paths which contain all $\iota[p, q, \tau]$ such that $b_\tau(i) = 0$) in the rest of the graph.

Edge selection gadget. To force deletion of subtrees corresponding to exactly $\binom{k}{2}$ edges with different labels, we introduce $2k(k - 1)$ label selection colors $\lambda[p, q]^+$ and $\lambda[p, q]^-$ where $p \in [k]$ and $q \in [k] \setminus \{p\}$. These colors will ensure that, for each pair of labels $p$ and $q$, the solution deletes exactly one path corresponding to the ordered pair $(p, q)$ and one path corresponding to the pair $(q, p)$.

There are two further sets of colors. One set is used for ensuring vertex consistency of the chosen edges, that is, to make sure that all the selected edges with label pair $(p, \cdot)$ correspond to the same vertex with label $p$. More precisely, we introduce a color $\omega[p, q]$ for each $p \in [k]$ and each $q \in [k] \setminus \{p\}$, except for the biggest $q \in [k] \setminus \{p\}$.

The final color set is used to check that the edges selected for label pair $(p, q)$ and for label pair $(q, p)$ are the same. To this end, we introduce a set of colors $\varepsilon[p, q]$ for each $p \in [k]$ and each $q \in [k] \setminus \{p\}$ such that $q > p$. To perform the checks of vertex and edge consistency, we encode the identities of vertices and edges into path lengths. More precisely, we assign each vertex $v \in W_i$ a unique (with respect to the vertices of $W_i$) number $\mathrm{id}(v) \in [n]$.

Now, for each label pair $(p, q)$ and each instance $i$, attach one path $P_i(u, v)$ to $r$ for each edge $\{u, v\}$ where $u$ has label $p$ and $v$ has label $q \neq p$.
The path $P_i(u, v)$
• starts with a vertex with color $\lambda[p, q]^+$ that is made adjacent to $r$,
• then contains exactly one vertex of each color in $\{\iota[p, q, \tau] : \iota[p, q, \tau] \notin I_i\}$,
• then contains $\mathrm{id}(u)$ vertices of color $\varepsilon[p, q]$ if $p < q$ and $n - \mathrm{id}(v)$ vertices of color $\varepsilon[q, p]$ if $p > q$,
• then, if $q$ is not the biggest label in $[k] \setminus \{p\}$, contains $\mathrm{id}(u)$ vertices with color $\omega[p, q]$,
• then, if $q$ is not the smallest label in $[k] \setminus \{p\}$, contains $n - \mathrm{id}(u)$ vertices with color $\omega[p, q']$, where $q'$ is the next-smaller label in $[k] \setminus \{p\}$ (if $p = q - 1$, then $q' = q - 2$; otherwise $q' = q - 1$), and
• ends with a vertex with color $\lambda[p, q]^-$.

Let $C$ denote the multiset containing all the vertex colors of all vertices added during the construction with their respective multiplicities. In the correctness proof it will be easier to argue about the colors that are not contained in $M$. Hence, the construction is completed by setting the multiset $D$ of colors to “delete” to contain each color exactly once except
• the color of $r$, which is not contained in $D$,
• the vertex consistency colors $\omega[p, q]$, each of which is contained with multiplicity $n$, and
• the edge selection colors $\varepsilon[p, q]$, each of which is contained with multiplicity $n$.

The motif $M$ is defined as $M := C \setminus D$. It remains to show the correctness.

Theorem 6. Graph Motif does not admit a polynomial-size problem kernel with respect to $\ell$ even if $G$ is a tree unless NP ⊆ coNP/poly.

Proof. To complete the proof we need to show that the construction fulfills the properties of cross-compositions. First, the construction clearly runs in polynomial time. Second, the number of introduced colors is polynomial in $k + \log t$ and thus the value of $\ell = |D|$ is polynomially bounded in $n + \log t$.
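The bound on $\ell = |D|$ can be made explicit by tallying the color classes of the construction. The count below is read off the description of $D$ and is not stated in this form in the text; it assumes $2 \le k \le n$ (an instance with more labels than vertices is trivially a no-instance).

```latex
\begin{align*}
\ell = |D|
  &= \underbrace{2}_{\iota^{+},\,\iota^{-}}
   + \underbrace{k(k-1)\log t}_{\iota[p,q,\tau]}
   + \underbrace{2k(k-1)}_{\lambda[p,q]^{+},\,\lambda[p,q]^{-}}
   + \underbrace{n \cdot k(k-2)}_{\omega[p,q]}
   + \underbrace{n \cdot \tbinom{k}{2}}_{\varepsilon[p,q]} \\
  &\le 2 + n^{2}\log t + 2n^{2} + \tfrac{3}{2}\,n^{3}
   \;=\; O\!\left(n^{3} + n^{2}\log t\right).
\end{align*}
```

Here the $\omega$ term uses that there are $k - 2$ colors $\omega[p, q]$ for each label $p$, and the $\varepsilon$ term uses that there is one color $\varepsilon[p, q]$ per unordered label pair.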
Thus, it remains to show that the composition is an or-cross-composition, that is:

At least one $(H_i, \lambda_i)$ is a yes-instance of Multicolored Clique ⇔ $(M, G, L)$ is a yes-instance of GM.

(⇒) Let $S \subseteq W_i$ be a vertex set of size $k$ such that $H_i[S]$ is a clique and the vertices in $S$ have pairwise different labels. Consider the induced subgraph $G'$ of $G$ obtained by completely deleting the path $P_i$ and, for each edge $\{u, v\}$ of $H_i[S]$, the paths $P_i(u, v)$ and $P_i(v, u)$. Since only complete paths are deleted and since each path in $G$ is attached to $r$, the graph $G'$ is connected. It remains to show that the multiset of deleted colors is $D$. First, $\iota^+$ and $\iota^-$ are deleted once and contained once in $D$. Second, each instance selection color $\iota[p, q, \tau]$ is deleted once as required by $D$: If $\iota[p, q, \tau]$ is contained in $P_i$, then it is not contained in any $P_i(u, v)$. Conversely, if $\iota[p, q, \tau]$ is not contained in $P_i$ then it is contained in each $P_i(u, v)$ where $u$ has label $p$ and $v$ has label $q$. Third, exactly $n$ vertices of each vertex consistency color $\omega[p, q]$ are deleted: these vertices are contained only in two deleted paths $P_i(u, v)$, namely if $u$ has label $p$ and $v$ has either label $q$ or label $q + 1$. Since all the deleted paths with label pair $(p, \cdot)$ correspond to the same vertex $u$, the number of vertices with color $\omega[p, q]$ is $\mathrm{id}(u)$ in the path where $v$ has label $q$ and $n - \mathrm{id}(u)$ in the path where $v$ has label $q + 1$. Hence, exactly $n$ vertices with this color are deleted, as required by $D$. Finally, we show that exactly $n$ vertices of each edge selection color $\varepsilon[p, q]$, $p < q$, are deleted: Let $u$ and $v$ be the vertices of $S$ with label $p$ and $q$, respectively. Then, the deleted path $P_i(u, v)$ contains $\mathrm{id}(u)$ vertices with color $\varepsilon[p, q]$ and the deleted path $P_i(v, u)$ contains $n - \mathrm{id}(u)$ vertices with this color. Altogether, the multiset of colors in $G'$ is exactly $C \setminus D = M$.

(⇐) Let $G'$ be a connected subgraph of $G$ whose multiset of vertex colors is exactly $M$.
Let $V_D := V(G) \setminus V(G')$ denote the set of deleted vertices, that is, vertices not in $G'$. The multiset of vertex colors of $V_D$ is exactly $D$. Thus, exactly one vertex with color $\iota^+$ and exactly one vertex with color $\iota^-$ is deleted. Consequently, exactly one path $P_i$ is completely deleted from $G$: deleting $\iota^+$ in some $P_i$ implies that the $\iota^-$ in $P_i$ is also deleted. Thus, no further vertices from any $P_j$, $j \neq i$, may be deleted. Moreover, since each label selection color $\lambda[p, q]^+$ or $\lambda[p, q]^-$ is contained exactly once in $D$, the set $V_D$ also contains exactly one path $P_j(u, v)$ where $u$ has label $p$ and $v$ has label $q$. Moreover, we have $j = i$ by the assignment of the instance selection colors: If $j \neq i$, then there is some $\tau \in [\log t]$ such that $b_\tau(i) \neq b_\tau(j)$. Then, however, $\iota[p, q, \tau]$ is either not contained in the colors of $V_D$ or it is contained twice in the colors of $V_D$. In either case, the multiset of deleted colors is different from $D$. Thus, all the deleted paths in the edge selection gadgets correspond to the same instance $i$. Now consider the deleted paths for label pairs $(p, \cdot)$. These paths correspond to the same vertex: Otherwise, there is some $P_i(u, v)$ and some $P_i(u', v')$ such that $u \neq u'$, $v$ has label $q$, and $v'$ has label $q + 1$. Then, however, the number of vertices with color $\omega[p, q]$ does not equal $n$ since $P_i(u, v)$ contains $\mathrm{id}(u)$ vertices of this color, $P_i(u', v')$ contains $n - \mathrm{id}(u')$ vertices of this color and $\mathrm{id}(u) \neq \mathrm{id}(u')$. Hence, the deleted paths correspond to a vertex set $S$ with $k$ different labels in some $H_i$. It remains to show that the graph $H_i[S]$ is a clique.

Figure 1: The two phases of the kernelization. Left: the input instance, where $r$, $u$, and $v$ have unique colors; the pendant non-unique subtrees are highlighted by the grey background. Middle: after Phase I, all vertices on paths between unique vertices are contracted into $r$.
Right: in Phase II, all vertices with a color that was removed in Phase I are removed together with their descendants.

Consider an arbitrary pair of labels $p$ and $q$ where $p < q$. Moreover, let $u \in S$ and $v \in S$ have label $p$ and $q$, respectively. Let $P_i(u, v')$ be the path for $u$ that is deleted for this label pair and let $P_i(v, u')$ be the path for $v$ that is deleted for this label pair. Then, $P_i(u, v')$ contains exactly $\mathrm{id}(u)$ vertices with color $\varepsilon[p, q]$, and $P_i(v, u')$ contains exactly $n - \mathrm{id}(u')$ vertices of this color. Since $D$ contains exactly $n$ vertices of this color, this implies $u = u'$. By construction, this implies that $u$ and $v$ are neighbors in $H_i$.

For the combination of vertex-colored trees as input graphs and motifs that are sets, the problem becomes considerably easier. First, we show that in this case CGM admits a linear-vertex problem kernel that can be computed in linear time. The idea for the problem kernelization is based on two simple observations. First, we observe that the number of vertices that are not unique is bounded in CGM.

Lemma 2. Let $(M, G, \chi)$ be an instance of Colorful Graph Motif. Then at most $2\ell$ vertices in $G$ are not unique.

Proof. Let $C^+$ denote the set of colors that occur more than once in $G$ and let $\mathrm{occ}(c)$ denote the number of occurrences of a color $c$ in $G$. We denote $c^+ := |C^+|$, $n^+ := \sum_{c \in C^+} \mathrm{occ}(c)$, and by $n^-$ the number of unique vertices in $G$. By definition, no color is repeated in $M$, thus $|M| = c^+ + n^-$; moreover, $|V| = n^+ + n^-$. Hence, the number $\ell = |V| - |M|$ of vertices to delete satisfies $\ell = n^+ - c^+$. By definition $n^+ \ge 2c^+$, and thus we conclude that $\ell \ge n^+ / 2$.

Second, we observe that every occurrence of $M$ contains all unique vertices and, since $G$ is a tree, all vertices on the paths between them; in a first phase, these vertices are contracted into a single vertex $r$. Afterwards, in a second phase some further vertices are removed because their colors have been used during the contraction. Eventually, this results in an instance which has at most one unique vertex and thus, by Lemma 2, bounded size. For an example of the kernelization, see Figure 1. Below, we give a more detailed description.

Theorem 7.
Colorful Graph Motif on trees admits a problem kernel with at most $2\ell + 1$ vertices that can be computed in $O(n)$ time.

Proof. We first describe the kernelization algorithm, then we show its correctness and finally bound its running time. By Lemma 2, the size bound holds if the instance has no unique vertex. Thus, we assume that there is a unique vertex in the following.

Given an instance $(M, G, \chi)$ of CGM, first root the input tree $G$ at an arbitrary unique vertex $r$. Now call a subtree with root $v$ pendant if it contains all descendants of $v$ in $G$. Then, compute in a bottom-up fashion maximal pendant subtrees such that no vertex in this subtree is unique. Call these subtrees the pendant non-unique subtrees. By Lemma 2, the total number of vertices in pendant non-unique subtrees is at most $2\ell$. Now the algorithm removes vertices in two phases.

Phase I. Remove from $G$ all vertices except $r$ that are not contained in a pendant non-unique subtree. Remove all colors of removed vertices from $M$. If there is a color $c$ such that two vertices with color $c$ are removed in this step, then return “no”. Make $r$ adjacent to the root of each pendant non-unique subtree.

Phase II. In the first step of this phase, for each color $c$ where at least one vertex has been removed in Phase I, remove all vertices from $G$ that have color $c$. In the second step of this phase, remove all descendants of these vertices. Finally, let $M'$ denote the set of colors that are contained in the remaining instance. This completes the kernelization algorithm; the resulting instance has at most $2\ell + 1$ vertices since every vertex other than $r$ lies in a pendant non-unique subtree. To show correctness, we first observe the following.

Claim: No occurrence of $M$ in $G$ contains a vertex $v$ that is removed during Phase II of the kernelization. This can be seen as follows. First, every occurrence of $M$ in $G$ contains all vertices removed during Phase I: these vertices are either unique or lie on the uniquely determined path between two unique vertices.
Now consider a vertex $v$ removed during Phase II. If $v$ is removed in the first step of Phase II, then $v$ has the same color $c$ as a vertex $u$ removed during Phase I. Consequently, $v$ is not contained in an occurrence of $M$: By the above, the occurrence contains $u$ and it contains no other vertex with color $c$. Otherwise, $v$ is removed in the second step of Phase II because, after the first step, $v$ is no longer connected to $r$. Since every occurrence of $M$ contains $r$, it thus cannot contain $v$.

We now show the correctness of the kernelization, that is, the equivalence of the original instance $(M, G, \chi)$ and the resulting instance $(M', G', \chi')$. First, assume that $(M, G, \chi)$ is a yes-instance. Let $S_T$ be an occurrence of $M$ in $G$, and let $T$ denote $G[S_T]$; by the above claim, $T$ contains only vertices that are removed during Phase I or that are contained in $G'$. Consider the subtree $T'$ of $G$ that contains all vertices of $T$ that are not removed during the kernelization. We show that $T'$ is connected in $G'$ and contains all colors of $M'$. Connectivity can be seen as follows. First, observe that $T$ and $T'$ contain $r$. Second, any vertex $v \neq r$ of $T'$ is contained in some pendant non-unique subtree of $G$. Thus, $v$ is connected to $r$ in $T$ via a path that first visits only vertices of $T'$, including the root of the pendant non-unique subtree. The root of the pendant non-unique subtree is adjacent to $r$ in $G'$. Thus, each vertex $v \neq r$ has a path to $r$ in $T'$, which implies that $T'$ is connected. It remains to prove that $T'$ contains all colors of $M'$. Consider a color $c \in M'$. Since $c \in M'$, none of the vertices with color $c$ are removed in Phase I of the kernelization. Moreover, since no vertex of $T$ is removed in Phase II of the kernelization, we have that the vertex of $T$ with color $c$ is contained in $T'$. Thus, $T'$ contains each color of $M'$. Finally, $T'$ contains each color at most once since $T$ does.

Now assume that $(M', G', \chi')$ is a yes-instance and let $S_{T'}$ be an occurrence of $M'$ in $G'$.
Let $T$ denote $G[S_{T'} \cup V_I]$, where $V_I$ is the set of vertices removed during Phase I of the kernelization, and let $T'$ denote $G'[S_{T'}]$. We show that $T$ is connected and contains every color of $G$ exactly once. To see that $T$ is connected observe the following: Clearly, $G[\{r\} \cup V_I]$ is connected. Moreover, each vertex $v \neq r$ of $T'$ has a path to $r$ in $T'$. This path contains a subpath from $v$ to the root $r'$ of the pendant non-unique subtree containing $v$. In $G$, $r'$ is adjacent to some vertex of $\{r\} \cup V_I$. Therefore, $r'$ is connected to $r$ in $T$ and thus $T$ is connected. It remains to show that $T$ contains every color of $G$ exactly once. Clearly, $T$ contains at least one vertex of each color $c \in M'$. Moreover, it also contains at least one vertex of each color $c \in M \setminus M'$ since it contains all vertices of $V_I$. Besides, it contains each color only once: The vertices of $T'$ have pairwise different colors and different colors than those of the vertices of $V_I$. Finally, the vertices of $V_I$ have pairwise different colors since the kernelization did not return “no”.

The running time can be seen as follows. Determining the pendant non-unique subtrees can be done by a standard bottom-up procedure in linear time. Removing all vertices during Phase I can also be achieved in linear time. After removing a vertex with color $c$ in Phase I, we label $c$ as occupied. When we remove a vertex with an occupied color during Phase I, we immediately return “no”. After the removal of vertices during Phase I, we can construct $M'$ from $M$ in linear time by removing each occupied color. Finally, we can in linear time add an edge between $r$ and every root of a pendant non-unique subtree and then remove all remaining vertices that have an occupied color. The final graph $G'$ is obtained by performing a depth-first search from $r$, in order to include only those vertices still reachable from $r$.

Now, let us turn to developing fast(er) FPT algorithms for CGM. It can be seen that it is possible to solve CGM in trees in time $1.62^{\ell} \cdot n^{O(1)}$, by “branching on colors with the most occurrences” until every color appears at most twice. More precisely, for a color $c$ that appears at least three times and some vertex $v$ with color $c$, we can branch into the two cases to either delete $v$ or to delete the at least two other vertices that have color $c$. The branching vector for this branching rule is $(1, 2)$ or better. Now, if every color appears at most twice, then CGM on trees can be solved in polynomial time [12, Lemma 2]. By a different branching approach, the above running time can be further improved.

Branching Rule 2. If there is a color $c$ such that there are two vertices $u$ and $v$ with color $c$ that are both not leaves of the tree $G$, then branch into the case to delete from $G$ either
• the maximal subtree containing $u$ and all vertices $w$ such that the path from $v$ to $w$ contains $u$, or
• the maximal subtree containing $v$ and all vertices $w$ such that the path from $u$ to $w$ contains $v$.

Proof of correctness. No occurrence may contain vertices of both subtrees, since in this case it contains $u$ and $v$, which have the same color.

If the rule does not apply, then one can solve the problem in linear time; here, let $\mathrm{occ}(c)$ denote the number of occurrences of a color $c$ in $G$.

Lemma 3. Let $(M, G, \chi)$ be an instance of Colorful Graph Motif such that $G$ is a tree and for each color $c$ with $\mathrm{occ}(c) > 1$ at least $\mathrm{occ}(c) - 1$ occurrences of $c$ are leaves of $G$. Then $(M, G, \chi)$ can be solved in $O(n)$ time.

Proof. For each color $c$ with $\mathrm{occ}(c) > 1$, the algorithm simply deletes $\mathrm{occ}(c) - 1$ leaves with color $c$. This can be done in linear time by visiting all leaves via depth-first search, checking for each leaf in $O(1)$ time whether $\mathrm{occ}(c) > 1$, and deleting the leaf in $O(1)$ time if this is the case. The resulting graph contains each color exactly once, and it is connected since a tree cannot be made disconnected by deleting leaves.

For an introduction to the analysis of branching vectors, refer to [8, 16].

Theorem 8.
Colorful Graph Motif can be solved in $O(\sqrt{2}^{\ell} + n)$ time if $G$ is a tree.

Proof. The algorithm is as follows. First, reduce the input instance in $O(n)$ time to an equivalent one with $O(\ell)$ vertices using the kernelization of Theorem 7. Now, apply Branching Rule 2. If this rule is no longer applicable, then solve the instance in $O(\ell)$ time (by applying the algorithm behind Lemma 3). Since the graph has $O(\ell)$ vertices, applicability of Branching Rule 2 can be tested in $O(\ell)$ time. Thus, the overall running time is $O(\ell)$ times the number of search tree nodes. Since each application of Branching Rule 2 creates two branches and reduces $\ell$ by at least two in each branch, the search tree has size $O(2^{\ell/2}) = O(\sqrt{2}^{\ell})$. The resulting running time is $O(\sqrt{2}^{\ell} \cdot \ell + n)$. Furthermore, the factor of $\ell$ in the running time can be removed by interleaving search tree and kernelization [21], that is, by applying the kernelization algorithm of Theorem 7 in each search tree node.

In this paper, we have studied the Graph Motif, List-Colored Graph Motif and Colorful Graph Motif problems, and in particular their behavior in terms of parameterized complexity, when the parameter is $\ell = |V| - |M|$, i.e. the number of vertices of $G$ that are not kept in a solution. We left open the parameterized complexity for parameter $\ell$ for List-Colored Graph Motif on trees, even when the vertex-color graph is a forest.

As mentioned in the introduction, parameterization by $\ell$ may be interesting not only from a theoretic, but also from an applied point of view. Unfortunately, for the practically relevant case of List-Colored Graph Motif we have obtained W[1]-hardness even for very restricted color lists $L$. Moreover, as noted by Fertin et al. [14], a reduction of Rauf et al. [24] shows that the variant of Colorful Graph Motif where $G$ is directed and has edge weights is W[1]-hard with respect to $\ell$. However, the combination of $\ell$ with further structure related to the colors of $C$ led to tractability results [14, 15].
It would be interesting to identify such color-related structure also for List-Colored Graph Motif.

References

[1] Abhimanyu M. Ambalath, Radheshyam Balasundaram, Chintan Rao H., Venkata Koppula, Neeldhara Misra, Geevarghese Philip, and M. S. Ramanujan. On the kernelization complexity of colorful motifs. In Proceedings of the 5th International Symposium on Parameterized and Exact Computation (IPEC ’10), volume 6478 of Lecture Notes in Computer Science, pages 14–25. Springer, 2010.
[2] Nadja Betzler, René van Bevern, Christian Komusiewicz, Michael R. Fellows, and Rolf Niedermeier. Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5):1296–1308, 2011.
[3] Andreas Björklund, Petteri Kaski, and Lukasz Kowalik. Constrained multilinear detection and generalized graph motifs. Algorithmica, 74(2):947–967, 2016.
[4] Hans L. Bodlaender, Bart M. P. Jansen, and Stefan Kratsch. Kernelization lower bounds by cross-composition. SIAM Journal on Discrete Mathematics, 28(1):277–305, 2014.
[5] Hans L. Bodlaender, Stéphan Thomassé, and Anders Yeo. Kernel bounds for disjoint cycles and disjoint paths. Theoretical Computer Science, 412(35):4570–4578, 2011.
[6] Édouard Bonnet and Florian Sikora. The Graph Motif problem parameterized by the structure of the input graph. Discrete Applied Mathematics, 231:78–94, 2017.
[7] Sharon Bruckner, Falk Hüffner, Richard M. Karp, Ron Shamir, and Roded Sharan. Topology-free querying of protein interaction networks. Journal of Computational Biology, 17(3):237–252, 2010.
[8] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.
[9] Bireswar Das, Murali Krishna Enduri, Neeldhara Misra, and I. Vinod Reddy. On structural parameterizations of Graph Motif and Chromatic Number. In Proceedings of the Third International Conference on Algorithms and Discrete Applied Mathematics (CALDAM ’17), volume 10156 of Lecture Notes in Computer Science, pages 118–129. Springer, 2017.
[10] Holger Dell and Dieter van Melkebeek. Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC ’10), pages 251–260. ACM, 2010.
[11] Rod G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Springer, 2013.
[12] Michael R. Fellows, Guillaume Fertin, Danny Hermelin, and Stéphane Vialette. Upper and lower bounds for finding connected motifs in vertex-colored graphs. Journal of Computer and System Sciences, 77(4):799–811, 2011.
[13] Michael R. Fellows, Danny Hermelin, Frances Rosamond, and Stéphane Vialette. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1):53–61, 2009.
[14] Guillaume Fertin, Julien Fradin, and Géraldine Jean. Algorithmic aspects of the Maximum Colorful Arborescence problem. In Proceedings of the 14th Annual Conference on Theory and Applications of Models of Computation (TAMC ’17), volume 10185 of Lecture Notes in Computer Science, pages 216–230, 2017.
[15] Guillaume Fertin, Julien Fradin, and Christian Komusiewicz. On the Maximum Colorful Arborescence problem and color hierarchy graph structure. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Proceedings of the 29th Annual Symposium on Combinatorial Pattern Matching (CPM ’18), volume 105 of LIPIcs, pages 17:1–17:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
[16] Fedor V. Fomin and Dieter Kratsch. Exact Exponential Algorithms. Springer-Verlag, 1st edition, 2010.
[17] Robert Ganian. Twin-cover: Beyond vertex cover in parameterized algorithmics. In Proceedings of the 6th International Symposium on Parameterized and Exact Computation (IPEC ’11), volume 7112 of Lecture Notes in Computer Science, pages 259–271. Springer, 2011.
[18] Sepp Hartung, Christian Komusiewicz, and André Nichterlein. Parameterized algorithmics and computational experiments for finding 2-clubs. Journal of Graph Algorithms and Applications, 19(1):155–190, 2015.
[19] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001.
[20] Vincent Lacroix, Cristina G. Fernandes, and Marie-France Sagot. Motif search in graphs: Application to metabolic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(4):360–368, 2006.
[21] Rolf Niedermeier and Peter Rossmanith. A general method to speed up fixed-parameter-tractable algorithms. Information Processing Letters, 73(3-4):125–129, 2000.
[22] Ron Y. Pinter, Hadas Shachnai, and Meirav Zehavi. Deterministic parameterized algorithms for the Graph Motif problem. Discrete Applied Mathematics, 213:162–178, 2016.
[23] Ron Y. Pinter and Meirav Zehavi. Algorithms for topology-free and alignment network queries. Journal of Discrete Algorithms, 27:29–53, 2014.
[24] Imran Rauf, Florian Rasche, François Nicolas, and Sebastian Böcker. Finding maximum colorful subtrees in practice. Journal of Computational Biology, 20(4):311–321, 2013.
[25] Roded Sharan and Trey Ideker. Modeling cellular machinery through biological network comparison.