Improved Quantum Query Algorithms for Triangle Finding and Associativity Testing∗

Troy Lee†   Frédéric Magniez‡   Miklos Santha§

Abstract
We show that the quantum query complexity of detecting if an n-vertex graph contains a triangle is O(n^{9/7}). This improves the previous best algorithm of Belovs [2] making O(n^{35/27}) queries. For the problem of determining if an operation ∘ : S × S → S is associative, we give an algorithm making O(|S|^{10/7}) queries, the first improvement to the trivial O(|S|^{3/2}) application of Grover search.

Our algorithms are designed using the learning graph framework of Belovs. We give a family of algorithms for detecting constant-sized subgraphs, which can possibly be directed and colored. These algorithms are designed in a simple high-level language; our main theorem shows how this high-level language can be compiled as a learning graph and gives the resulting complexity.

The key idea behind our improvements is to allow more freedom in the parameters of the database kept by the algorithm. As in our previous work [9], the edge slots maintained in the database are specified by a graph whose edges are the union of regular bipartite graphs, the overall structure of which mimics that of the graph of the certificate. By allowing these bipartite graphs to be unbalanced and of variable degree we obtain better algorithms.

1 Introduction

Quantum query complexity is a black-box model of quantum computation, where the resource measured is the number of queries to the input needed to compute a function. This model captures the great algorithmic successes of quantum computing, like the search algorithm of Grover [5] and the period finding subroutine of Shor's factoring algorithm [12], while at the same time being simple enough that one can often show tight lower bounds.

Recently, there have been very exciting developments in quantum query complexity. Reichardt [11] showed that the general adversary bound, formerly just a lower bound technique for quantum query complexity [7], is also an upper bound. This characterization opens a new avenue for designing quantum query algorithms.
The general adversary bound can be written as a relatively simple semidefinite program, thus by providing a feasible solution to the minimization form of this program one can upper bound quantum query complexity. This plan turns out to be quite difficult to implement, as the minimization form of the adversary bound has exponentially many constraints. Even for simple functions it can be challenging to directly write down a feasible solution, much less worry about finding a solution with good objective value.

To surmount this problem, Belovs [2] introduced the beautiful model of learning graphs, which can be viewed as the minimization form of the general adversary bound with additional structure imposed on the form of the solution. This additional structure makes learning graphs easier to reason about by ensuring that the constraints are automatically satisfied, leaving one to worry only about optimizing the objective value. Learning graphs have already proven their worth, with Belovs using this model to give an algorithm for triangle finding with complexity O(n^{35/27}), improving the quantum walk algorithm [10] of complexity O(n^{1.3}).

∗ Partially supported by the French ANR Defis project ANR-08-EMER-012 (QRAC) and the European Commission IST STREP project 25596 (QCS). Research at the Centre for Quantum Technologies is funded by the Singapore Ministry of Education and the National Research Foundation.
† Centre for Quantum Technologies, National University of Singapore, Singapore 117543. [email protected]
‡ CNRS, LIAFA, Univ Paris Diderot, Sorbonne Paris-Cité, 75205 Paris, France. [email protected]
§ CNRS, LIAFA, Univ Paris Diderot, Sorbonne Paris-Cité, 75205 Paris, France; and Centre for Quantum Technologies, National University of Singapore, Singapore 117543. [email protected]
Belovs' algorithm was generalized to detecting constant-sized subgraphs [13, 9], giving an algorithm of complexity o(n^{2−2/k}) for determining if a graph contains a k-vertex subgraph H, again improving the [10] bound of O(n^{2−2/k}). All these algorithms use the most basic model of learning graphs, which we also use in this paper. A more general model of learning graphs (introduced, though not used, in Belovs' original paper) was used to give an o(n^{3/4}) algorithm for k-element distinctness, when the inputs are promised to be of a certain form [3]. Recently, Belovs further generalized the learning graph model and removed this promise to obtain an o(n^{3/4}) algorithm for the general k-distinctness problem [1].

In this paper, we continue to show the power of the learning graph model. We give an algorithm for detecting a triangle in a graph making O(n^{9/7}) queries. This lowers the exponent of Belovs' algorithm from about 1.296 to under 1.286. For the problem of determining if an operation ∘ : S × S → S is associative, where |S| = n, we give an algorithm making O(n^{10/7}) queries, the first improvement over the trivial application of Grover search making O(n^{3/2}) queries. Previously, Dörn and Thierauf [4] gave a quantum walk based algorithm to test if ∘ : S × S → S′ is associative that improved on Grover search, but only when |S′| is sufficiently smaller than n. More generally, we give a family of algorithms for detecting constant-sized subgraphs, which can possibly be directed and colored. Algorithms in this family can be designed using a simple high-level language. Our main theorem shows how to compile this language as a learning graph, and gives the resulting complexity. We now explain in more detail how our algorithms improve over previous work.

Our contribution.
We will explain the new ideas in our algorithm using triangle detection as an example. We first review the quantum walk algorithm of [10], and the learning graph algorithm of Belovs [2]. For this high-level overview we focus only on the database of edge slots of the input graph G that is maintained by the algorithm. A quantum walk algorithm explicitly maintains such a database, and the nodes of a learning graph are labeled by sets of queries, which we will similarly interpret as the database of the algorithm.

In the quantum walk algorithm [10] the database consists of an r-element subset of the n vertices of G and all the edge slots among these r vertices. That is, the presence or absence of every edge of G among the r-element subset is maintained by the database. In the learning graph algorithm of Belovs, the database consists of a random subgraph with edge density 0 ≤ s ≤ 1 of the complete graph on an r-element subset. In this way, on average, O(sr²) many edge slots are queried among the r-element subset, making it cheaper to set up this database. This saving is what results in the improvement of Belovs' algorithm. Both algorithms finish by using search plus graph collision to locate a vertex that is connected to the endpoints of an edge present in the database, forming a triangle.

Zhu [13] and Lee et al. [9] extended the triangle finding algorithm of Belovs to finding constant-sized subgraphs. While the algorithm of Zhu again maintains a database of a random subgraph of an r-vertex complete graph with edge density s, the algorithm of Lee et al. instead used a more structured database. Let H be a k-vertex subgraph with vertices labeled from [k]. To determine if G contains a copy of H, the database of the algorithm consists of k − 1 sets A_1, . . . , A_{k−1} of size r and, for every edge {i, j} of H not involving vertex k, the edge slots of G according to an sr-regular bipartite graph between A_i and A_j.
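As an illustration of this kind of database, here is a minimal sketch (our own, not from the paper; the circulant construction and function names are assumptions) of edge slots specified by d-regular bipartite graphs between equal-size classes, one per edge of H:

```python
def regular_bipartite_slots(A, B, d):
    """d-regular bipartite edge slots between equal-size vertex sets A and B,
    built as a circulant: A[u] is joined to B[(u + t) % r] for t = 0..d-1."""
    r = len(A)
    assert len(B) == r and 1 <= d <= r
    return {(A[u], B[(u + t) % r]) for u in range(r) for t in range(d)}

def database(classes, H_edges, d):
    """Union of d-regular bipartite slot sets, one per stored edge {i, j} of H."""
    slots = set()
    for (i, j) in H_edges:
        slots |= regular_bipartite_slots(classes[i], classes[j], d)
    return slots

# Toy example: two disjoint 4-element classes joined by a 2-regular bipartite graph,
# as for a triangle H where the third vertex is searched for rather than stored.
A = [list(range(0, 4)), list(range(4, 8))]
db = database(A, [(0, 1)], d=2)
```

Each class here queries only 4 · 2 = 8 of the 16 available edge slots, which is the saving the sr-regular construction provides over a complete bipartite graph.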
Again both algorithms finish by using search plus graph collision to find a vertex connected to edges in the database to form a copy of H.

In this work, our database is again the edge slots of G queried according to the union of regular bipartite graphs whose overall structure mimics the structure of H. Now, however, we allow optimization over all parameters of the database—we allow the size of the set A_i to be a parameter r_i that can be independently chosen; similarly, we allow the degree of the bipartite graph between A_i and A_j to be a variable d_ij. This greater freedom in the parameters of the database allows the improvement in triangle finding from O(n^{35/27}) to O(n^{9/7}). Instead of an r-vertex graph with edge density s, our algorithm uses as a database a complete unbalanced bipartite graph with left hand side of size r_1 and right hand side of size r_2. Taking r_1 < r_2 allows a more efficient distribution of resources over the course of the algorithm. As before, the algorithm finishes by using search plus graph collision to find a vertex connected to endpoints of an edge in the database.

The extension to functions of the form f : [q]^{n×n} → {0, 1}, like associativity, comes from the fact that the basic learning graph model that we use depends only on the structure of a 1-certificate and not on the values in a 1-certificate. This property means that an algorithm for detecting a subgraph H can be immediately applied to detecting H with specified edge colors in a colored graph. If an operation ∘ : S × S → S is non-associative, then there are elements a, b, c such that a ∘ (b ∘ c) ≠ (a ∘ b) ∘ c. A certificate consists of the 4 (colored and directed) edges b ∘ c = e, a ∘ e, a ∘ b = d, and d ∘ c such that a ∘ e ≠ d ∘ c. The graph of this certificate is a 4-path with directed edges, and using our algorithm for this graph gives complexity O(|S|^{10/7}). We provide a high-level language for designing algorithms within our framework.
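To make the certificate structure concrete, the following classical brute-force sketch (our own illustration; the operation table is a hypothetical example) finds a non-associativity witness and the four table entries a certificate queries:

```python
def associativity_witness(op, S):
    """Return (a, b, c) with a∘(b∘c) != (a∘b)∘c, or None if ∘ is associative.
    The four queried table entries are b∘c = e, a∘e, a∘b = d, and d∘c."""
    for a in S:
        for b in S:
            for c in S:
                e = op[b][c]                 # b∘c = e
                d = op[a][b]                 # a∘b = d
                if op[a][e] != op[d][c]:     # a∘e != d∘c: witness found
                    return (a, b, c)
    return None

# Toy non-associative operation: subtraction mod 5.
S = range(5)
op = [[(a - b) % 5 for b in S] for a in S]
w = associativity_witness(op, S)
```

The four entries b∘c, a∘e, a∘b, d∘c are exactly the colored, directed edges of the 4-path certificate described above.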
The algorithm begins by choosing size parameters for each A_i and degree parameters for the bipartite graph between A_i and A_j. Then one can choose the order in which to load vertices a_i and edges (a_i, a_j) of a 1-certificate, according to the rules that both endpoints of an edge must be loaded before the edge, and at the end all edges of the certificate must be loaded. Our main theorem, Theorem 8, shows how to implement this high-level algorithm as a learning graph and gives the resulting complexity.

With larger subgraphs, optimizing over the set size and degree parameters to obtain an algorithm of minimal complexity becomes unwieldy to do by hand. Fortunately, this can be phrased as a linear program and we provide code to compute a set of optimal parameters.

2 Preliminaries

The quantum query complexity of a function f, denoted Q(f), is the number of input queries needed to evaluate f with error at most 1/3. We refer the reader to the survey [6] for precise definitions and background. For any integer q ≥
1, let [q] = {1, 2, . . . , q}. We will deal with boolean functions of the form f : [q]^{n×n} → {0, 1}, where the input to the function can be thought of as the complete directed graph (possibly with self-loops) on vertex set [n], whose edges are colored by elements from [q]. When q = 2, the input is of course just a directed graph (again possibly with self-loops). A partial assignment is an element of the set ([q] ∪ {⋆})^{n×n}. For partial assignments α_1 and α_2 we say that α_1 is a restriction of α_2 (or alternately α_2 is an extension of α_1) if whenever α_1(i, j) ≠ ⋆ then α_2(i, j) = α_1(i, j). A 1-certificate for f is a partial assignment α such that f(x) = 1 for every extension x ∈ [q]^{n×n} of α. If α is a 1-certificate and x ∈ [q]^{n×n} is an extension of α, we also say that α is a 1-certificate for f and x. A 1-certificate α is minimal if no proper restriction of α is a 1-certificate. The index set of a 1-certificate α for f is the set I_α = {(i, j) ∈ [n] × [n] : α(i, j) ≠ ⋆}. Besides these standard notions, we will also need the notion of the graph of a 1-certificate. For a graph G, let V(G) denote the set of vertices, and E(G) the set of edges of G.

Definition 1 (Certificate graph). Let α be a 1-certificate for f : [q]^{n×n} → {0, 1}. The certificate graph H_α of α is defined by E(H_α) = I_α, and V(H_α) is the set of elements in [n] which are adjacent to an edge in I_α. The size of a certificate graph is the cardinality of its edge set. A minimal certificate graph for x, such that f(x) = 1, is the certificate graph of a minimal 1-certificate for f and x. The certificate size of f is the size of the biggest minimal certificate graph for some x such that f(x) = 1.
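The notions in Definition 1 translate directly into code. This small sketch (our own, using a hypothetical dict-based encoding of partial assignments) computes the index set and the certificate graph:

```python
STAR = None  # stands for the symbol ⋆ (position not queried)

def index_set(alpha):
    """I_alpha: the positions (i, j) where the partial assignment is defined."""
    return {pos for pos, val in alpha.items() if val is not STAR}

def certificate_graph(alpha):
    """Certificate graph H_alpha: its edge set is I_alpha, its vertex set the
    endpoints of those edges; its size is the number of edges."""
    E = index_set(alpha)
    V = {i for (i, j) in E} | {j for (i, j) in E}
    return V, E

# A partial assignment fixing two entries of a 3x3 input (the rest are ⋆).
alpha = {(1, 2): 1, (2, 3): 1, (3, 3): STAR}
V, E = certificate_graph(alpha)
```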
Intuitively, if x ∈ [q]^{n×n} is an extension of a 1-certificate α, the certificate graph of α represents queries that are sufficient to verify f(x) = 1.

Vertices of our learning graphs will be labeled by sets of edges coming from the union of a collection of bipartite graphs. We will specify these bipartite graphs by their degree sequences: the number of vertices on the left hand side and right hand side of a given degree. The following notation will be useful to do this.

Definition 2 (Type of bipartite graph). A bipartite graph between two sets Y_1 and Y_2 is of type ({(n_1, d_1), . . . , (n_j, d_j)}, {(m_1, g_1), . . . , (m_ℓ, g_ℓ)}) if Y_1 has n_i vertices of degree d_i for i = 1, . . . , j, and Y_2 has m_i vertices of degree g_i for i = 1, . . . , ℓ, and this is a complete listing of vertices in the graph, i.e. |Y_1| = Σ_{i=1}^{j} n_i and |Y_2| = Σ_{i=1}^{ℓ} m_i. Note also that Σ_{i=1}^{j} n_i d_i = Σ_{i=1}^{ℓ} m_i g_i.

(Code is available at https://github.com/troyjlee/learning_graph_lp.)

Learning graphs. We now formally define a learning graph and its complexity. We first define a learning graph in the abstract.
Definition 3 (Learning graph). A learning graph G is a 5-tuple (V, E, w, ℓ, {p_y : y ∈ Y}) where (V, E) is a rooted, weighted and directed acyclic graph, the weight function w : E → R maps learning graph edges to positive real numbers, the length function ℓ : E → N assigns each edge a natural number, and p_y : E → R is a unit flow whose source is the root, for every y ∈ Y.

A learning graph for a function has additional requirements, as follows.
Definition 4 (Learning graph for a function). Let f : [q]^{n×n} → {0, 1} be a function. A learning graph G for f is a 5-tuple (V, E, S, w, {p_y : y ∈ f^{−1}(1)}), where S : V → 2^{[n]×[n]} maps v ∈ V to a label S(v) ⊆ [n] × [n] of variable indices, and (V, E, w, ℓ, {p_y : y ∈ f^{−1}(1)}) is a learning graph for the length function ℓ defined as ℓ((u, v)) = |S(v) \ S(u)| for each edge (u, v). For the root r ∈ V we have S(r) = ∅, and every learning graph edge e = (u, v) satisfies S(u) ⊆ S(v). For each input y ∈ f^{−1}(1), the set S(v) contains the index set of a 1-certificate for y on f, for every sink v ∈ V of p_y.

In our construction of learning graphs we usually define S by more colloquially stating the label of each vertex. Note that it can be the case for an edge (u, v) that S(u) = S(v), and then the length of the edge is zero. In Belovs [2] what we define here is called a reduced learning graph, and a learning graph is restricted to have all edges of length at most one.

In this paper we will discuss functions whose inputs are themselves graphs. To prevent confusion we will refer to vertices and edges of the learning graph as L-vertices and L-edges respectively.

We now define the complexity of a learning graph. For the analysis it will be helpful to define the complexity not just for the entire learning graph but also for stages of the learning graph G. By level d of G we refer to the set of vertices at distance d from the root. A stage is the set of edges of G between level i and level j, for some i < j. For a subset V′ ⊆ V of the L-vertices let V′⁺ = {(v, w) ∈ E : v ∈ V′} and similarly let V′⁻ = {(u, v) ∈ E : v ∈ V′}. For a vertex v we will write v⁺ instead of {v}⁺, and similarly v⁻ instead of {v}⁻.

Definition 5 (Learning graph complexity).
Let G be a learning graph, and let E′ ⊆ E be the edges of a stage. The negative complexity of E′ is

C_0(E′) = Σ_{e ∈ E′} ℓ(e) w(e).

The positive complexity of E′ under the flow p_y is

C_{1,y}(E′) = Σ_{e ∈ E′} ℓ(e) p_y(e)² / w(e).

The positive complexity of E′ is C_1(E′) = max_{y ∈ Y} C_{1,y}(E′). The complexity of E′ is C(E′) = √(C_0(E′) C_1(E′)), and the learning graph complexity of G is C(G) = C(E). The learning graph complexity of a function f, denoted LG(f), is the minimum learning graph complexity of a learning graph for f.

Theorem 1 (Belovs). Q(f) = O(LG(f)).

Originally Belovs showed this theorem with an additional log q factor for functions over an input alphabet of size q; this logarithmic factor was removed in [3].

Analysis of learning graphs. Given a learning graph G, the easiest way to obtain another learning graph is to modify the weight function of G. We will often use this reweighting scheme to obtain learning graphs with better complexity, or complexity that is more convenient to analyze. When G is understood from the context, and when w′ is the new weight function, for the edges E′ ⊆ E of a stage we denote the complexity of E′ with respect to w′ by C_{w′}(E′).

The following useful lemma of Belovs gives an example of the reweighting method. It shows how to upper bound the complexity of a learning graph by partitioning it into a constant number of stages and summing the complexities of the stages.

Lemma 2 (Belovs). If E can be partitioned into a constant number k of stages E_1, . . . , E_k, then there exists a weight function w′ such that C_{w′}(G) = O(C(E_1) + . . . + C(E_k)).

Now we will focus on evaluating the complexity of a stage.
Our learning graph algorithm for triangle detection is of a very simple form, where all L-edges present in the graph have weight one, all L-vertices in a level have the same degree, incoming and outgoing flows are uniform over a subset of L-vertices in each level, and all L-edges between levels are of the same length. In this case the complexity of a stage between consecutive levels can be estimated quite simply.

Lemma 3.
Consider a stage of a learning graph between consecutive levels. Let V be the set of L-vertices at the beginning of the stage. Suppose that each L-vertex v ∈ V has degree d, with all outgoing L-edges e of weight w(e) = 1 and of length ℓ(e) ≤ ℓ. Furthermore, say that the incoming flow is uniform over L-vertices W ⊆ V, and is uniformly directed from each L-vertex v ∈ W to g of the d possible neighbors. Then the complexity of this stage is at most ℓ √( d|V| / (g|W|) ).

Proof. The total weight is d|V|. The flow through each of the g|W| many L-edges is (g|W|)^{−1}. Plugging these into Definition 5 gives the lemma.

To analyze the cost of our algorithm for triangle detection, we will repeatedly use Lemma 3. The contributions to the complexity of a stage are naturally broken into three parts: the length ℓ, the vertex ratio |V|/|W|, and the degree ratio d/g. This terminology will be helpful in discussing the complexity of stages.

For our more general framework given in Section 4, flows will no longer be uniform. To evaluate the complexity in this case, we will use several lemmas developed in [9]. The main idea is to use the symmetry of the function to decompose flows as a convex combination of uniform flows over disjoint edge sets. A natural extension of Lemma 3 can then be used to evaluate the complexity. To state the lemma we first need a definition. For a set of L-edges E′, we let p_y(E′) denote the value of the flow p_y over E′, that is p_y(E′) = Σ_{e ∈ E′} p_y(e).

Definition 6 (Consistent flows). Let E′ be a stage of G between two consecutive levels, and let V_1, . . . , V_s be a partition of the L-vertices at the beginning of the stage. We say that {p_y} is consistent with V_1⁺, . . . , V_s⁺ if p_y(V_i⁺) is independent of y for each i.

The next lemma is the main tool for evaluating the complexity of learning graphs in our main theorem, Theorem 8.
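The arithmetic behind Lemma 3 can be checked numerically: with unit weights, C_0 = ℓ·d·|V| and C_1 = ℓ/(g·|W|), so √(C_0 C_1) equals the stated bound. A small sketch of one stage (our own, with example parameter values):

```python
from math import sqrt

def stage_complexity(ell, d, V, g, W):
    """Complexity of a one-stage learning graph per Definition 5:
    V L-vertices of out-degree d, unit weights, lengths ell; flow enters
    W of them uniformly and leaves along g of the d outgoing L-edges."""
    C0 = ell * d * V                          # negative complexity: sum of ell(e)*w(e)
    flow_per_edge = 1.0 / (g * W)             # unit flow split over g*W used edges
    C1 = ell * (g * W) * flow_per_edge ** 2   # positive: sum of ell(e)*p(e)^2/w(e)
    return sqrt(C0 * C1)

# Example: 1000 L-vertices of degree 50; flow enters 10 of them, using 5 edges each.
c = stage_complexity(ell=3, d=50, V=1000, g=5, W=10)
```

For these values the bound ℓ√(d|V|/(g|W|)) gives 3·√1000, which the direct computation of √(C_0 C_1) reproduces.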
Lemma 4 ([9]). Let E′ be a stage of G between two consecutive levels. Let V be the set of L-vertices at the beginning of the stage and suppose that each v ∈ V has outdegree d and all L-edges e of the stage satisfy w(e) = 1 and ℓ(e) ≤ ℓ. Let V_1, . . . , V_s be a partition of V, and for all y and i, let W_{y,i} ⊆ V_i be the set of vertices in V_i which receive positive flow under p_y. Suppose that

1. the flows {p_y} are consistent with {V_i⁺},
2. |W_{y,i}| is independent of y for every i, and for all v ∈ W_{y,i} we have p_y(v⁺) = p_y(V_i⁺)/|W_{y,i}|,
3. there is a g such that for each vertex v ∈ W_{y,i} the flow is directed uniformly to g of the d many neighbors.

Then there is a new weight function w′ such that

C_{w′}(E′) ≤ max_i ℓ √( d|V_i| / (g|W_{y,i}|) ).   (1)

We will refer to max_i |V_i|/|W_{y,i}| as the maximum vertex ratio. For the most part we will deal with the problem of detecting a (possibly directed and colored) subgraph in an n-vertex graph. We will be interested in symmetries induced by permuting the elements of [n], as such permutations do not change the property of containing a fixed subgraph. We now state two additional lemmas from [9] that use this symmetry to help establish the hypotheses of Lemma 4.

For σ ∈ S_n, we define, and also denote by σ, the permutation over [n] × [n] such that σ(i, j) = (σ(i), σ(j)). Recall that each L-vertex u is labeled by a k-partite graph on [n], say with color classes A_1, . . . , A_k, and that we identify an L-vertex with its label. For σ ∈ S_n we define the action of σ on u as σ(u) = v, where v is a k-partite graph with color classes σ(A_1), . . . , σ(A_k) and edges {σ(i), σ(j)} for every edge {i, j} in u. Define an equivalence class [u] of L-vertices by [u] = {σ(u) : σ ∈ S_n}.
We say that S_n acts transitively on flows {p_y} if for every y, y′ there is a τ ∈ S_n such that p_y((u, v)) = p_{y′}((τ(u), τ(v))) for all L-edges (u, v). The following lemma from [9] shows that if S_n acts transitively on a set of flows {p_y} then they are consistent with [v]⁺, where v is a vertex at the beginning of a stage between consecutive levels. This will set us up to satisfy hypothesis (1) of Lemma 4.

Lemma 5 ([9]). Consider a learning graph G and a set of flows {p_y} such that S_n acts transitively on {p_y}. Let V be the set of L-vertices of G at some given level. Then {p_y} is consistent with {[u]⁺ : u ∈ V}, and, similarly, {p_y} is consistent with {[u]⁻ : u ∈ V}.

The next lemma gives a sufficient condition for hypothesis (2) of Lemma 4 to be satisfied. The partition of vertices in Lemma 4 will be taken according to the equivalence classes [u].

Lemma 6 ([9]). Consider a learning graph and a set of flows {p_y} such that S_n acts transitively on {p_y}. Suppose that for every L-vertex u and flow p_y such that p_y(u⁻) > 0,

1. the flow from u is uniformly directed to g⁺([u]) many neighbors,
2. for every L-vertex w, the number of incoming edges from [w] to u is g⁻([w], [u]).

Then for every L-vertex u the flow entering [u] is uniformly distributed over W_{y,[u]} ⊆ [u], where |W_{y,[u]}| is independent of y.

3 Triangle finding

Theorem 7.
There is a bounded-error quantum query algorithm for detecting if an n-vertex graph contains a triangle making O(n^{9/7}) many queries.

Proof. We will show the theorem by giving a learning graph of the claimed complexity, which is sufficient by Theorem 1. We will define the learning graph by stages; let V_t denote the L-vertices of the learning graph present at the beginning of stage t. The L-edges between V_t and V_{t+1} are defined in the obvious way—there is an L-edge between v_t ∈ V_t and v_{t+1} ∈ V_{t+1} if the graph labeling v_t is a subgraph of the graph labeling v_{t+1}, and all such L-edges have weight one. The root of the learning graph is labeled by the empty graph. For a positive input graph G, let a_1, a_2, a_3 be the vertices of a triangle of G. The algorithm (see Figure 1) depends on set size parameters r_1, r_2 ∈ [n], with r_1, r_2 = o(n), and a vertex degree parameter λ ∈ [n] that will be optimized later. We will choose r_1 < r_2 such that r_2/r_1 is an integer. The cost of each stage will be upper bounded using Lemma 3.

[Figure 1: Stages 1–6 for the triangle algorithm, showing the color classes A_1 (r_1 vertices) and A_2 (r_2 vertices), the new vertex or edge loaded at each stage, and the degrees of the bipartite graphs in the database.]

Stage 1 (Setup):
The initial level V_0 consists of the root of the learning graph labeled by the empty graph. The level V_1 consists of all L-vertices labeled by a complete unbalanced bipartite graph with disjoint color classes A_1, A_2 ⊆ [n] where |A_1| = r_1 − 1 and |A_2| = r_2 − 1. Flow is uniform from the root to all L-vertices such that a_i ∉ A_1, a_i ∉ A_2 for i = 1, 2, 3.
The hypotheses of Lemma 3 hold trivially at this stage. The length of this stage is O(r_1 r_2). The vertex ratio is 1, and the degree ratio is

(n choose r_1)(n − r_1 choose r_2) / ( (n − 3 choose r_1)(n − 3 − r_1 choose r_2) ) = O(1),

as r_1, r_2 = o(n). Thus the overall cost is O(r_1 r_2).

Stage 2 (Load a_1): During this stage we add a vertex to the set A_1 and connect it to all vertices in A_2. Formally, V_2 consists of all vertices labeled by a complete bipartite graph between color classes A_1, A_2 of sizes r_1, r_2 −
1, respectively. The flow goes uniformly to those L-vertices where a_1 is the vertex added to A_1.
By the definition of stage 1, the flow is uniform over L-vertices at the beginning of stage 2. The out-degree of every L-vertex in V_1 is n − r_1 − r_2 + 2. Of these, in L-vertices with flow, exactly one edge is taken by the flow. Thus we can apply Lemma 3. Since the degree ratio was O(1) for the first stage, the vertex ratio is also O(1) for this stage. The length is r_2 −
1. The degree ratio is O(n). Thus the cost of this stage is O(√n · r_2).

Stage 3 (Load a_2): We add a vertex to A_2 and connect it to all of the r_1 many vertices in A_1. Thus the L-vertices at the end of stage 3 consist of all complete bipartite graphs between sets A_1, A_2 of sizes r_1, r_2, respectively. The flow goes uniformly to those L-vertices where a_2 is added at this stage to A_2. Note that since we work with a complete bipartite graph, if a_1 ∈ A_1 and a_2 ∈ A_2 then the edge {a_1, a_2} is automatically present.
The amount of flow in a vertex with flow at the beginning of stage 3 is the same as at the beginning of stage 2, as the flow out-degree in stage 2 was one and there was no merging of flow. Thus flow is still uniform at the beginning of stage 3. The out-degree of each L-vertex is n − r_1 − r_2 + 1, and again for L-vertices with flow, the flow out-degree is exactly one. Thus we can again apply Lemma 3.

The length of this stage is r_1. The vertex ratio is O(n/r_1), as flow is present in L-vertices where a_1 is in the set A_1 of size r_1 (and such that a_2, a_3 are not loaded, which only affects things by a O(1) factor). The degree ratio is again O(n), as the flow only uses L-edges where a_2 is added out of n − r_1 − r_2 + 1 possible choices. Thus the cost of this stage is O(√(n/r_1) · √n · r_1) = O(n √r_1).

Stage 4 (Load a_3): We pick a vertex v and λ many edges connecting v to A_1. Thus the L-vertices at the end of stage 4 are labeled by edges that are the union of two bipartite graphs: a complete bipartite graph between A_1, A_2 of sizes r_1, r_2, and a bipartite graph between v and A_1 of type ({(1, λ)}, {(λ, 1), (r_1 − λ, 0)}). Flow goes uniformly to those L-vertices where v = a_3 and the edge {a_1, a_3} is not loaded.
Again the amount of flow in a vertex with flow at the beginning of stage 4 is the same as at the beginning of stage 3, as the flow out-degree in stage 3 was one and there was no merging of flow. Thus the flow is still uniform. The out-degree of L-vertices is (n − r_1 − r_2)·(r_1 choose λ), and the flow out-degree is (r_1 − 1 choose λ). Thus we can again apply Lemma 3.

The length of this stage is λ. At the beginning of stage 4, flow is present in those L-vertices where a_1 ∈ A_1, a_2 ∈ A_2 and a_3 is not loaded. Thus the vertex ratio is O((n/r_1)(n/r_2)). Finally, the degree ratio is O(n). Thus the overall cost of this stage is

O( √(n/r_1) · √(n/r_2) · √n · λ ) = O( n^{3/2} λ / √(r_1 r_2) ).

Stage 5 (Load {a_1, a_3}): We add one new edge between v and A_1. Thus the L-vertices at the end of this stage will be labeled by the union of edges in two bipartite graphs: a complete bipartite graph between A_1, A_2 of sizes r_1, r_2, and the second between v and A_1 of type ({(1, λ + 1)}, {(λ + 1, 1), (r_1 − λ − 1, 0)}). Flow goes uniformly along those L-edges where the edge added is {a_1, a_3}.
The flow is uniform at the beginning of this stage, as it was uniform at the beginning of stage 4, the flow out-degree was constant in stage 4, and there was no merging of flow. Each L-vertex has out-degree r_1 − λ and the flow out-degree is one. Thus we can again apply Lemma 3.

The length of this stage is one. The vertex ratio is O((n/r_1)(n/r_2)·n), as flow is present in a constant fraction of those L-vertices where a_1 ∈ A_1, a_2 ∈ A_2 and v = a_3. The degree ratio is r_1 − λ, as there are this many possible edges to add and the flow uses one. Thus the overall cost of this stage is

O( √(n/r_1) · √(n/r_2) · √n · √r_1 ) = O( n^{3/2} / √r_2 ).

Stage 6 (Load {a_2, a_3}): We add one new edge between v and A_2. Thus the L-vertices at the end of this stage will be labeled by the union of three bipartite graphs: between A_1, A_2 and between v, A_1 as before, and additionally between v, A_2 of type ({(1, 1)}, {(1, 1), (r_2 − 1, 0)}). Flow goes uniformly on those L-edges where {a_2, a_3} is added.
Again flow is uniform as it was at the beginning of stage 5, the flow out-degree was constant and there was no merging. Each L-vertex has out-degree r_2 and the flow out-degree is one. Thus we can again apply Lemma 3.

The length of this stage is one. The vertex ratio is O((n/r_1)(n/r_2)·n·(r_1/λ)), as flow is present in a constant fraction of those L-vertices where a_1 ∈ A_1, a_2 ∈ A_2, v = a_3 and {a_1, a_3} is present. The degree ratio is r_2. Thus the overall cost of this stage is

O( √(n/r_1) · √(n/r_2) · √n · √(r_1/λ) · √r_2 ) = O( n^{3/2} / √λ ).

By choosing r_1 = n^{4/7}, r_2 = n^{5/7}, λ = n^{3/7} we can make all costs, and thus their sum, O(n^{9/7}).

To quickly compute the stage costs, it is useful to associate to each stage a local cost and a global cost. The local cost is the product of the square root of the degree ratio and the length of a stage. The global cost is the square root of the factor by which the stage increases the vertex ratio—we call this a global cost as it is propagated from one stage to the next. Thus the square root of the vertex ratio at stage t will be given by the product of the global costs of stages 1, . . . , t −
1. As the cost of each stage is the product of the square root of the vertex ratio, the square root of the degree ratio, and the length, it can be computed by multiplying the local cost of the stage with the product of the global costs of all previous stages.

Stage       | 1        | 2         | 3        | 4                     | 5             | 6
Global cost | 1        | √(n/r_1)  | √(n/r_2) | √n                    | √(r_1/λ)      | –
Local cost  | r_1 r_2  | √n r_2    | √n r_1   | √n λ                  | √r_1          | √r_2
Cost        | r_1 r_2  | √n r_2    | n √r_1   | n^{3/2}λ/√(r_1 r_2)   | n^{3/2}/√r_2  | n^{3/2}/√λ
Value       | n^{9/7}  | n^{17/14} | n^{9/7}  | n^{9/7}               | n^{8/7}       | n^{9/7}

4 A general framework

In this section we develop a high-level language for designing algorithms to detect constant-sized subgraphs, and more generally to compute functions f : [q]^{n×n} → {0, 1} with constant-sized 1-certificate complexity. This high-level language consists of commands like "load a vertex" or "load an edge" that make the algorithm easy to understand. Our main theorem, Theorem 8, compiles this high-level language into a learning graph and bounds the complexity of the resulting quantum query algorithm. After the theorem is proven, we can design quantum query algorithms using only the high-level language, without reference to learning graphs. This saves the algorithm designer from having to make many repetitive arguments as in Section 3, and also allows computer search to find the best algorithm within our framework.

We now give an overview of our algorithmic framework and its implementation in learning graphs. We first use the framework for computing the function f_H : [2]^{(n choose 2)} → {0, 1}, which is by definition 1 if the undirected n-vertex input graph contains a copy of some fixed k-vertex graph H = ([k], E(H)) as a subgraph. This case contains all the essential ideas; after showing this, it will be easy to generalize the theorem in a few more steps to any function f : [q]^{(n choose 2)} → {0, 1} or f : [q]^{n×n} → {0, 1} with constant-sized 1-certificate complexity.

Fix a positive instance x, and vertices a_1, . . . , a_k ∈ [n] constituting a copy of H in x, that is, such that x_{{a_i, a_j}} = 1 for all {i, j} ∈ E(H).
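Before developing the framework, the parameter choice in the triangle algorithm can be sanity-checked by working with exponents of n. This check is our own; it assumes the six stage costs r_1 r_2, √n r_2, n√r_1, n^{3/2}λ/√(r_1 r_2), n^{3/2}/√r_2 and n^{3/2}/√λ, with r_1 = n^{4/7}, r_2 = n^{5/7}, λ = n^{3/7}:

```python
from fractions import Fraction as F

# Exponents of n in the parameters r1, r2, lambda.
r1, r2, lam = F(4, 7), F(5, 7), F(3, 7)

# Exponents of n in the six stage costs.
costs = [
    r1 + r2,                        # stage 1: r1*r2
    F(1, 2) + r2,                   # stage 2: sqrt(n)*r2
    1 + r1 / 2,                     # stage 3: n*sqrt(r1)
    F(3, 2) + lam - (r1 + r2) / 2,  # stage 4: n^(3/2)*lam/sqrt(r1*r2)
    F(3, 2) - r2 / 2,               # stage 5: n^(3/2)/sqrt(r2)
    F(3, 2) - lam / 2,              # stage 6: n^(3/2)/sqrt(lam)
]
worst = max(costs)   # exponent of the total complexity
```

The maximum exponent is 9/7, attained by stages 1, 3, 4 and 6, matching the O(n^{9/7}) bound of Theorem 7. A linear program over such exponents is how the accompanying code optimizes parameters for larger subgraphs.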
Vertices of the learning graph will be labeled by k-partite graphs with color classes A_1, …, A_k. The sets A_1, …, A_k are allowed to overlap. Each L-vertex label will contain an undirected bipartite graph G_{i,j} = (A_{min{i,j}}, A_{max{i,j}}, E_{i,j}) for every edge {i, j} ∈ E(H), where E_{i,j} ⊆ A_{min{i,j}} × A_{max{i,j}}. For {i, j} ∈ E(H), by {a_i, a_j} we mean (a_i, a_j) if i < j, and (a_j, a_i) if j < i. For an edge {i, j} ∈ E(H) and u ∈ [n], the degree of u in G_ij towards A_j is the number of vertices in A_j connected to u if u ∈ A_i, and is 0 otherwise. The edges of these bipartite graphs naturally define the input edges formally required in the definition of the learning graph: for u ≠ v, both (u, v) and (v, u) define the input edge {u, v}. We will disregard multiple input edges as well as self-loops corresponding to edges (u, u). Observe that various L-vertex labels may correspond to the same set of input edges. For ease of notation we will denote G_{i,j} by both G_ij and G_ji. We will use a similar convention for E_{i,j}, which will be denoted by both E_ij and E_ji.

Our high-level language consists of three types of commands. The first is a setup command. This is implemented by choosing sets A_1, …, A_k ⊆ [n] of sizes r_1, …, r_k and bipartite graphs G_ij between A_i and A_j for all {i, j} ∈ E(H). Both the set sizes r_1, …, r_k and the average degree of vertices in the bipartite graph between A_i and A_j are parameters of the algorithm. The degree parameter d_ij = d_ji represents the average degree of vertices in the smaller of A_i, A_j towards the bigger one in G_ij. It is defined in this fashion so that it is always an integer and at least one; the average degree of the larger of A_i, A_j can be less than one.
Without loss of generality there is only one setup step and it happens at the beginning of the algorithm. The other commands allowed are to load a vertex a_i and to load an edge {a_i, a_j} corresponding to {i, j} ∈ E(H) (this terminology was introduced by Belovs). There are two regimes for loading an edge. One is the dense case, where all vertices in the graph G_ij have a neighbor; the other is the sparse case, where some vertices in the larger of A_i, A_j have no neighbors in the smaller. We need to separate these two cases as they apparently have different costs (and cost analyses). The algorithm is defined by a choice of set sizes and degree parameters, together with a loading schedule giving the order in which the vertices and edges are loaded, and which loads all edges of H.

We now define the parameters specifying an algorithm more formally.

Definition 7 (Admissible parameters). Let H = ([k], E(H)) be a k-vertex graph, r_1, …, r_k ∈ [n] be set size parameters, and d_ij ∈ [n] for {i, j} ∈ E(H) be degree parameters. Then {r_i}, {d_ij} are admissible for H if

• 1 ≤ r_i ≤ n/2 for all i ∈ [k],
• 1 ≤ d_ij ≤ max{r_i, r_j} for all {i, j} ∈ E(H),
• for all i there exists j such that {i, j} ∈ E(H) and d_ij (2r_j + 1)/(2r_i + 1) ≥ 1.

We give a brief explanation of the purpose of each of these conditions. We will encounter terms of the form binom(n, r_i)/binom(n − k, r_i) that we wish to be O(1); this is ensured by the first condition. As d_ij represents the average degree of the vertices in the smaller of A_i, A_j towards the larger, the second condition states that this degree cannot be larger than the number of distinct possible neighbors. The third item ensures that the average degree of vertices in A_i is at least one in the bipartite graph with some A_j.

Definition 8 (Loading schedule). Let H = ([k], E(H)) be a k-vertex graph with m edges. A loading schedule for H is a sequence S = s_1 s_2 …
s_{k+m} whose elements s_i ∈ [k] or s_i ∈ E(H) are vertex labels or edge labels of H, such that an edge {i, j} only appears in S after i and j, and S contains all edges of H. Let VS_t be the set of vertices in S before position t, and similarly ES_t the set of edges in S before position t.

We can now state the main theorem of this section.
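Definitions 7 and 8 are purely combinatorial and can be checked mechanically. The sketch below is our own illustration (the function names are hypothetical, and we read the first admissibility condition as 1 ≤ r_i ≤ n/2, matching the O(1) binomial-ratio bound used later):

```python
def is_admissible(n, k, r, d, edges):
    """Check Definition 7. r: dict vertex -> r_i; d: dict frozenset({i,j}) -> d_ij."""
    if not all(1 <= r[i] <= n // 2 for i in r):
        return False
    if not all(1 <= d[e] <= max(r[i] for i in e) for e in edges):
        return False
    # Every vertex i needs an incident edge {i,j} with d_ij(2r_j+1)/(2r_i+1) >= 1.
    for i in r:
        if not any(d[e] * (2 * r[j] + 1) >= 2 * r[i] + 1
                   for e in edges for j in e if i in e and j != i):
            return False
    return True

def is_loading_schedule(S, edges):
    """Check Definition 8: edges appear only after both endpoints, and all edges of H appear."""
    loaded = set()
    for s in S:
        if isinstance(s, frozenset):       # edge label
            if not s <= loaded:
                return False
        else:                              # vertex label
            loaded.add(s)
    return {s for s in S if isinstance(s, frozenset)} == set(edges)
```

For the triangle, for example, `[1, 2, {1,2}, 3, {1,3}, {2,3}]` (with edges as frozensets) is a valid loading schedule, while any schedule placing {1,2} before vertex 2, or omitting an edge, is rejected.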
Theorem 8.
Let H = ([k], E(H)) be a k-vertex graph. Let r_1, …, r_k, d_ij be admissible parameters for H, and S be a loading schedule for H. Then the quantum query complexity of determining if an n-vertex graph contains H as a subgraph is at most a constant times the maximum of the following quantities:

• Setup cost:

  Σ_{{u,v} ∈ E(H)} min{r_u, r_v} d_uv,

• Cost of loading s_t = i:

  Π_{u ∈ VS_t} √(n/r_u) · Π_{{u,v} ∈ ES_t} √(max{r_u, r_v}/d_uv) × √n ( Σ_{j : {i,j} ∈ E(H), r_i ≤ r_j} d_ij + Σ_{j : {i,j} ∈ E(H), r_i > r_j} r_j d_ij / r_i ),

• Cost of loading s_t = {i, j} in the dense case where (2 min{r_i, r_j} + 1) d_ij ≥ 2 max{r_i, r_j} + 1:

  Π_{u ∈ VS_t} √(n/r_u) · Π_{{u,v} ∈ ES_t} √(max{r_u, r_v}/d_uv) × max{r_i, r_j},

• Cost of loading s_t = {i, j} in the sparse case where (2 min{r_i, r_j} + 1) d_ij < 2 max{r_i, r_j} + 1:

  Π_{u ∈ VS_t} √(n/r_u) · Π_{{u,v} ∈ ES_t} √(max{r_u, r_v}/d_uv) × √(r_i r_j).

If {i, j} is loaded in the dense case we call it a type 1 edge, and if it is loaded in the sparse case we call it a type 2 edge. The costs of a stage given by Theorem 8 can again be understood more simply in terms of local costs and global costs. We give the local and global cost for each stage in the table below.

Stage                      Global cost                Local cost
Setup                      1                          Σ_{{u,v} ∈ E(H)} min{r_u, r_v} d_uv
Load vertex i              √(n/r_i)                   √n × total degree of i
Load a type 1 edge {i,j}   √(max{r_i, r_j}/d_ij)      max{r_i, r_j}
Load a type 2 edge {i,j}   √(max{r_i, r_j}/d_ij)      √(r_i r_j)

Proof.
We show the theorem by giving a learning graph of the stated complexity. Vertices of the learning graph will be labeled by k-partite graphs with color classes A_1, …, A_k of cardinality (of order) r_1, …, r_k ∈ [n]. The parameter d_ij ≥ 1 represents the average degree of vertices in the smaller of A_i, A_j towards the bigger in the bipartite graph G_ij. The bipartite graph G_ij, for each edge {i, j} ∈ E(H), will be specified by its type, that is, by its degree sequences as given in Definition 2.

We first need to modify the set size parameters {r_i} to satisfy a technical condition. Let r_{i_1} ≤ ··· ≤ r_{i_k} be a listing in increasing order. We set r′_{i_1} = r_{i_1} and r′_{i_t} = Θ(r_{i_t}) such that (2r′_{i_t} + 1)/(2r′_{i_{t−1}} + 1) is an odd integer. As a consequence, (2 max{r′_i, r′_j} + 1)/(2 min{r′_i, r′_j} + 1) is an odd integer for every i ≠ j. We now suppose this is done and drop the primes.

Throughout the construction of the learning graph we will deal with two cases for the bipartite graph between A_i and A_j, depending on the size and degree parameters.

Figure 2: Example of a part of a learning graph corresponding to Case 1 and restricted to the bipartite graph between A_i and A_j, where r_i < r_j. Observe that λ_ij ≈ (r_i/r_j) d_ij. The loading schedule is 'setup', 'load i', 'load j' and 'load {i, j}'.

Figure 3: Similar to Figure 2, but for Case 2.
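A "type" here is just a pair of degree listings, one per side of the bipartite graph, each a collection of (count, degree) pairs. One quick sanity check that can be run on the types used throughout this construction (a small sketch of our own, not the paper's code) is that both sides account for the same number of edges:

```python
def edge_total(side):
    """Total number of edge endpoints on one side of a bipartite type.

    A side is a list of (count, degree) pairs: `count` vertices of that degree.
    """
    return sum(count * degree for count, degree in side)

def type_consistent(left, right):
    """Both sides of a bipartite graph must describe the same set of edges."""
    return edge_total(left) == edge_total(right)
```

For example, the Case 2 setup type with ℓ = 3, g = 10 and d_ij = 2 has left side [(6, 2)] and right side [(12, 1), (8, 0)]; both sides total 12 edges.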
• Case 1 is where (2 min{r_i, r_j} + 1) d_ij ≥ 2 max{r_i, r_j} + 1, which means that there are enough edges from the smaller of A_i, A_j to cover the larger. We will say that the parameters for {i, j} are of type 1. In this case, we take d′_ij = Θ(d_ij) to be such that

2 d′_ij + 1 = (2 λ_ij + 1) (2 max{r_i, r_j} + 1)/(2 min{r_i, r_j} + 1)     (2)

for some integer λ_ij. This can be done as (2 max{r_i, r_j} + 1)/(2 min{r_i, r_j} + 1) is an odd integer. In our construction, λ_ij will be the average degree of the vertices in the larger of A_i, A_j towards the smaller, which we want to be an integer. We now consider this done and drop the primes.

• Case 2 is where (2 min{r_i, r_j} + 1) d_ij < 2 max{r_i, r_j} + 1. We will say that the parameters for {i, j} are of type 2. In this case, all degrees of vertices in the larger of A_i, A_j towards the smaller will be either zero or one.

Now we are ready to describe the learning graph. Figures 2 and 3 illustrate the evolution of a learning graph for a subsequence (i, j, {i, j}) of some loading schedule, that is, the sequence of instructions 'setup', 'load i', 'load j' and 'load {i, j}'. The figures only represent the added edges between A_i and A_j, where r_i < r_j. Figure 2 corresponds to Case 1, and Figure 3 to Case 2.

Recall that for every positive instance x, we fixed a_1, …, a_k ∈ [n] such that x_{{a_u, a_v}} = 1 for all {u, v} ∈ E(H). During the construction we will specify, for every edge {u, v} ∈ E(H) and for every stage number t, the correct degree cd(u, v, t), which is the degree of a_u in G_uv towards a_v in each L-vertex of V_{t+1} with positive flow.

Stage 0 (Setup):
For each edge {i, j} ∈ E(H) we set up a bipartite graph between A_i and A_j. The type of the bipartite graph depends on the type of the parameters for {i, j}. Let ℓ = min{r_i, r_j} and g = max{r_i, r_j}.

• Case 1: Solving for λ_ij in Equation (2) we get λ_ij = ((2ℓ + 1) d_ij + ℓ − g)/(2g + 1). Intuitively, d_ij represents the average degree of vertices in the smaller of A_i, A_j and λ_ij the average degree in the larger. Formally, the type of bipartite graph between A_i, A_j, with the listing of degrees for the smaller set given first, is ({(2ℓ − λ_ij, d_ij), (λ_ij, d_ij − 1)}, {(2g − d_ij, λ_ij), (d_ij, λ_ij − 1)}).

• Case 2: In this case the type of bipartite graph between A_i and A_j, with the listing of degrees for the smaller set given first, is ({(2ℓ, d_ij)}, {(2ℓ d_ij, 1), (2g − 2ℓ d_ij, 0)}).

The L-vertices at the end of stage 0 will be labeled by (possibly overlapping) sets A_1, …, A_k of sizes r_1, …, r_k and edges corresponding to a graph of the appropriate type between A_i and A_j for all {i, j} ∈ E(H). Flow goes uniformly to those L-vertices where none of a_1, …, a_k are in any of the sets A_1, …, A_k. For all {u, v} ∈ E(H), we set cd(u, v, 0) = 0.
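Equation (2) and the closed form for λ_ij above are simple integer arithmetic and can be spot-checked numerically. The sketch below is our own illustration (`lam` is a hypothetical helper name):

```python
def lam(l, g, d):
    """Solve Equation (2) for the integer λ: 2d+1 = (2λ+1)(2g+1)/(2ℓ+1).

    Returns λ = ((2ℓ+1)d + ℓ - g)/(2g+1), checking integrality.
    """
    num = (2 * l + 1) * d + l - g
    assert num % (2 * g + 1) == 0, "d must first be adjusted as in Case 1"
    return num // (2 * g + 1)
```

For instance, with ℓ = 1 and g = 4 the ratio (2g+1)/(2ℓ+1) = 3 is an odd integer, and d = 7 gives λ = 2, since (2·7 + 1)(2·1 + 1) = 45 = (2·2 + 1)(2·4 + 1).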
Stage t when s_t = i: In this stage we load a_i. The L-edges in this stage select a vertex v and add it to A_i. For all j such that {i, j} ∈ E(H) we add the following edges:

• Case 1: Say the parameters for {i, j} are of type 1. If r_i ≤ r_j, then v is connected to those vertices of degree λ_ij − 1 in A_j, and we set cd(i, j, t) = d_ij. Otherwise v is connected to those vertices of degree d_ij − 1 in A_j, and we set cd(i, j, t) = λ_ij.

• Case 2: Say the parameters for {i, j} are of type 2. If r_i ≤ r_j then v is connected to d_ij vertices of degree 0 in A_j, and we set cd(i, j, t) = d_ij. Else no edges are added between v and A_j, and we set cd(i, j, t) = 0.

For all other (u, v), we set cd(u, v, t) = cd(u, v, t − 1). Flow goes uniformly along those L-edges where v = a_i.

Stage t when s_t = {i, j}: In this stage we load {a_i, a_j}. Again we break down according to the type of the parameters for {i, j}. Let ℓ = min{r_i, r_j} and g = max{r_i, r_j}.

• Case 1: As both a_i and a_j have been loaded, between A_i and A_j there is a bipartite graph of type ({(2ℓ + 1, d_ij)}, {(2g + 1, λ_ij)}), with the degree listing of the smaller set coming first. If we simply added {a_i, a_j} at this step, a_i and a_j would be uniquely identifiable by their degrees, and this would blow up the complexity of later stages. To combat this, loading {a_i, a_j} will consist of two substages t.I and t.II. The first substage is a hiding step, done to reduce the complexity of having {a_i, a_j} loaded. Then we actually load {a_i, a_j}.

Substage t.I: Let h = (2g + 1)/(2ℓ + 1). We select ℓ vertices in the smaller of A_i, A_j, and to each of these add h many neighbors. All neighbors chosen in this stage are distinct.
Thus at the end of this substage the type of bipartite graph between A_i and A_j is ({(ℓ, d_ij + h), (ℓ + 1, d_ij)}, {(ℓ(2g + 1)/(2ℓ + 1), λ_ij + 1), ((2g + 1)(1 − ℓ/(2ℓ + 1)), λ_ij)}). Flow goes uniformly along those L-edges where neither a_i nor a_j receives any new edges. For all {u, v} ∈ E(H), we set cd(u, v, t + 1) = cd(u, v, t).

Substage t.II: The L-edges in this substage select a vertex u in the smaller of A_i, A_j of degree d_ij and add h many neighbors of degree λ_ij. Flow goes uniformly along those L-edges where u ∈ {a_i, a_j} and {a_i, a_j} is one of the edges added. Let s be the index of the smaller of the sets A_i, A_j, and let b be the other index. We set cd(s, b, t + 1) = d_ij + h, cd(b, s, t + 1) = λ_ij + 1 and cd(u, v, t + 1) = cd(u, v, t) for {u, v} ≠ {i, j}.

• Case 2: As both a_i and a_j have been loaded, there is a bipartite graph of type ({(2ℓ + 1, d_ij)}, {((2ℓ + 1) d_ij, 1), (2g + 1 − (2ℓ + 1) d_ij, 0)}). We again first do a hiding step, and then add the edge {a_i, a_j}.

Substage t.I: We select ℓ vertices in the smaller of A_i, A_j and to each add a single edge to a vertex of degree zero in the larger of A_i, A_j. Flow goes uniformly along those L-edges where no edges adjacent to a_i, a_j are added. For all {u, v} ∈ E(H), we set cd(u, v, t + 1) = cd(u, v, t).

Substage t.II: A single edge is added between a vertex in the smaller of A_i, A_j of degree d_ij and a vertex in the larger of A_i, A_j of degree zero. Flow goes along those L-edges where {a_i, a_j} is added. Let again s be the index of the smaller of the sets A_i, A_j, and let b be the other index. We set cd(s, b, t + 1) = d_ij + 1, cd(b, s, t + 1) = 1 and cd(u, v, t + 1) = cd(u, v, t) for {u, v} ≠ {i, j}.

This completes the description of the learning graph.

Complexity analysis
We will use Lemma 4 to evaluate the complexity of each stage. First we need to establish the hypotheses of this lemma, which we will do using Lemma 5 and Lemma 6. Remember that given σ ∈ S_n, we defined and denoted by σ the permutation over [n] × [n] such that σ(i, j) = (σ(i), σ(j)). First of all, let us observe that every σ ∈ S_n is in the automorphism group of the function we are computing, since it maps a 1-certificate into a 1-certificate. As the flow only depends on the 1-certificate graph, this implies that S_n acts transitively on the flows, and therefore we obtain the conclusion of Lemma 5.

Let V_t stand for the L-vertices at the beginning of stage t. For a positive input x, and for an L-vertex P ∈ V_t, we will denote the incoming flow to P on x by p_x(P) and the number of outgoing edges from P with positive flow on x by g⁺_x(P). For an L-vertex R ∈ V_{t−1} we will denote by g⁻_{x,R}(P) the number of incoming edges to P from L-vertices of the isomorphism type of R with positive flow on x, that is, g⁻_{x,R}(P) = |{τ ∈ S_n : p_x(τ(R), P) ≠ 0}|. The crucial features of our learning graph construction are the following: at every stage, for every L-vertex P and every σ ∈ S_n, the L-vertex σ(P) is also present. The outgoing flow from an L-vertex is always uniformly distributed among the edges getting flow. The flow depends only on the vertices in the input containing a copy of the graph H, and therefore the values g⁺_x(P) and g⁻_{x,R}(P), for p_x(P) non-zero, depend only on the isomorphism types of P and R. Mathematically, this last property translates to: for all t, for all P ∈ V_t, for all R ∈ V_{t−1}, for all positive inputs x and y, and for all σ ∈ S_n, we have

[p_x(P) ≠ 0 and p_y(σ(P)) ≠ 0] ⟹ [g⁺_x(P) = g⁺_y(σ(P)) and g⁻_{x,R}(P) = g⁻_{y,R}(σ(P))],
(3)

which is exactly the hypothesis of Lemma 6.

Now that we have established the hypotheses of Lemma 4, we turn to evaluating the bound given there. The main task is evaluating the maximum vertex ratio of each stage. The general way we will do this is to consider an arbitrary vertex P of a stage. We then lower bound the probability that σ(P) is in the flow for a positive input x and a random permutation σ ∈ S_n, without using any particulars of P. This then upper bounds the maximum vertex ratio. We use the notation P ∈ F_x to denote that the L-vertex P has at least one incoming edge with flow on input x.

Lemma 9 (Maximum vertex ratio). For any L-vertex P ∈ V_{t+1} and any positive input x,

Pr_σ[σ(P) ∈ F_x] = Ω( Π_{j ∈ VS_t} (r_j/n) · Π_{{u,v} ∈ ES_t} (d_uv / max{r_u, r_v}) ).

Proof.
We claim that an L-vertex P in V_{t+1}, that is, at the end of stage t, has flow if and only if

∀ i ∈ VS_t, ∀ {i, j} ∈ ES_t, we have a_i ∈ A_i and {a_i, a_j} ∈ E_ij,   (4)

∀ i ∈ [k] \ VS_t, ∀ {i, j} ∈ E(H) \ ES_t, we have a_i ∉ A_i and {a_i, a_j} ∉ E_ij,   (5)

∀ {i, j} ∈ ES_t, the degree of a_i in G_ij towards A_j is cd(i, j, t).   (6)

The only if part of the claim is obvious by the construction of the learning graph. The if part can be proven by induction on t. For t = 0, the first half of (5) is exactly the condition which defines the flow for L-vertices in V_1. For the inductive step let us suppose first that s_t = i. Consider the label P′ obtained by dropping the vertex a_i from A_i. Then in P′ every bipartite graph is of the appropriate type for level t because of (6), and therefore P′ ∈ V_t. It is easy to check that P′ also satisfies all three conditions (for (6) we also have to use the second half of (5): {a_i, a_j} ∉ E_ij), and therefore has positive flow. Since P′ is a predecessor of P in the learning graph, P also has positive flow.

Now let us suppose that s_t = {i, j}. In P the edge set E_ij can be decomposed into the disjoint union E_1 ∪ E_2, where E_1 is a bipartite graph of type ({(2ℓ + 1, d_ij)}, {(2g + 1, λ_ij)}) and E_2 is of type ({(ℓ + 1, h), (ℓ, 0)}, {((ℓ + 1)h, 1), (2g + 1 − (ℓ + 1)h, 0)}), and (6) implies that {a_i, a_j} ∈ E_2. Consider the label P′ obtained by dropping the edges of E_2 from E_ij. Again, P′ satisfies the inductive hypotheses, and therefore gets positive flow, which implies the same for P.

Suppose now that the L-vertex P is labeled by sets A_1, …, A_k (some may be empty) and let the set of edges between A_i and A_j be E_ij. We want to lower bound the probability that σ(P) ∈ F_x, meaning that σ(P) satisfies the above three conditions.
Item (5) is always satisfied with constant probability; moreover, conditioned on item (5), the probability of the other events does not decrease. Thus we take this constant factor loss and focus on items (4) and (6).

We also claim that, conditioned on item (4) holding, item (6) holds with constant probability. This can be seen as follows: in the hiding step, in both Case 1 and Case 2, the probability that a_i, a_j have the correct degrees given that they are loaded is at least 1/4. In the step of loading an edge, again in Case 1 half the vertices on the left and right hand sides have the correct degree, and so this probability is again 1/4; in Case 2, given that the edge is loaded, whichever of a_i, a_j is in the larger set will automatically have the correct degree, and the other one will have the correct degree with probability 1/2. Now we take this constant factor loss to obtain that Pr_σ[σ(P) ∈ F_x] is lower bounded by a constant factor times the probability that item (4) holds.

The events in the first condition are independent, except that for the edge {a_i, a_j} to be loaded the vertices a_i and a_j also have to be loaded. Thus we can lower bound the probability that it is satisfied by

Pr_σ[σ(P) ∈ F_x] = Ω( Π_{i ∈ VS_t} Pr_σ[a_i ∈ σ(A_i)] × Π_{{u,v} ∈ ES_t} Pr_σ[{a_u, a_v} ∈ σ(E_uv) | a_u ∈ σ(A_u), a_v ∈ σ(A_v)] ).

Now Pr_σ[a_i ∈ σ(A_i)] = Ω(r_i/n), as this fraction of permutations will put a_i into a set of size r_i. For the edges we use the following lemma.

Lemma 10. Let Y_1, Y_2 ⊆ [n] be of size ℓ and g respectively, and let (y_1, y_2) ∈ Y_1 × Y_2. Let K be a bipartite graph between Y_1 and Y_2 of type ({(ℓ, d)}, {(g, ℓd/g)}). Then Pr_σ[{y_1, y_2} ∈ σ(K)] = d/g.

Proof. By symmetry, this probability does not depend on the choice of {y_1, y_2}; denote it by p. Let K_1, …, K_c be an enumeration of all bipartite graphs isomorphic to K. We will count in two different ways the cardinality χ of the set {(e, h) : e ∈ K_h}. Every K_h contains ℓd edges, therefore χ = cℓd. On the other hand, every edge appears in pc graphs, therefore χ = ℓg·pc, and thus p = d/g.

In our case, the graph G_ij is as in the hypothesis of the lemma plus some additional edges. By monotonicity, it follows that

Pr_σ[{a_i, a_j} ∈ σ(E_ij) | a_i ∈ σ(A_i), a_j ∈ σ(A_j)] = Ω( d_ij / max{r_i, r_j} ).

This analysis is common to all the stages. Now we go through each type of stage in turn to evaluate the stage-specific length and degree ratio.
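Lemma 10's double-counting argument can also be verified by brute force on tiny instances (an illustrative sketch of our own, not the paper's code): enumerate every bipartite graph with the prescribed degree sequences and count the fraction containing a fixed edge.

```python
from itertools import combinations
from fractions import Fraction

def lemma10_prob(l, g, d):
    """Fraction of bipartite graphs of type ({(l,d)},{(g, l*d/g)}) containing edge (0,0).

    Left vertices 0..l-1 each choose d right neighbors; we keep the graphs in
    which every right vertex 0..g-1 ends up with degree l*d/g.
    """
    assert (l * d) % g == 0
    rdeg = l * d // g
    graphs = []

    def extend(i, degs, edges):
        if i == l:
            if all(x == rdeg for x in degs):
                graphs.append(edges)
            return
        for nbrs in combinations(range(g), d):
            extend(i + 1,
                   [degs[j] + (1 if j in nbrs else 0) for j in range(g)],
                   edges | {(i, j) for j in nbrs})

    extend(0, [0] * g, set())
    hits = sum(1 for G in graphs if (0, 0) in G)
    return Fraction(hits, len(graphs))
```

For example, with ℓ = 2, g = 4 and d = 2 there are 6 such graphs, of which 3 contain the fixed edge, matching d/g = 1/2. Only very small parameters are feasible, since the enumeration is exponential.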
Setup Cost:
The length of this stage is upper bounded by

Σ_{{i,j} ∈ E(H)} min{r_i, r_j} d_ij.

We can upper bound the degree ratio by

Π_{i ∈ [k]} binom(n, r_i)/binom(n − k, r_i) ≤ 2^{O(k)} = O(1),

as r_i ≤ n/2.

Stage t when s_t = i: In a stage loading a vertex the degree ratio is O(n), as there are n − r_i possible vertices to add yet only one is used by the flow. The length of this stage is the total degree, which is upper bounded by

Σ_{j : {i,j} ∈ E(H), r_i ≤ r_j} d_ij + Σ_{j : {i,j} ∈ E(H), r_i > r_j} r_j d_ij / r_i.

Stage t when s_t = {i, j}: Technically we should analyze the complexity of the two substages as two distinct stages. However, as we will see, in both cases the degree ratio in the first substage is O(1), and therefore the local cost of this stage is just the maximum of the local costs of the two substages.

Stage t.I: In Case 1, the length of this stage is O(max{r_i, r_j}) and the degree ratio is constant. In Case 2, the length of this stage is O(min{r_i, r_j}) and the degree ratio is constant.

Stage t.II: In Case 1, the length is h = O(max{r_i, r_j}/min{r_i, r_j}). The degree ratio is of order ℓ², and thus the square root of the degree ratio times the length is of order max{r_i, r_j}. In Case 2, the length is one and the degree ratio is O(r_i r_j), as there are O(r_i r_j) many possible edges that could be added and the flow uses one.

Thus in Case 1, in both substages the product of the length and the square root of the degree ratio is O(max{r_i, r_j}). In Case 2, substage II dominates the complexity, where the product of the length and the square root of the degree ratio is O(√(r_i r_j)).

4.2 Extensions and basic properties

We now extend Theorem 8 to the general case of computing a function f : [q]^{n×n} → {0,1} with constant-sized 1-certificates.
A certificate graph for such a function will be a directed graph, possibly with self-loops. Between i and j there can be bidirectional edges, that is, both (i, j) and (j, i) present in the certificate graph, but there will not be multiple edges between i and j, as there are no repetitions of indices in a certificate. We start off by modifying the algorithm of Theorem 8 to work for detecting directed graphs with possible self-loops. To do this, the following transformation will be useful.

Definition 9.
Let H be a directed graph, possibly with self-loops. The undirected version U(H) of H is a simple undirected graph formed by eliminating any self-loops in H, and making all edges of H undirected and single.

Lemma 11.
Let H be a directed k-vertex graph, possibly with self-loops. Then the quantum query complexity of detecting if an n-vertex directed graph G contains H as a subgraph is at most a constant times the complexity given in Theorem 8 of detecting U(H) in an n-vertex undirected graph.

Proof. Let H be a directed k-vertex graph (possibly with self-loops) and H′ = U(H) be its undirected version. Let r_1, …, r_k, d_ij be admissible parameters for H′, and S a loading schedule for H′. Fix a directed n-vertex graph G containing H as a subgraph. Let a_1, …, a_k be vertices of G such that (a_i, a_j) ∈ E(G) for (i, j) ∈ E(H). We convert the algorithm for loading H′ in Theorem 8 into one for loading H of the same complexity.

The setup step for H′ is modified as follows. In the bipartite graph between A_i and A_j, if both (i, j), (j, i) ∈ E(H) then all edges between A_i and A_j are directed in both directions; otherwise, if (i, j) ∈ E(H) or (j, i) ∈ E(H), they are directed from A_i to A_j or vice versa, respectively. For every self-loop in H, say (i, i) ∈ E(H), we add self-loops to the vertices in A_i. Note that these modifications at most double the number of edges added, and hence the cost, of the setup step.

Loading a vertex: When loading a_i we connect it as before, now orienting the edges according to (i, j) or (j, i) in E(H), or both. If (i, i) ∈ E(H), then we add a self-loop to a_i. The only change in the complexity of this stage is again the length, which at most doubles. Notice that in the case of a self-loop we have also already loaded the edge (a_i, a_i). We do not incur an extra cost for loading this edge, however, as the self-loop is loaded if and only if the vertex is.

Loading an edge: Say that we are at the stage where s_t = {i, j} ∈ E(H′).
If exactly one of (i, j), (j, i) is in E(H) then this step happens exactly as before, except that the bipartite graph has edges directed from A_i to A_j or vice versa, respectively. If both (i, j) and (j, i) are in E(H), then in this step all edges added are bidirectional. This again at most doubles the length, and does not affect the flow probabilities, as (a_i, a_j) is loaded if and only if (a_j, a_i) is loaded, since all edges are bidirectional.

Lemma 12.
Let f : [q]^{n×n} → {0,1} be a function such that all minimal 1-certificate graphs are isomorphic to a directed k-vertex graph H. Then the quantum query complexity of computing f is at most the complexity of detecting H in an n-vertex graph, as given by Lemma 11.

Proof. We will show the theorem by giving a learning graph algorithm. Let G = (V, E, S, w, {p_y}) be the learning graph from Lemma 11 for H. All of V, E, S, w will remain the same in our learning graph G′ for f. We now describe the definition of the flows in G′.

Consider a positive input x to f, and let α be a minimal 1-certificate for x such that the certificate graph H_α is isomorphic to H. The flow p_x will be defined as the flow for H_α (thought of as an n-vertex graph, thus with n − k isolated vertices) in G, the learning graph for detecting H. This latter flow has the property that the label of every terminal of flow contains E(H_α), and thus will also contain the index set of a 1-certificate for x.

The positive complexity of the learning graph for f will be the same as that for detecting H, and the negative complexity will be at most that of the learning graph for detecting H; thus we conclude that the complexity of computing f is at most that for detecting H as given in Lemma 11.

Theorem 13. Say that the 1-certificate complexity of f : [q]^{n×n} → {0,1} is at most a constant m, and let H_1, …, H_c be the set of graphs (on at most m edges) for which there is some positive input x such that H_i is a minimal 1-certificate graph for x. Then the quantum query complexity of computing f is at most a constant times the maximum of the complexities of detecting H_i for i = 1, …, c as given by Lemma 11.

Proof. Consider learning graphs G_1, …, G_c given by Lemma 11 for detecting H_1, …, H_c respectively.
Further suppose these learning graphs are normalized such that their negative and positive complexities are equal. We construct a learning graph G for f where the edges and vertices are given by connecting a new root node by an edge of weight one to the root nodes of each of G_1, …, G_c. Thus the negative complexity of G is at most c(1 + max_i C(G_i)).

Now we construct the flow for a positive input x. Let α be a minimal 1-certificate for x such that the certificate graph H_α is isomorphic to H_i, for some i. Then the flow on x is first directed entirely to the root node of G_i. It is then defined within G_i as in Lemma 12. Thus the positive complexity of G is at most c(1 + max_i C(G_i)).

To make Theorem 8 and Lemma 11 easier to apply, here we establish some basic intuitive properties of the complexity of the algorithm for different subgraphs. Namely, we show that if H′ is a subgraph of H then the complexity given by Lemma 11 for detecting H′ is at most that of H. We show a similar statement when H′ is a vertex contraction of H.

Lemma 14.
Let H be a directed k-vertex graph (possibly with self-loops) and H′ a subgraph of H. Then the quantum query complexity of determining if an n-vertex graph G contains H′ is at most that of determining if G contains H from Lemma 11.

Proof. Assume that the vertices of H are labeled from [k] and that H′ is labeled such that (i, j) ∈ E(H) for all (i, j) ∈ E(H′).

The learning graph we use for detecting H′ is the same as that for H. For a graph G containing H′ as a subgraph, let a_1, …, a_k be such that (a_i, a_j) ∈ E(G) for all (i, j) ∈ E(H′). (If t is an isolated vertex in H′, then a_t can be chosen arbitrarily.) The flow for G is defined in the same way as in the learning graph for H. Note that once a_1, …, a_k have been identified, the definition of flow depends only on edge slots, not on edges; thus this definition remains valid for H′. Furthermore all terminals of flow are labeled by edge slots (a_i, a_j) for all (i, j) ∈ E(H), and so also contain the edge slots for H′. Thus this is a valid flow for detecting H′. As the learning graph and flow are the same, the complexity will be as given in Lemma 11.

Lemma 15.
Let H be a k-vertex graph and H′ a vertex contraction of H. Then the quantum query complexity of detecting H′ is at most that of detecting H given in Lemma 11.

Proof. Again we assume that the vertices of H are labeled from [k]. The key point is the following: if H′ is a vertex contraction of H, then there are z_1, . . . , z_k ∈ [k] (not necessarily distinct) such that (z_i, z_j) ∈ E(H′) if and only if (i, j) ∈ E(H). The learning graph for H′ will be the same as that for H, except for the flows. For a graph G containing H′, we choose vertices a_1, . . . , a_k (not necessarily distinct) such that if (z_i, z_j) ∈ E(H′) then (a_i, a_j) ∈ E(G). As (z_i, z_j) ∈ E(H′) if and only if (i, j) ∈ E(H), we can define the flow as in Lemma 11 for a_1, . . . , a_k to load a copy of H′. (Note that there is no restriction in the proof of that theorem that the sets A_1, . . . , A_k be distinct.) This gives an algorithm for detecting H′ with complexity at most that given by Lemma 11 for detecting H.

Consider an operation ◦ : S × S → S and let n = |S|. We wish to determine if ◦ is associative on S, meaning that a ◦ (b ◦ c) = (a ◦ b) ◦ c for all a, b, c ∈ S. We are given black-box access to ◦, that is, we can make queries of the form (a, b) and receive the answer a ◦ b.

Theorem 16.
Let S be a set of size n and ◦ : S × S → S an operation that can be accessed in black-box fashion by queries (a, b) returning a ◦ b. There is a bounded-error quantum query algorithm making O(n^{10/7}) queries that determines whether (◦, S) is associative.

Figure 4: The 5-vertex certificate graph for associativity. Both pictures represent the same certificate graph H; the second one is labelled according to the notation of our abstract language.

Proof. If ◦ is not associative, then there is a triple a_1, a_2, a_3 such that a_1 ◦ (a_2 ◦ a_3) ≠ (a_1 ◦ a_2) ◦ a_3. A certificate for the non-associativity of ◦ is given by a_1 ◦ a_2 = a_4, a_4 ◦ a_3, a_2 ◦ a_3 = a_5, and a_1 ◦ a_5 such that a_4 ◦ a_3 ≠ a_1 ◦ a_5 (see Figure 4). Note that not all of a_1, . . . , a_5 need be distinct.

Let H be a directed graph on 5 vertices with directed edges (2, 1), (2, 3), (3, 4), (5, 1). The graph of any certificate will be H, or a vertex contraction of H in the case that not all of a_1, . . . , a_5 are distinct. By Lemma 15, the complexity of detecting a vertex contraction of H is dominated by that of detecting H, and so by Theorem 13 it suffices to show the theorem for H.

We use the algorithmic framework of Theorem 8 to load the graph H. We take r_1, . . . , r_4 and the degrees d_21, d_23, d_34 to be suitable powers of n, with r_5 = 1 and d_51 = 1, the exponents chosen so that every stage cost below evaluates to n^{10/7}. Here d_ij indicates the average degree of vertices in the smaller of A_i, A_j for edges directed from A_i to A_j. It can be checked that this is an admissible set of parameters. Note that, as the number of loaded (2, 1) edge slots is much smaller than r_1, loading a_1 ◦ a_2 will be done in the sparse regime.
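To make the certificate structure concrete, here is a classical brute-force sketch (illustration only; the naming is ours, the quantum algorithm never enumerates triples this way, and the edge orientations follow the convention we assume above). It finds a non-associativity witness, returns the five not-necessarily-distinct elements a_1, . . . , a_5, and builds the certificate graph on those elements, which is a vertex contraction of H exactly when some of the a_i coincide.

```python
from itertools import product

# Directed edge slots of the 5-vertex certificate graph H (our assumed
# orientations): (2,1) for a1◦a2 = a4, (2,3) for a2◦a3 = a5,
# (3,4) for a4◦a3, and (5,1) for a1◦a5.
H_EDGES = {(2, 1), (2, 3), (3, 4), (5, 1)}

def certificate(S, op):
    """Brute-force search for a non-associativity certificate.

    Returns (labels, edges) where labels = (a1, ..., a5) satisfies
    a4 = a1◦a2, a5 = a2◦a3 and a4◦a3 != a1◦a5, and edges is the
    certificate graph on the actual elements: a vertex contraction of H
    whenever some a_i coincide.  Returns None if ◦ is associative on S.
    Classical cost is O(|S|^3) queries, against O(n^{10/7}) quantumly.
    """
    for a1, a2, a3 in product(S, repeat=3):
        a4, a5 = op(a1, a2), op(a2, a3)
        if op(a4, a3) != op(a1, a5):   # (a1◦a2)◦a3 != a1◦(a2◦a3)
            labels = (a1, a2, a3, a4, a5)
            edges = {(labels[i - 1], labels[j - 1]) for (i, j) in H_EDGES}
            return labels, edges
    return None
```

For instance, with ◦ taken as subtraction modulo 3 (which is not associative), the first witness found has a_1 = a_2 = a_4 = 0, so the resulting certificate graph contains the self-loop (0, 0) and is a proper vertex contraction of H.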
We use the loading schedule S = [1, 2, 3, 4, (2, 1), (2, 3), (3, 4), 5, (5, 1)]. The setup cost, the total number of edge slots in the initial database summed over the four edge types, is n^{10/7}, and the costs of loading the vertices and edges are all bounded by n^{10/7}, as given in the following tables.

[Three cost tables, one per group of stages: loading a_1, a_2, a_3, a_4; loading a_1 ◦ a_2, a_2 ◦ a_3, a_4 ◦ a_3; and loading a_5, a_1 ◦ a_5. Each table lists the global cost, local cost, and total cost of every stage; each stage cost evaluates to n^{10/7}.]

The algorithms for finding k-vertex subgraphs given in [13, 9] already give a nontrivial bound for finding a 4-path, but it was not realized there that these algorithms apply to a much broader class of functions like associativity. The key property used for this application is that in the basic learning graph model the complexity depends only on the index sets of 1-certificates and not on the underlying alphabet. This property was previously observed by Mario Szegedy in the context of limitations of the basic learning graph model [8]. He observed that the basic learning graph complexity of the threshold-2 function is Θ(n^{2/3}), rather than the true value Θ(√n), as threshold-2 and element distinctness have the same 1-certificate index sets.

Acknowledgements

We would like to thank Aleksandrs Belovs for discussions and comments on an earlier draft of this work.
References

[1] A. Belovs. Learning-graph-based quantum algorithm for k-distinctness. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science, 2012.
[2] A. Belovs. Span programs for functions with constant-sized 1-certificates. In Proceedings of the 44th ACM Symposium on Theory of Computing, pages 77–84, 2012.
[3] A. Belovs and T. Lee. Quantum algorithm for k-distinctness with prior knowledge on the input. Technical Report arXiv:1108.3022, arXiv, 2011.
[4] S. Dörn and T. Thierauf. The quantum complexity of group testing. In Proceedings of the 34th Conference on Current Trends in Theory and Practice of Computer Science, pages 506–518, 2008.
[5] L. K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th ACM Symposium on the Theory of Computing, pages 212–219, 1996.
[6] P. Høyer and R. Špalek. Lower bounds on quantum query complexity. Bulletin of the European Association for Theoretical Computer Science, 87, 2005. Also arXiv report quant-ph/0509153v1.
[7] P. Høyer, T. Lee, and R. Špalek. Negative weights make adversaries stronger. In Proceedings of the 39th ACM Symposium on Theory of Computing, pages 526–535, 2007.
[8] R. Kothari. Personal communication, 2011.
[9] T. Lee, F. Magniez, and M. Santha. A learning graph based quantum query algorithm for finding constant-size subgraphs. Technical Report arXiv:1109.5135, arXiv, 2011.
[10] F. Magniez, M. Santha, and M. Szegedy. Quantum algorithms for the triangle problem. SIAM Journal on Computing, 37(2):413–424, 2007.
[11] B. W. Reichardt. Reflections for quantum query algorithms. In Proceedings of the 22nd ACM-SIAM Symposium on Discrete Algorithms, pages 560–569, 2011.
[12] P. Shor. Algorithms for quantum computation: Discrete logarithm and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pages 124–134, 1994.