Sketch-based Randomized Algorithms for Dynamic Graph Regression
Mostafa Haghir Chehreghani
Department of Computer Engineering and Information Technology
Amirkabir University of Technology
Tehran, Iran
[email protected]
June 6, 2019
Abstract
A well-known problem in data science and machine learning is linear regression, which has recently been extended to dynamic graphs. Existing exact algorithms for updating the solution of the dynamic graph regression problem require at least linear time (in terms of $n$, the size of the graph). However, this time complexity might be intractable in practice. In the current paper, we utilize the subsampled randomized Hadamard transform and CountSketch to propose the first randomized algorithms for this problem. Suppose that we are given an $n \times m$ matrix embedding $M$ of the graph, where $m \ll n$. Let $r$ be the number of samples required for a guaranteed approximation error, which is a sublinear function of $n$. Our first algorithm reduces the time complexity of pre-processing to $O(n(m+1) + 2n(m+1)\log_2(r+1) + rm^2)$. Then, after an edge insertion or an edge deletion, it updates the approximate solution in $O(rm)$ time. Our second algorithm reduces the time complexity of pre-processing to $O\left(\mathrm{nnz}(M) + m^3\epsilon^{-2}\log(m/\epsilon)\right)$, where $\mathrm{nnz}(M)$ is the number of nonzero elements of $M$. Then, after an edge insertion, an edge deletion, a node insertion or a node deletion, it updates the approximate solution in $O(qm)$ time, with $q = O\left(m^2\epsilon^{-2}\log(m/\epsilon)\right)$. Finally, we show that under some assumptions, if $\ln n < \epsilon^{-1}$ our first algorithm outperforms our second algorithm, and if $\ln n \geq \epsilon^{-1}$ our second algorithm outperforms our first algorithm.
Keywords: Dynamic networks, subsampled randomized Hadamard transform, CountSketch, dynamic graph regression, approximate algorithm, representation learning, sublinear update time.
1 Introduction

One of the well-studied machine learning problems is linear regression, which is traditionally defined as follows. We receive $n$ data points, where for each $i \in [1, n]$ the data point consists of a row in a matrix $A$ and a single element in a vector $b$. Matrix $A$ is called the predictor values and $b$ is called the measured values. The goal is to find a vector $x$ such that $A \cdot x$ is the closest point to $b$ in the column span of $A$, under some distance measure, e.g., the Euclidean distance (which is also called the least squares distance or the $L_2$ distance). In other words, we want to solve
$$\operatorname{argmin}_{x} \|A \cdot x - b\|_2, \quad \text{or the equivalent problem:} \quad \operatorname{argmin}_{x} \|A \cdot x - b\|_2^2. \tag{1}$$

There is a long history of research on the regression problem for static matrix data and graph data [2]. Very recently, the problem was extended to dynamic graphs, too [7]. Dynamic graphs are graphs that change over time by a sequence of update operations. They are generated in many domains such as the world wide web, social and information networks, technology networks and communication networks. An update operation in a graph might be an edge insertion, an edge deletion, a node insertion or a node deletion.

Given an $n \times m$ (update-efficient) matrix embedding of a graph $G$, the author of [7] proposed an exact algorithm for dynamic graph regression, wherein first an $O(\min\{nm^2, n^2m\})$ time pre-processing is performed. Then, after any update operation in the graph, the solution is updated in $O(nm)$ time. However, since in most applications $n$ is a very large quantity, this time complexity might be too high to be used in practice. Therefore, we are interested in developing algorithms that are considerably faster than the exact algorithm, at the expense of producing an approximate solution. In particular, we want to develop algorithms that have a sublinear running time in terms of $n$.

To do so, in the current paper we utilize two sketching techniques, namely the subsampled randomized Hadamard transform [1] and CountSketch [8], to develop randomized algorithms for the dynamic graph regression problem.

• Let $r$ be a quantity that indicates the number of samples required for a guaranteed approximation error; it is defined in Equations 6 and 9 of Theorem 1. Our first randomized algorithm, which is based on the subsampled randomized Hadamard transform, reduces the pre-processing time complexity to $O(n(m+1) + 2n(m+1)\log_2(r+1) + rm^2)$. Then, after an edge insertion or an edge deletion, it updates the approximate solution in $O(rm)$ time. Note that since $m$ is usually considerably less than $n$, we have $r \ll n$. Therefore, the improvements in the time complexities are considerable.
• Let $q = O\left(m^2\epsilon^{-2}\log(m/\epsilon)\right)$ be the number of samples required for a guaranteed approximation error $\epsilon$, using CountSketch. Our second randomized algorithm uses CountSketch and reduces the pre-processing time complexity to $O\left(\mathrm{nnz}(M) + m^3\epsilon^{-2}\log(m/\epsilon)\right)$, where $\mathrm{nnz}(M)$ is the number of nonzero elements of $M$. Then, after an edge insertion, an edge deletion, a node insertion or a node deletion, it updates the approximate solution in $O(qm)$ time. As we will discuss later, we may consider $m$ (and $\epsilon$) as constants. Therefore, using CountSketch we obtain a randomized algorithm with constant update time.
Note that the subsampled randomized Hadamard transform and CountSketch have already been used to improve regression on static data [1, 11, 5, 8]. However, in this paper we show for the first time how they can be used in a dynamic setting, where the sketches and the approximate solution must be updated after an update operation on the data.

While both of our randomized algorithms considerably improve the update time of the exact algorithm, we also analyze their relative performance. We show that under some assumptions, if $\ln n < \epsilon^{-1}$ our first algorithm outperforms our second algorithm, and if $\ln n \geq \epsilon^{-1}$ our second algorithm has better pre-processing and update time complexities.

The rest of this paper is organized as follows. In Section 2, we present preliminaries and the necessary background and definitions used in the paper. In Section 3, we provide an overview of related work. In Section 4, we briefly introduce the subsampled randomized Hadamard transform and CountSketch. In Section 5, we present our first randomized algorithm for the dynamic graph regression problem, which is based on the subsampled randomized Hadamard transform. In Section 6, we introduce our second randomized algorithm, which is based on CountSketch.
We discuss and compare our proposed algorithms in Section 7. Finally, the paper is concluded in Section 8.
2 Preliminaries

In this paper, we use the following standard notations and symbols: lowercase letters for scalars, uppercase letters for constants and graphs, bold lowercase letters for vectors and bold uppercase letters for matrices. By $G$ we refer to a graph that can be either directed or undirected. We assume that $G$ is an unweighted graph without multi-edges. We use $n$ to denote the number of nodes of $G$. We define a dynamic graph as a graph that changes over time by a sequence of update operations. The adjacency matrix of $G$ is a square $n \times n$ matrix such that its $ij$-th element is 1 iff there exists an edge from node $i$ to node $j$ (and 0 if there is no such edge). We define the distance between node $u$ and node $v$, denoted by $dist(u, v)$, as the size (the number of edges) of a shortest path connecting $u$ to $v$.

Let $A \in \mathbb{R}^{n \times m}$. The column rank (respectively, row rank) of $A$ is the dimension of the column space (respectively, row space) of $A$. Matrix $A$ is full row rank iff its rows are linearly independent, and full column rank iff its columns are linearly independent. For a square matrix these two are equivalent, and we say $A$ is full rank iff all its rows and columns are linearly independent. If $n > m$, $A$ is full rank iff it is full column rank. If $n < m$, $A$ is full rank iff it is full row rank. The transpose of $A$, denoted by $A^*$, is defined as the operator that switches the row and column indices of $A$. The inverse of a square matrix $A$, denoted by $A^{-1}$, is defined by $A^{-1} \cdot A = A \cdot A^{-1} = I$, where $I$ is the identity matrix. The Singular Value Decomposition (SVD) of an $n \times m$ matrix $A$ is defined as $U \cdot \Sigma \cdot V^*$, where $U$ is an $n \times m$ matrix with orthonormal columns, $\Sigma$ is an $m \times m$ diagonal matrix with nonzero non-increasing entries down the diagonal, and $V^*$ is an $m \times m$ matrix with orthonormal rows. The nonzero elements of $\Sigma$ are called the singular values of $A$. The Euclidean norm or $L_2$ norm of a vector $x$ of size $n$, denoted by $\|x\|_2$, is defined as $\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$. The $L_2$ norm of a matrix is defined as its largest singular value.

The Moore-Penrose pseudoinverse of a matrix $A = U \cdot \Sigma \cdot V^*$, denoted by $A^{\dagger}$, is the $m \times n$ matrix $V \cdot \Sigma^{\dagger} \cdot U^*$, where $\Sigma^{\dagger}$ is an $m \times m$ diagonal matrix defined as follows: $\Sigma^{\dagger}[i, i] = 1/\Sigma[i, i]$ if $\Sigma[i, i] > 0$, and 0 otherwise. The Moore-Penrose pseudoinverse of a nonzero vector $x$ is defined as $x^{\dagger} = \frac{x^*}{\|x\|_2^2}$. It is well-known that the solution
$$x = A^{\dagger} \cdot b \tag{2}$$
is an optimal solution for Equation 1, and it has minimum $L_2$ norm among all optimal solutions. In the approximate version of the problem, the goal is to find a vector $x'$ such that
$$\|A \cdot x' - b\|_2 \leq (1 + \epsilon) \min_{x} \|A \cdot x - b\|_2, \tag{3}$$
where $x$ is the optimal solution defined in Equation 2, and $\epsilon \in (0, 1)$ defines the desired accuracy. As we will see in Section 4, the subsampled randomized Hadamard transform can be used to solve this approximate version.
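To make these definitions concrete, the following minimal numpy sketch (with arbitrary toy data of our choosing) computes the optimal solution of Equation 1 via the Moore-Penrose pseudoinverse of Equation 2, once directly and once through the SVD:

```python
import numpy as np

# Toy overdetermined system: n = 6 rows, m = 2 columns.
A = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.], [1., 4.], [1., 5.]])
b = np.array([0.1, 0.9, 2.1, 2.9, 4.2, 4.8])

# Optimal solution of Equation 1 via the pseudoinverse (Equation 2).
x = np.linalg.pinv(A) @ b

# The same solution through the SVD A = U.Sigma.V*: x = V.Sigma^+.U*.b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

assert np.allclose(x, x_svd)
```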
3 Related work

In recent years, a number of algorithms have been proposed for different learning problems over the nodes of a graph. Kleinberg and Tardos [18] studied the classification problem for nodes of a static graph and showed the connection of their general formulation to Markov random fields. Herbster and Pontil [16] studied the problem of online label prediction of a graph with the perceptron. The key difference between the online setting [17, 14, 13, 15] and the dynamic setting is that the online setting is used when it is computationally infeasible to solve the learning problem over the entire dataset. In the dynamic setting, however, the learning problem can be solved over the entire dataset, and the challenge is to efficiently update the solution when the dataset changes. Culp, Michailidis and Johnson [9] presented representative multi-dimensional view smoothers on graphs that are based on graph-based transductive learning [25]. The authors of [4] proposed a family of learning algorithms based on a new form of regularization, so that several transductive graph learning algorithms can be obtained as special cases. Kovac and Smith [2] extended a model for nonparametric regression over the nodes of a static graph, where the distance between estimate and observation is measured at the nodes by the $L_2$ norm, and roughness is penalized on the edges in the $L_1$ norm. The author of [7] studied the regression problem over dynamic graphs. He proposed an exact algorithm for updating the optimal solution of the problem, whose time complexity is (at least) linear in terms of the size of the graph. In the current paper, we present randomized algorithms for updating the approximate solution (with a guaranteed error) that have sublinear time complexities.

A research problem that has some connection to our studied problem is learning embeddings or representations for nodes or subgraphs of a graph [12], [24], [21]. While this problem has become more attractive in recent years, it dates back several decades. For example, Parsons and Pisanski [22] presented vector embeddings for the nodes of a graph such that the inner product of the vector embeddings of any two nodes $i$ and $j$ is negative iff $i$ and $j$ are connected by an edge, and 0 otherwise.

4 Subsampled randomized Hadamard transform and CountSketch

In this section, we briefly describe the subsampled randomized Hadamard transform and CountSketch.
Let $A$ be an $n \times m$ matrix. A subsampled randomized Hadamard transform for $A$ is defined as $P \cdot H \cdot D$, where

• matrix $D$ is an $n \times n$ diagonal matrix whose diagonal entries are $\pm 1$, each chosen independently and uniformly at random,

• matrix $H$ is a Hadamard matrix, and

• matrix $P$ is an $r \times n$ sampling matrix that samples $r$ rows of $H \cdot D$ uniformly with replacement: if row $i$ is chosen as the $j$-th sample, then $P[j, i] = \frac{\sqrt{n}}{\sqrt{r}}$; otherwise, it is 0.

For $n = 2^k$, the $n \times n$ Hadamard matrix $H$ is defined as follows:
$$H[i, j] = \frac{(-1)^{\langle i, j \rangle}}{\sqrt{n}},$$
where $\langle i, j \rangle$ is the dot product of the binary representations of $i$ and $j$ over the field $\mathbb{F}_2$. To emphasize $k$, we may write $H$ in the form $H_k$.

A CountSketch for the $n \times m$ matrix $A$ is a $k \times n$ matrix $S$, with $k = O\left(m^2/\epsilon^2\right)$, defined as follows: for every column, a single nonzero entry is chosen uniformly at random, which takes the values $+1$ or $-1$ with equal probability. Hence $S$ is a sparse matrix which has only $n$ nonzero elements. Moreover, $S \cdot A$ can be computed in time proportional to the number of nonzero elements of $A$ [8].

The high-level paradigm of solving regression using sketching (either the subsampled randomized Hadamard transform or CountSketch) is as follows:
• Compute a sketching matrix $S$ (either a $P \cdot H \cdot D$ matrix or a CountSketch matrix),

• Compute the matrix $S \cdot A$ and the vector $S \cdot b$,

• Compute and output the solution of the problem
$$\operatorname{argmin}_{x'} \|(S \cdot A) \cdot x' - S \cdot b\|_2. \tag{4}$$

The solution of Equation 4 is
$$(S \cdot A)^{\dagger} \cdot S \cdot b, \tag{5}$$
which we call the approximate solution. When $S$ is defined as a $P \cdot H \cdot D$ matrix, Theorem 1 states the number of samples (the number of rows of $P$) that are sufficient for producing the approximate solution with the desired accuracy.

Theorem 1 (Theorem 2 (and the remark afterwards) of [11]). Suppose $A \in \mathbb{R}^{n \times m}$, $b \in \mathbb{R}^n$, and let $\epsilon \in (0, 1)$. If
$$r = \max\left\{48^2\, m \ln(40nm) \ln\!\left(100^2\, m \ln(40nm)\right),\; 40\, m \ln(40nm)/\epsilon\right\}, \tag{6}$$
then with probability at least 0.8, the optimal solution $x'$ of Equation 4 satisfies
$$\|A \cdot x' - b\|_2 \leq (1 + \epsilon) \min_{x} \|A \cdot x - b\|_2. \tag{7}$$
The time complexity of computing the optimal $x'$ (i.e., the approximate solution) is
$$n(m+1) + 2n(m+1)\lceil \log_2(r+1) \rceil + O(rm^2). \tag{8}$$
In particular, assuming that $m \leq n \leq e^m$, we get
$$r = O\!\left(m \ln m \ln n + \frac{m \ln n}{\epsilon}\right) \tag{9}$$
and the time complexity becomes
$$O\!\left(nm \ln\frac{m}{\epsilon} + m^3 \ln m \ln n + \frac{m^3 \ln n}{\epsilon}\right). \tag{10}$$
Assuming in addition that $n/\ln n = \Omega(m^2)$, the above time complexity reduces to
$$O\!\left(nm \ln\frac{m}{\epsilon} + \frac{nm \ln m}{\epsilon}\right).$$
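To make the sketch-and-solve paradigm above concrete, here is a minimal numpy sketch of the $P \cdot H \cdot D$ pipeline. All names and the toy instance are ours; also, for brevity the Hadamard matrix is applied densely, whereas a practical implementation would use the fast Walsh-Hadamard transform to reach the $2n(m+1)\lceil\log_2(r+1)\rceil$ term of Equation 8.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(k):
    # Sylvester construction of the 2^k x 2^k Hadamard matrix H_k.
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

def srht_solve(A, b, r):
    # Sketch-and-solve with S = P.H.D; n must be a power of 2 here.
    n, m = A.shape
    D = rng.choice([-1.0, 1.0], size=n)          # random signs (diagonal of D)
    H = hadamard(int(np.log2(n))) / np.sqrt(n)   # normalized Hadamard matrix
    rows = rng.integers(0, n, size=r)            # r rows, uniform with replacement
    scale = np.sqrt(n / r)                       # nonzero value of P
    SA = scale * (H @ (D[:, None] * A))[rows]    # S.A
    Sb = scale * (H @ (D * b))[rows]             # S.b
    return np.linalg.pinv(SA) @ Sb               # Equation 5

A = rng.standard_normal((1024, 4))
b = A @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.standard_normal(1024)
x_approx = srht_solve(A, b, r=200)
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_approx - x_exact))        # small for this toy instance
```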
When $S$ is defined as a CountSketch matrix, Theorem 2 states the time complexity of this procedure for computing the approximate solution with the desired accuracy.

Theorem 2 (Theorem 30 of [8]). Suppose that $A \in \mathbb{R}^{n \times m}$, $b \in \mathbb{R}^n$ and $\epsilon \in (0, 1)$. Using a $q \times n$ CountSketch with
$$q = O\left(m^2 \epsilon^{-2} \log(m/\epsilon)\right), \tag{11}$$
the linear regression problem over $A$ and $b$ can be solved up to a $(1+\epsilon)$-factor, with probability at least $2/3$, in
$$O\left(\mathrm{nnz}(A) + m^3 \epsilon^{-2} \log(m/\epsilon)\right) \tag{12}$$
time, where $\mathrm{nnz}(A)$ is the number of nonzero elements of $A$.

5 A randomized algorithm based on subsampled randomized Hadamard transform

In this section, we exploit the subsampled randomized Hadamard transform to improve the time complexity of dynamic graph regression, at the cost of having an approximate solution with an error guarantee. We here restrict ourselves to the following update operations: i) edge deletion, wherein an edge is deleted from the graph, and ii) edge insertion, wherein an edge is inserted between two nodes of the graph. We refer to these operations as edge-related update operations. The reason we do not consider node insertion and node deletion in this section is that, as we will see later, they require changing (the size of) the used Hadamard matrix $H$, which takes $\Theta(n)$ time ($n$ is the number of nodes of the graph). Hence, since we are looking for algorithms that have a sublinear update time, we do not consider these two operations. Moreover, a property of real-world graphs is densification [19], i.e., their number of edges grows superlinearly in the number of their nodes. Therefore, we may say that most update operations in a dynamic graph are related to edges, rather than to nodes. As a result, proposing algorithms that are efficient for edge-related update operations is useful and worthwhile. For node insertions/deletions, we may compute the solution from scratch, whose time complexity is not much worse than linear in $n$ (see Equation 13 of Corollary 1).

The used technique has two intrinsic limitations: i) the Hadamard matrix $H$ (respectively, the graph $G$) must have a power of 2 rows/columns (respectively, nodes), and ii) the matrix embedding $M$ must have full rank. For now, we set these two limitations aside; we get back to them in Section 5.2.

We assume that the graph $G$ has an edge-update-efficient matrix embedding $M$, and we define the regression problem with respect to it. More precisely, we want to compute and update $(S \cdot M)^{\dagger} \cdot S \cdot b$, where $M$ is edge-update-efficient. Edge-update-efficient matrix embeddings are a superset of the update-efficient matrix embeddings presented in [7]. The class of update-efficient embeddings characterizes those matrix embeddings for which the optimal solution of the graph regression problem can be updated efficiently [7]. For example, the adjacency matrix of $G$ belongs to this class. Edge-update-efficient matrix embeddings, defined in Definition 1, characterize those matrix embeddings for which the approximate solution can be updated efficiently, when the update operation is edge-related.
Definition 1. Let $M$ be an $n \times m$ matrix embedding of a graph $G$ and $f$ be a complexity function. We say $M$ is $f$-edge-update-efficient iff it satisfies the following condition. If $M$ and $M'$ are the correct matrix embeddings before and after one of the edge-related update operations, there exist at most $K$ pairs of vectors $c_k$ and $d_k$, with $K$ a constant, such that
$$M' = M + \sum_{k=1}^{K} \left(c_k \cdot d_k^*\right),$$
and each vector $c_k$ has only one nonzero element (whose position is known). We refer to each pair $c_k$ and $d_k$ as a pair of update vectors, and to $\sum_{k=1}^{K} \left(c_k \cdot d_k^*\right)$ as the update matrix. Moreover, it is feasible to compute all the pairs of update vectors in $O(f)$ time.

When the function $f$ is clear from the context or when it does not have an important role, we drop it and simply use the term edge-update-efficient. It is clear that any update-efficient embedding is also an edge-update-efficient embedding.

At a high level, our first randomized algorithm consists of two phases: the pre-processing phase, wherein we assume that we are given a static graph and we find an approximate solution for it, and the update phase, wherein after an edge-related update operation on $G$, the already found approximate solution is revised to become valid for the new graph. During pre-processing, we first generate the matrices $P$, $H$ and $D$, as defined in Section 4. Then we calculate $M' = P \cdot H \cdot D \cdot M$ and $b' = P \cdot H \cdot D \cdot b$. Finally, we compute $M'^{\dagger}$ and $M'^{\dagger} \cdot b'$. The time complexity of this phase is stated in Theorem 1. In the following, first in Section 5.1 we discuss how the approximate solution can be updated after an edge-related operation. Then, in Section 5.2 we discuss how the limitations of the used technique can be addressed. All the presented proofs are constructive.

5.1 Updating the approximate solution

In this section, we assume that the update operation is an edge-related operation and show that the approximate solution, i.e., the value depicted in Equation 5, can be updated in $O(rm)$ time. Here, we condition on the existence of an edge-update-efficient matrix embedding, without emphasizing any specific one. In Section 5.2, we show that this condition holds.
Theorem 3. Let $M$ be an $n \times m$ edge-update-efficient matrix embedding of a graph $G$. Suppose that using an $r \times n$ subsampled randomized Hadamard transform $S$, an approximate solution of dynamic graph regression for $G$ has already been computed. Then, after either an edge insertion or an edge deletion, the approximate solution can be updated in $O(rm)$ time.

Proof. After one of the above-mentioned update operations, by the edge-update-efficient property of $M$, $M$ can be updated by at most $K$ pairs of update vectors for the revised graph. Given these (at most) $K$ pairs of update vectors and $(S \cdot M)^{\dagger}$ of the graph before the update operation, we want to compute $(S \cdot M)^{\dagger}$ of the revised graph. Since the number of columns and the number of rows of $M$ do not change, the sketching matrix $S$ does not change either. We have a sequence of at most $K$ rank-1 updates $M_{k+1} = M_k + c_k \cdot d_k^*$, $1 \leq k < K$, where $c_k$ and $d_k$ are a pair of update vectors, $M_1 = M$ and $M_K$ is the correct matrix embedding of $G$ after the update operation. After each rank-1 update $M_{k+1} = M_k + c_k \cdot d_k^*$:

• given the matrix $S \cdot M_k$, we first compute $S \cdot c_k \cdot d_k^*$ and then compute $S \cdot M_{k+1}$ by forming the matrix summation $S \cdot M_k + S \cdot c_k \cdot d_k^*$. Note that $S \cdot c_k \cdot d_k^*$ can be computed in $O(rm)$ time, as follows. First, we compute $S \cdot c_k$, which amounts to choosing (and scaling) the $i$-th column of $S$, where $i$ is the position of the nonzero entry of $c_k$; the result is a vector $s_k$ of size $r$. Second, we compute the vector product $s_k \cdot d_k^*$, which can be done in $O(rm)$ time.

• then, we exploit the algorithm of Meyer [6] that, given an $n_1 \times n_2$ matrix $A$, its Moore-Penrose pseudoinverse $A^{\dagger}$ and a pair of update vectors $c$ and $d$, computes the Moore-Penrose pseudoinverse of $A + c \cdot d^*$ in $O(n_1 n_2)$ time. Here, our matrix $A$ is $S \cdot M$, which is an $r \times m$ matrix; therefore, updating $(S \cdot M)^{\dagger}$ for a given pair of update vectors takes $O(rm)$ time.

Therefore, after repeating this procedure at most $K$ times, we can compute the Moore-Penrose pseudoinverse of $S \cdot M$ for the updated graph in $O(Krm) = O(rm)$ time. In the end, multiplication of the updated $(S \cdot M)^{\dagger}$ and $S \cdot b$ yields the approximate solution of the updated graph, which can be done in $O(rm)$ time.
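The first step of this proof is easy to visualize. In the following minimal numpy sketch (toy sizes and names of our choosing), a rank-one change $M + c \cdot d^*$ with a single-nonzero $c$ is pushed through the sketch in $O(rm)$ time, because $S \cdot c$ is just a scaled column of $S$; for brevity we then recompute the pseudoinverse with numpy instead of applying Meyer's $O(rm)$ update formula [6].

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 1024, 4, 64
S = rng.standard_normal((r, n))   # stand-in for the r x n sketching matrix
M = rng.standard_normal((n, m))
SM = S @ M                        # maintained sketch of the embedding

i = 17                            # position of the single nonzero entry of c
c = np.zeros(n)
c[i] = 1.0
d = rng.standard_normal(m)

# O(rm) update of the sketch: S @ c is simply c[i] times column i of S.
SM += np.outer(c[i] * S[:, i], d)

# Same result as sketching the updated embedding from scratch.
assert np.allclose(SM, S @ (M + np.outer(c, d)))

# (S.M)^+ would now be revised with Meyer's rank-one update [6] in O(rm) time;
# here we simply recompute it.
SM_pinv = np.linalg.pinv(SM)
```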
5.2 Addressing the limitations

The first well-known intrinsic limitation of the randomized Hadamard transform is that the number of rows of $M$, i.e., $n$, must be a power of 2. This implies that we should always have a power of 2 nodes in the graph. When applying the randomized Hadamard transform to matrices, this issue is addressed by concatenating a zero matrix to the main matrix to make its size a power of 2 [20, 10]. We can follow a similar strategy for graphs. More precisely, if during pre-processing the number of rows of $M$ is less than a power of 2, we pad it with zeros up to the next larger power of 2. This might be seen as adding isolated nodes to the graph (with measured values 0) to make its size a power of 2. The second intrinsic limitation of the randomized Hadamard transform is that $M$ must be a full (column) rank matrix. However, this is not a serious problem for real-world applications, as most generated matrices have full rank (especially since $m \ll n$).

The next restriction is that the $n \times m$ matrix embedding $M$ must satisfy two properties. First, $m \ll n$, because otherwise the randomized Hadamard transform will not be efficient. Second, it must be edge-update-efficient. In the following, first in Definition 2 we present a matrix embedding defined based on the $m$ closest nodes of each node, where $m$ can be arbitrarily small (we consider it a small constant); so it satisfies the first property needed for $M$. Then in Theorem 4, we prove that it is an edge-update-efficient matrix embedding. For the sake of simplicity, we assume that $G$ is an undirected graph. However, the results can easily be extended to directed graphs.
Definition 2. For each node $v$ in a graph $G$, we define its vector embedding as a vector consisting of the $m$ nodes of $G$ that have the smallest distances to $v$, and call it the $m$-nearest neighborhood of $v$. If there are several such subsets of $V(G)$, we choose an arbitrary one. We define the matrix embedding $M$ of $G$ as an $n \times m$ matrix whose $i$-th row is the vector embedding of the $i$-th node.
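A minimal sketch of how such a vector embedding can be computed by a truncated breadth-first search. The adjacency-list representation and function name are ours; as Definition 2 allows, ties are broken arbitrarily (here, in BFS order), and we exclude $v$ itself from its own neighborhood:

```python
from collections import deque

def m_nearest(adj, v, m):
    # Return m nodes with the smallest distances to v, in BFS (distance) order.
    # adj: dict mapping each node to the list of its neighbors.
    seen, order = {v}, []
    queue = deque([v])
    while queue and len(order) < m:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen and len(order) < m:
                seen.add(w)
                order.append(w)
                queue.append(w)
    return order

# Toy graph: a path 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(m_nearest(adj, 2, 3))  # [1, 3, 0]: the three closest nodes to node 2
```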
Lemma 1. If node $u$ is reachable from node $v$ (i.e., there is a path from $v$ to $u$) but their distance is larger than $m$, then $u$ cannot be in the $m$-nearest neighborhood of $v$.

Proof. If $u$ and $v$ are connected by a path but $dist(u, v) > m$, then along a shortest path from $v$ to $u$ there exist at least $m$ nodes whose distances to $v$ are less than $dist(u, v)$. Therefore, $u$ is not in the $m$-nearest neighborhood of $v$.
Lemma 2. If an edge is added between nodes $u$ and $v$ of a graph $G$, the vector embeddings of at most $O(m^m)$ nodes of $G$ may change. Furthermore, each vector embedding that must be revised can be updated in $O(m^2)$ time.

Proof. First, we determine those nodes that, after adding an edge between $u$ and $v$, may have a change in their $m$-nearest neighborhood. Let $Q$ denote the set of such nodes. Nodes $u$ and $v$ belong to $Q$. Also, those nodes that already have node $u$ (resp. node $v$) in their $m$-nearest neighborhood may, after inserting an edge between $u$ and $v$, find $v$ (resp. $u$) and some other nodes in their $m$-nearest neighborhood. Let us focus on finding those nodes that already have $u$ in their $m$-nearest neighborhood and may have $v$ in their $m$-nearest neighborhood after the edge insertion (finding those nodes that may gain $u$ can be done in a similar way). To do so, we conduct a breadth-first search (BFS) from $v$ on the updated graph, with the following pruning/stopping criteria (sketched in code after this proof):

• at the first level, among all neighbors of $v$, we only meet $u$. The reason is that we are interested in finding those nodes that have a shortest path to $v$ passing over $u$.

• in other levels, if a node $x$ has a degree greater than $m$, then $v$ cannot be in the $m$-nearest neighborhood of any of its adjacent nodes (nor of any node $y$ such that $x$ is on a shortest path between $y$ and $v$), because the adjacent nodes of $x$ already have at least $m$ nodes that are closer to them than $v$.

• if a node $x$ has a distance greater than $m$ from $v$, then, as Lemma 1 says, $v$ cannot be in its $m$-nearest neighborhood; nor can $v$ be in the $m$-nearest neighborhood of any node $y$ such that $x$ is on a shortest path from $v$ to $y$. Hence, nodes at distance greater than $m$ from $v$ should not be traversed during the BFS.

As a result, at the end of the traversal all the met nodes have a degree of at most $m$ and a distance to $v$ of at most $m$. The number of such nodes is at most $O(m^m)$.

Second, for each node whose vector embedding may require an update, we conduct a truncated BFS over its first $m$ closest nodes to compute its updated embedding. This can be done in $O(m^2)$ time.
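The following minimal sketch (our own helper name; adjacency-list graph as before) illustrates the pruned BFS of this proof for one direction, i.e., collecting the nodes that may gain $v$ in their $m$-nearest neighborhood through the new edge $(u, v)$; the symmetric pass from $u$ is analogous:

```python
from collections import deque

def may_gain_v(adj, u, v, m):
    # BFS from v on the updated graph that only leaves v through u (criterion 1),
    # does not expand nodes of degree > m (criterion 2), and never goes deeper
    # than distance m from v (criterion 3, by Lemma 1).
    met = {v, u}
    queue = deque([(u, 1)])              # (node, distance from v)
    while queue:
        x, dist_x = queue.popleft()
        if dist_x >= m or len(adj[x]) > m:
            continue                      # prune: criteria 2 and 3
        for y in adj[x]:
            if y not in met:
                met.add(y)
                queue.append((y, dist_x + 1))
    return met                            # candidates whose embeddings may change
```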
Lemma 3. If the edge between nodes $u$ and $v$ of a graph $G$ is deleted, the vector embeddings of at most $O(m^m)$ nodes change. Furthermore, each vector embedding that should be revised can be updated in $O(m^2)$ time.

Proof. Our proof is similar to the proof of Lemma 2. First, we determine those nodes that, after deleting the edge between $u$ and $v$, may have a change in their $m$-nearest neighborhood. Let $Q$ denote the set of such nodes. Nodes $u$ and $v$ belong to $Q$. Also, those nodes that already have node $u$ (resp. node $v$) in their $m$-nearest neighborhood may, after deleting the edge between $u$ and $v$, lose $v$ (resp. $u$) and some other nodes from their $m$-nearest neighborhood. Let us focus on finding those nodes that already have $u$ in their $m$-nearest neighborhood and may lose $v$ and some other nodes from their $m$-nearest neighborhood (finding those nodes that may lose $u$ can be done in a similar way). We conduct a breadth-first search from $v$ on the graph before the edge deletion, using the three pruning/stopping criteria of the proof of Lemma 2. At the end of the traversal, all the met nodes have a degree of at most $m$ and a distance to $v$ of at most $m$; the number of such nodes is at most $O(m^m)$.

Second, for each node whose embedding may require an update, we conduct a truncated BFS over its first $m$ closest nodes in the updated graph to compute its updated embedding. This can be done in $O(m^2)$ time.
Theorem 4. Assuming that $m$ is a constant, the matrix embedding $M$ defined in Definition 2 is an $O(1)$-edge-update-efficient matrix embedding.

Proof. We show that $M$ satisfies the conditions stated in Definition 1. When an edge is inserted/deleted between nodes $i$ and $j$, as Lemmas 2 and 3 say, the vector embeddings of at most $O(m^m)$ nodes change, and it takes $O(m^2)$ time to update each vector embedding. Since $m$ is a small constant, we can consider $m^{m+1}$ as a constant $K$. The change in the row of each node $v$ whose vector embedding has been revised can be expressed in terms of a pair $c$ and $d$ of update vectors, where $d$ contains the difference between the new and the old vector embedding of node $v$, and $c$ is a vector whose $v$-th element is 1 and whose other elements are 0 (see the sketch below). Therefore, the conditions of Definition 1 are satisfied and $M$ is an $O(1)$-edge-update-efficient matrix embedding.
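A tiny numpy sketch (names ours) of how such a pair of update vectors is materialized for one revised row:

```python
import numpy as np

def row_update_pair(M, v, new_row):
    # Update vectors (c, d) such that M + np.outer(c, d) equals M with row v
    # replaced by new_row: c is the indicator of v, d the row difference.
    c = np.zeros(M.shape[0])
    c[v] = 1.0
    d = new_row - M[v]
    return c, d

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 3))
c, d = row_update_pair(M, 5, np.array([1.0, 2.0, 3.0]))
assert np.allclose((M + np.outer(c, d))[5], [1.0, 2.0, 3.0])
```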
Corollary 1. Suppose that we are given a graph $G$ whose matrix embedding is defined as in Definition 2, with $m$ a small constant, and that this embedding is a full column rank matrix. Our randomized algorithm, which is based on the subsampled randomized Hadamard transform, performs the pre-processing phase in
$$O\!\left(n \log\!\left(\ln n \ln\ln n + \frac{\ln n}{\epsilon}\right) + \ln n \ln\ln n + \frac{\ln n}{\epsilon}\right) \tag{13}$$
time. Then, after any edge-related update operation, it updates the approximate solution of the dynamic graph regression problem in
$$O\!\left(\ln n \ln\ln n + \frac{\ln n}{\epsilon}\right) \tag{14}$$
time.

Proof. In Theorem 3, we conditioned on the existence of an edge-update-efficient embedding and showed that it takes $O(rm)$ time to update the approximate solution. Then in Theorem 4, we showed that such a matrix embedding does exist. Therefore, by using the value of $r$ presented in Theorem 1 and discarding constants (including $m$), we obtain the stated time complexities.

We note that if the exact algorithm of [7] uses the matrix embedding presented in Definition 2, it yields a linear time algorithm (in terms of $n$) for updating the solution, which is considerably worse than the sublinear update time presented in Equation 14.
CountS-ketch
In this section, we exploit
CountSketch to develop our second randomizedalgorithm for the dynamic graph regression problem. Unlike our first algo-rithm presented in Section 5, it works for all the update operations: i) nodeinsertion , wherein a node is inserted into the graph and at most a constantnumber of edges are drawn between it and the existing nodes of the graph, ii) node deletion , wherein a node that has at most a constant number of edges,is deleted from the graph and its incident edges are deleted, too, iii) edgedeletion wherein an edge is deleted from the graph, and iv) edge insertion wherein an edge is inserted into the graph.We assume that an n × m matrix embedding exists which satisfies thefollowing conditions: i) m is fixed and does not depend on the number of datarows n (as a result, by changing the number of data rows, m does not change),and ii) the matrix embedding is CUE . CUE characterizes a class of matrixembeddings for which we can efficiently update the approximate solution ofthe graph regression problem, using CountSketch . It is more general than update-efficient matrix embeddings presented in [7] for updating the exactsolution of dynamic graph regression. However, it is less general then edge-update-efficient matrix embeddings presented in Section 5, which can be usedfor only edge-related operations.
Definition 3. Let $M$ be an $n \times m$ matrix embedding of a graph $G$ and $f$ be a (complexity) function of $n$ and $m$. We say $M$ is $f$-CUE iff the following conditions are satisfied:

1. If $M$ and $M'$ are the correct matrix embeddings before and after an edge insertion/deletion in the graph, there exist at most $K$ pairs of vectors $c_k$ and $d_k$, with $K$ a constant, such that
$$M' = M + \sum_{k=1}^{K} \left(c_k \cdot d_k^*\right),$$
and each vector $c_k$ has only one nonzero element (whose position is known). We refer to each pair $c_k$ and $d_k$ as a pair of update vectors, and to $\sum_{k=1}^{K} \left(c_k \cdot d_k^*\right)$ as the update matrix.

2. A node insertion in $G$ results in adding one row and/or one column to $M$, plus (at most) a rank-$K$ update matrix applied to $M$.

3. Deleting a node from $G$ results in deleting one row and/or one column from $M$, plus (at most) a rank-$K$ update matrix applied to $M$.

4. After any update operation in $G$, it is feasible to compute all the pairs of update vectors in $O(f(n, m))$ time.

Sometimes, when $f$ is clear from the context or is not important, we drop it and use the term CUE.
Similar to the case of the subsampled randomized Hadamard transform, during the pre-processing phase of our CountSketch-based algorithm, for a given $\epsilon$ we first generate a $q \times n$ matrix $S$, with $q$ as defined in Equation 11, following the construction of Section 4. Then we calculate $M' = S \cdot M$ and $b' = S \cdot b$. Finally, we compute $M'^{\dagger}$ and $M'^{\dagger} \cdot b'$. The time complexity of this procedure is given in Theorem 2. In the following, first in Section 6.1 we discuss how the approximate solution is updated after an update operation. Then, in Section 6.2 we discuss the existence of a CUE matrix embedding.
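The reason the pre-processing cost is dominated by $\mathrm{nnz}(M)$ is that a CountSketch never needs to be materialized as a dense matrix: it is fully described by a hash of row indices into buckets and a vector of random signs. A minimal numpy sketch (names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, q = 1024, 4, 128

# Implicit CountSketch S: column i has a single nonzero s[i] in row h[i].
h = rng.integers(0, q, size=n)        # hash each of the n rows into q buckets
s = rng.choice([-1.0, 1.0], size=n)   # random signs

M = rng.standard_normal((n, m))

# S @ M in time proportional to nnz(M): each nonzero M[i, j] is added,
# with sign s[i], into row h[i] of the sketch.
SM = np.zeros((q, m))
for i, j in zip(*np.nonzero(M)):
    SM[h[i], j] += s[i] * M[i, j]

# Check against the explicit (dense) S.
S = np.zeros((q, n))
S[h, np.arange(n)] = s
assert np.allclose(SM, S @ M)
```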
6.1 Updating the approximate solution

In this section, we assume that we are given a matrix $M$ that satisfies the two above-mentioned conditions and show, using CountSketch, how the approximate solution is efficiently updated after an update operation.
6.1.1 Edge insertion and edge deletion

In this section, we assume that the update operation is either an edge insertion or an edge deletion, and we show that the approximate solution can be updated in $O(qm)$ time.
Theorem 5. Assume that $M$ is an $n \times m$ CUE matrix embedding of a graph $G$. Suppose also that using a $q \times n$ CountSketch $S$, with $q$ defined in Equation 11, an approximate solution of dynamic graph regression for $G$ has already been computed. Then, after an edge insertion or an edge deletion, the approximate solution can be updated in $O(qm)$ time.

Proof. The proof is similar to the proof of Theorem 3. Since $M$ is a CUE matrix embedding, after an edge insertion or an edge deletion, $M$ is updated by at most $K$ pairs of update vectors. Since the number of rows and columns of $M$ does not change, the matrix $S$ does not change either. Therefore, we have a sequence of at most $K$ rank-1 updates $M_{k+1} = M_k + c_k \cdot d_k^*$, $1 \leq k < K$, where $c_k$ and $d_k$ are a pair of update vectors, $M_1 = M$ and $M_K$ is the correct matrix embedding of $G$ after the update operation. After each rank-1 update $M_{k+1} = M_k + c_k \cdot d_k^*$, given the matrix $S \cdot M_k$ and similar to the proof of Theorem 3, we can compute $S \cdot M_{k+1}$ in $O(qm)$ time. Then we can use Meyer's algorithm [6] to update $(S \cdot M)^{\dagger}$, for a given pair of update vectors, in $O(qm)$ time.

After repeating this procedure at most $K$ times, we can compute the Moore-Penrose pseudoinverse of $S \cdot M$ for the updated graph in $O(qm)$ time. Finally, multiplication of the updated $(S \cdot M)^{\dagger}$ and $S \cdot b$ generates the approximate solution in $O(qm)$ time.

6.1.2 Node insertion

In this section, we assume that the update operation is a node insertion and show, in Theorem 6, how the approximate solution is efficiently updated.
Theorem 6.
Let $M$ be an $n \times m$ CUE matrix embedding of a graph $G$. Suppose that using a $q \times n$ CountSketch $S$, with $q$ defined in Equation 11, an approximate solution of dynamic graph regression for $G$ has already been computed. Then, after inserting a node into $G$, the approximate solution can be updated in $O(qm)$ time.

Proof. After inserting a node into the graph, we need to revise the matrices $S$ and $M$. Matrix $M$ is revised because we need to add to $M$ the row corresponding to the new node. Matrix $S$ is revised because its number of columns is a function of the number of rows of $M$. Therefore, as a result of a node insertion, we add a new column to $S$ and choose a row uniformly at random as its nonzero row. Let $i$ be the index of this nonzero row. To update $S \cdot M$ with respect to this change, we add to each entry $j$ of the $i$-th row of $S \cdot M$ the value of the $j$-th entry of the new last row of $M$ (multiplied by the random sign of the new column). This can be done in $O(m)$ time. Furthermore, by the CUE property of $M$, as a result of this node insertion the vector embeddings of the other nodes change by at most $K$ pairs of update vectors. Since $q$ and $m$ do not change, the size of the matrix $S \cdot M$ does not change either. Updating $S \cdot M$ with respect to these at most $K$ pairs of update vectors can be done in $O(qm)$ time (as described in the proofs of Theorems 3 and 5).

To update $(S \cdot M)^{\dagger}$ with respect to the changes in $S \cdot M$, we can exploit the algorithm of Meyer [6] that, given an $n_1 \times n_2$ matrix $A$, its Moore-Penrose pseudoinverse $A^{\dagger}$ and a pair of update vectors $c$ and $d$, computes the Moore-Penrose pseudoinverse of $A + c \cdot d^*$ in $O(n_1 n_2)$ time. Since the change in the $i$-th row of $S \cdot M$ can be expressed in terms of a pair of update vectors, $(S \cdot M)^{\dagger}$ can be updated with respect to it in $O(qm)$ time. Furthermore, for each of the (at most) $K$ pairs of update vectors, we can use the algorithm of Meyer [6] to update $(S \cdot M)^{\dagger}$ in $O(qm)$ time.

After a node insertion, we also need to append the measured value of the new node to the bottom of $b$ and then update $S \cdot b$ (with respect to the revised $S$). To update $S \cdot b$, it is sufficient to add the measured value of the new node (again with the sign of the new column) to the $i$-th entry of $S \cdot b$, where $i$ is the nonzero row of the new column of the updated $S$. In the end, a naive multiplication of the updated $(S \cdot M)^{\dagger}$ and the updated $S \cdot b$ gives the approximate solution of the updated graph, and it can be done in $O(qm)$ time.
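Continuing the implicit CountSketch representation sketched in the previous section (names ours), the bookkeeping of this proof for the new row and the measured value takes a few lines; the at most $K$ rank-one corrections are then handled exactly as in Theorem 5:

```python
import numpy as np

rng = np.random.default_rng(3)
q, m = 128, 4
SM = np.zeros((q, m))    # maintained sketch S.M
Sb = np.zeros(q)         # maintained sketch S.b

def insert_node(SM, Sb, new_row, new_b, rng):
    # Extend the implicit CountSketch by one column for the new node and
    # patch S.M and S.b in O(m) time.
    i = rng.integers(0, SM.shape[0])    # nonzero row of the new column of S
    sign = rng.choice([-1.0, 1.0])
    SM[i] += sign * new_row             # fold the new row of M into bucket i
    Sb[i] += sign * new_b               # fold in the node's measured value
    return i, sign                      # kept so a deletion can reverse the patch

i, sign = insert_node(SM, Sb, rng.standard_normal(m), 0.7, rng)
# Node deletion (Theorem 7) reverses it: SM[i] -= sign * row; Sb[i] -= sign * b_v.
```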
6.1.3 Node deletion

In this section, we assume that the update operation is a node deletion and show, in Theorem 7, how the approximate solution is efficiently updated.

Theorem 7.
Let $M$ be an $n \times m$ CUE matrix embedding of a graph $G$. Suppose that using a $q \times n$ CountSketch $S$, with $q$ defined in Equation 11, an approximate solution of dynamic graph regression for $G$ has already been computed. Then, after deleting a node from $G$, the approximate solution can be updated in $O(qm)$ time.

Proof. After deleting a node from the graph, we need to revise the matrices $S$ and $M$. Matrix $M$ is revised because we need to delete from it the row corresponding to the deleted node. Matrix $S$ is revised because we should delete from it the column corresponding to the deleted node. Let $i$ be the index of the nonzero row of this column. To update $S \cdot M$ with respect to these changes, we subtract from each entry $j$ of the $i$-th row of $S \cdot M$ the value of the corresponding entry of the deleted row of $M$ (multiplied by the sign of the deleted column). This can be done in $O(m)$ time. Furthermore, by the CUE property of $M$, as a result of this node deletion the vector embeddings of the other nodes may change by at most $K$ pairs of update vectors. The matrix $S \cdot M$ can be updated with respect to these changes in $O(qm)$ time. Since $q$ and $m$ do not change, the size of the matrix $S \cdot M$ does not change either.

To update $(S \cdot M)^{\dagger}$ with respect to these changes in $S \cdot M$, we can again exploit the algorithm of Meyer [6]. Since the change in the $i$-th row of $S \cdot M$ can be expressed in terms of a pair of update vectors, $(S \cdot M)^{\dagger}$ can be updated with respect to it in $O(qm)$ time. Also, for each of the at most $K$ pairs of update vectors, we can use the algorithm of Meyer [6] to update $(S \cdot M)^{\dagger}$ in $O(qm)$ time.

After a node deletion, we also need to delete the measured value of the deleted node from $b$ and then update $S \cdot b$. To update $S \cdot b$, it is sufficient to subtract the measured value of the deleted node from the $i$-th entry of $S \cdot b$, where $i$ is the nonzero row of the deleted column. In the end, a naive multiplication of the updated $(S \cdot M)^{\dagger}$ and the updated $S \cdot b$ yields the approximate solution of the updated graph, and it can be done in $O(qm)$ time.
6.2 Existence of a CUE matrix embedding

In this section, we show that the $m$-nearest neighborhood vector embedding presented in Section 5.2 satisfies all the conditions we are looking for. First, in this embedding $m$ is a small constant and does not depend on $n$. Second, in Theorem 8 we show that it is CUE. (Beyond these two conditions, and similar to our first randomized algorithm, the matrix embedding must be a full (column) rank matrix; however, as mentioned before, real-world matrices are usually full column rank, especially when $m \ll n$.)
Theorem 8. Assuming that $m$ is a constant, the matrix embedding $M$ defined in Definition 2 of Section 5.2 is CUE.

Proof. We show that $M$ satisfies all the conditions stated in Definition 3.

1. When an edge is inserted/deleted between nodes $i$ and $j$, in a way similar to the proof of Theorem 4, we can show that condition (1) of Definition 3 is satisfied.

2. When a new node $i$ is added to $G$, we add a new row for it to $M$, which contains its $m$ closest nodes. Furthermore, since at most a constant number $C$ of edges are added between $i$ and the existing nodes, and each edge insertion may change the vector embeddings of at most $O(m^m)$ nodes, the vector embeddings of at most $O(Cm^m)$ nodes change, which can be considered a constant $K$. Therefore, similar to the previous case, condition (2) of Definition 3 is satisfied.

3. When we delete a node from $G$, we delete its corresponding row from $M$. Furthermore, since the deleted node may have at most a constant number $C$ of edges (which are deleted too), and each edge deletion may change the vector embeddings of at most $O(m^m)$ nodes, the vector embeddings of at most $O(Cm^m)$ nodes change, which is a constant $K$. Hence, similar to the previous case, condition (3) of Definition 3 is satisfied.

4. For all the update operations, each pair of update vectors $c$ and $d$ can be computed in $O(m^2)$ time. As a result, condition (4) of Definition 3 is satisfied.
Corollary 2. Suppose that we are given a graph $G$ whose matrix embedding is defined as in Definition 2, with $m$ a constant, and that this embedding is a full column rank matrix. Using a CountSketch as the sketching matrix, we can perform the pre-processing phase in
$$O\left(n + \epsilon^{-2} \log(1/\epsilon)\right) \tag{15}$$
time. Then, after a node insertion, a node deletion, an edge insertion or an edge deletion, we can update the approximate solution of the dynamic graph regression problem in $O\left(\epsilon^{-2} \log(1/\epsilon)\right)$ time.

Proof. In Theorems 5, 6 and 7, we conditioned on the existence of a CUE matrix embedding and showed that it takes $O(qm)$ time to update the approximate solution. Then in Theorem 8, we showed the existence of this matrix embedding. As a result, by replacing $q$ with its value defined in Equation 11 and discarding all constants (including $m$), we obtain the stated time complexities.

As already mentioned, if the exact algorithm of [7] uses the matrix embedding presented in Definition 2, it yields a linear time algorithm (in terms of $n$) for updating the solution, which is much worse than the constant update time presented in Corollary 2.

7 Discussion
In Sections 5.2 and 6.2, after presenting Corollaries 1 and 2, we discussed that the exact algorithm of [7] has a worse update time than our proposed randomized algorithms. However, we shall also compare the two randomized algorithms against each other. In the following, we compare the update and pre-processing time complexities of the randomized algorithms.

• Suppose that our two algorithms use the $m$-nearest neighborhood matrix embedding, and let us discard the terms $\ln\ln n$ and $\log(1/\epsilon)$ from the update time complexities (because of the presence of the dominant terms $\ln n$ and $\epsilon^{-2}$). Under these assumptions, the update time complexities of the first and second algorithms become $O\!\left(\frac{\ln n}{\epsilon}\right)$ and $O(\epsilon^{-2})$, respectively. Hence, if $\ln n \geq \epsilon^{-1}$, the second algorithm has a smaller update time; otherwise, the first algorithm outperforms the second algorithm in terms of update time. For example, for $\epsilon = 0.1$ the crossover point is $\ln n = 10$, i.e., $n \approx e^{10} \approx 2.2 \times 10^4$ nodes (a small sanity-check computation follows this list).

Note that in the general form, without relying on any specific matrix embedding, our first algorithm updates the approximate solution in sublinear time in terms of $n$ (Theorem 3 of Section 5.1). However, when we use CountSketch, the update time becomes independent of $n$ (Theorems 5, 6 and 7 of Section 6.1). In particular, if we consider $m$ and $\epsilon$ as constants, while the update time of our first algorithm is a sublinear function of $n$, our second algorithm updates the approximate solution in constant time. As a result, in addition to the nice sparsity property of CountSketch [8], another interesting property of it, revealed in this paper, is its constant update time for all the update operations (node insertion, node deletion, edge insertion and edge deletion).

• Similar to the case of update times, we may simplify the pre-processing time complexities by assuming that our two algorithms use the $m$-nearest neighborhood matrix embedding and by discarding the terms $\log\log n$ and $\log(1/\epsilon)$ from Equations 13 and 15. Then, the pre-processing time complexities of the first and second algorithms become $O\!\left(n + \frac{\ln n}{\epsilon}\right)$ and $O(n + \epsilon^{-2})$, respectively. Therefore, similar to the case of update times, if $\ln n \geq \epsilon^{-1}$, the second algorithm has a smaller pre-processing time; otherwise, the first algorithm outperforms the second algorithm in terms of pre-processing time.
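A two-line sanity check of the crossover condition (the numbers are only illustrative):

```python
import math

eps = 0.1
n_crossover = math.exp(1 / eps)   # ln n = 1/eps at the crossover
print(round(n_crossover))         # 22026: for larger n, CountSketch wins
```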
8 Conclusion

In this paper, we utilized the subsampled randomized Hadamard transform and CountSketch to propose randomized algorithms for the dynamic graph regression problem. Suppose that we are given an $n \times m$ matrix embedding $M$ of the graph, where $m \ll n$. Our first randomized algorithm reduces the time complexity of pre-processing to $O(n(m+1) + 2n(m+1)\log_2(r+1) + rm^2)$, where $r$ is the number of samples required for a guaranteed approximation error, and it is a sublinear function of $n$. Then, after an edge insertion or an edge deletion, it updates the approximate solution in $O(rm)$ time. Our second algorithm reduces the time complexity of pre-processing to $O\left(\mathrm{nnz}(M) + m^3\epsilon^{-2}\log(m/\epsilon)\right)$, where $\mathrm{nnz}(M)$ is the number of nonzero elements of $M$. Then, after an edge insertion, an edge deletion, a node insertion or a node deletion, it updates the approximate solution in $O(qm)$ time, with $q = O\left(m^2\epsilon^{-2}\log(m/\epsilon)\right)$. In the end, we analyzed the relative performance of the algorithms and showed that (under some assumptions), for $\ln n < \epsilon^{-1}$ our first algorithm outperforms our second algorithm, and for $\ln n \geq \epsilon^{-1}$ our second algorithm shows better pre-processing and update times.

References
[1] Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. In Shang-Hua Teng, editor, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 1-9. SIAM, 2008.

[2] Arne Kovac and Andrew D. A. C. Smith. Nonparametric regression on a graph. Journal of Computational and Graphical Statistics, 20(2):432-447, 2011.

[3] Maria-Florina Balcan and Kilian Q. Weinberger, editors. Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings. JMLR.org, 2016.

[4] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399-2434, 2006.

[5] Christos Boutsidis and Alex Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM Journal on Matrix Analysis and Applications, 34(3):1301-1340, 2013.

[6] Carl D. Meyer, Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315-323, 1973.

[7] Mostafa Haghir Chehreghani. On the theory of dynamic graph regression problem. CoRR, abs/1903.10699, 2019.

[8] Kenneth L. Clarkson and David P. Woodruff. Low-rank approximation and regression in input sparsity time. J. ACM, 63(6):54:1-54:45, 2017.

[9] Mark Culp, George Michailidis, and Kjell Johnson. On multi-view learning with additive models. The Annals of Applied Statistics, 3(1):292-318, 2009.

[10] T. Ceren Deveci, Serdar Çakır, and A. Enis Çetin. Energy efficient Hadamard neural networks. CoRR, abs/1805.05421, 2018.

[11] Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, and Tamás Sarlós. Faster least squares approximation. Numerische Mathematik, 117(2):219-249, 2011.

[12] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi, editors, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 855-864. ACM, 2016.

[13] Mark Herbster and Guy Lever. Predicting the labelling of a graph via minimum p-seminorm interpolation. In COLT 2009 - The 22nd Conference on Learning Theory, Montreal, Quebec, Canada, June 18-21, 2009, 2009.

[14] Mark Herbster, Guy Lever, and Massimiliano Pontil. Online prediction on large diameter graphs. In Daphne Koller, Dale Schuurmans, Yoshua Bengio, and Léon Bottou, editors, Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 8-11, 2008, pages 649-656. Curran Associates, Inc., 2008.

[15] Mark Herbster, Stephen Pasteris, and Massimiliano Pontil. Predicting a switching sequence of graph labelings. Journal of Machine Learning Research, 16:2003-2022, 2015.

[16] Mark Herbster and Massimiliano Pontil. Prediction on a graph with a perceptron. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, pages 577-584. MIT Press, 2006.

[17] Mark Herbster, Massimiliano Pontil, and Lisa Wainer. Online learning over graphs. In Luc De Raedt and Stefan Wrobel, editors, Machine Learning, Proceedings of the Twenty-Second International Conference (ICML 2005), Bonn, Germany, August 7-11, 2005, volume 119 of ACM International Conference Proceeding Series, pages 305-312. ACM, 2005.

[18] Jon M. Kleinberg and Éva Tardos. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM, 49(5):616-639, 2002.

[19] Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 2007.

[20] Yichao Lu, Paramveer S. Dhillon, Dean P. Foster, and Lyle H. Ungar. Faster ridge regression via the subsampled randomized Hadamard transform. In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, United States, pages 369-377, 2013.

[21] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In Balcan and Weinberger [3], pages 2014-2023.

[22] T. D. Parsons and Tomaž Pisanski. Vector representations of graphs. Discrete Mathematics, 78(1):143-154, 1989. Special Double Issue in Memory of Tory Parsons.

[23] C. R. Rao and S. K. Mitra. Generalized Inverse of Matrices and Its Applications. John Wiley & Sons, 1971.