A continuation method for computing the multilinear Pagerank
Alberto Bucci* and Federico Poloni†

Abstract
The multilinear Pagerank model [Gleich, Lim and Yu, 2015] is a tensor-based generalization of the Pagerank model. Its computation requires solving a system of polynomial equations that contains a parameter α ∈ [0, 1). For α ≈ 1, this computation remains a challenging problem, especially since the solution may be non-unique. Extrapolation strategies that start from smaller values of α and 'follow' the solution by slowly increasing this parameter have been suggested; however, there are known cases where these strategies fail, because a globally continuous solution curve cannot be defined as a function of α. In this paper, we improve on this idea, by employing a predictor-corrector continuation algorithm based on a more general representation of the solutions as a curve in ℝ^{n+1}. We prove several global properties of this curve that ensure the good behavior of the algorithm, and we show in our numerical experiments that this method is significantly more reliable than the existing alternatives.

* M.Sc. graduate at the University of Pisa.
† University of Pisa, Department of Computer Science. [email protected]. FP is partially supported by INDAM/GNCS and by the University of Pisa's projects PRA_2017_05 "Modelli ed algoritmi innovativi per problemi strutturati e sparsi di grandi dimensioni" and PRA_2020_61 "Analisi di reti complesse: dalla teoria alle applicazioni".

Introduction

Set x^{⊗m} = x ⊗ x ⊗ ··· ⊗ x (m times), where ⊗ denotes the Kronecker product [7, Section 11.4], and e = [1, ..., 1]^T ∈ ℝ^n, the vector of all ones. Let R ∈ ℝ^{n×n^m} be a non-negative matrix such that e^T R = (e^T)^{⊗m}, and v ∈ ℝ^n be a stochastic vector, i.e., a non-negative vector with e^T v = 1. The multilinear Pagerank problem consists in finding a stochastic solution x ∈ ℝ^n to the equation

    x = f_α(x),   f_α(x) = αR(x^{⊗m}) + (1 − α)v,   (1)

for a certain m ∈ ℕ, m ≥ 2, and a given value α ∈ (0, 1).

Equation (1) is a generalization of the equation behind the well-known Pagerank model [13] (to which it reduces for m = 1), and has been introduced in [5] as a simplified version of a higher-order Markov chain model with memory. The equation itself has a probabilistic interpretation that was suggested in [2].

Problem (1) can be reduced to the computation of Z-eigenvalues of tensors [10, 12, 14], so some theory and algorithms for that problem can also be applied here.

Various algorithms have been suggested to compute solutions of this equation; see e.g. [3, 5, 11]. Among the simplest choices we have the fixed-point iteration

    x_{k+1} = f_α(x_k),   k = 0, 1, 2, ...,

or the Newton–Raphson method on the function f_α(x) − x, i.e.,

    x_{k+1} = x_k − (αP_{x_k} − I)^{−1}(f_α(x_k) − x_k),   k = 0, 1, 2, ...,   (2)

with αP_x the Jacobian matrix of f_α; a small code sketch of both iterations is given below.

It is generally recognized that the problem is easier for small α, especially for α ≤ 1/m. For values of α approaching 1, its numerical solution is more complicated, and the solution may be non-unique. Sufficient conditions for the uniqueness of solutions have been proposed in the literature [4, 9]; the simplest (but weakest) of them is α ≤ 1/m.
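For concreteness, here is a minimal numpy sketch of both basic iterations for the case m = 2. The test data (n, α, and the random R) are made-up illustrations, not taken from the benchmark set; any non-negative R whose columns sum to 1 satisfies e^T R = (e^T)^{⊗2}.

```python
import numpy as np

n, alpha = 3, 0.4
rng = np.random.default_rng(0)
R = rng.random((n, n * n))
R /= R.sum(axis=0)              # columns sum to 1, so e^T R = (e^T)^{(x)2}
v = np.full(n, 1.0 / n)         # a stochastic vector

def f(x):                       # f_alpha(x) = alpha*R*(x (x) x) + (1 - alpha)*v
    return alpha * R @ np.kron(x, x) + (1 - alpha) * v

def P(x):                       # Jacobian of R*x^{(x)2}:  P_x = R*(I (x) x + x (x) I)
    I = np.eye(n)
    return R @ (np.kron(I, x[:, None]) + np.kron(x[:, None], I))

x = v.copy()                    # fixed-point iteration x_{k+1} = f_alpha(x_k)
for _ in range(200):
    x = f(x)

y = v.copy()                    # Newton-Raphson iteration (2)
for _ in range(20):
    y = y - np.linalg.solve(alpha * P(y) - np.eye(n), f(y) - y)

print(np.linalg.norm(f(x) - x), np.linalg.norm(f(y) - y))   # both residuals ~ 0
```

Since α = 0.4 < 1/m here, f_α is a contraction and both iterations converge to the unique stochastic solution; for α closer to 1 the picture is more delicate, as discussed next.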
In view of this property, several of the proposed algorithms fall into the setting of extrapolation methods, where a sequence of solutions x^{(1)}, x^{(2)}, x^{(3)}, ... associated to increasing values α^{(1)} < α^{(2)} < ··· < α^{(k)} is computed and used to obtain an initial guess for a further solution with parameter α^{(k+1)} > α^{(k)}. In this way, one can start by solving problems in the 'easier regime' α ≤ 1/m, and at each step use the previously computed values of x to get a sufficiently accurate initial value.

However, as acknowledged in [11], this idea can fail spectacularly, because in some cases the solution of the problem is not given by a continuous function x(α). An example where this is well visible is obtained with the 6 × 6 × 6 example R6_3 in the dataset of [5]: see Figure 1 for a plot that shows the behaviour of one entry of x with respect to α. When one attempts to compute solutions for increasing values of α, one ends up tracking the solution with the largest value of x_1 in the figure. When one first surpasses α = α_2, this solution is then worthless as an initial value, and the method basically needs to restart without a useful initial guess. In addition, according to the informal description in [5, Fig. 9], for α slightly larger than α_2 the common algorithms are slowed down by the presence of a so-called pseudo-solution; they spend many iterations in which the iterates wander about in the set of stochastic vectors before convergence kicks in.

In this paper, we suggest an improved strategy to deal with these problematic examples. We consider the zero set of

    H(x, α) := f_α(x) − x = αR(x^{⊗m}) + (1 − α)v − x,   H : ℝ^n × ℝ → ℝ^n,

as a curve in ℝ^{n+1}, and use extrapolation techniques in the family of continuation algorithms [1] to compute points following this curve. More precisely, the algorithms we use belong to the family of predictor-corrector continuation algorithms.

In cases such as the one in Figure 1, the curve makes an S-bend, and we cannot consider it anymore (locally) as a parametrized curve with the coordinate α as a parameter: instead, we compute points following the curve, using (implicitly) an arc-length parametrization. Hence, once it reaches α_2, our algorithm reduces the value of α, tracking the bottom part of the curve in Figure 1, reaches α_1, and then increases α a second time.
Fig. 1: The first entry x_1 of all solution(s) vs. the parameter α for the example R6_3 in [5]. There are three distinct stochastic solutions for α ∈ [α_1, α_2], and one solution in the rest of [0, 1].

We give a theoretical contribution, proving that there is indeed a connected solution curve that allows us to track the solution correctly up to α = 1 in problems without singular points, and then we present a practical algorithm that allows us to solve the multilinear Pagerank problem with higher reliability than the existing methods.

The paper is structured as follows. In Section 2, we recall various results on the properties of solutions, focusing on uniqueness in particular. The results are stated for generic m, unlike other references which focus on m = 2. In Section 3, we focus more closely on the geometrical structure of the solution set as a curve in ℝ^{n+1}. We continue with a general introduction on predictor-corrector methods in Section 4, and then we present the specific variant of the method that we use for this problem with all its algorithmic details in Section 5. In Section 6 we present numerical experiments that prove the effectiveness of this method, and we end with some conclusions in Section 7.

Structure of solutions

We recall various results on the structure of the solution set of (1). Most of these results appear already in previous works (see e.g. [11]), mostly for m = 2, but we present them here for completeness in the case of a general m.

Theorem 1.
Let x ∈ ℝ^n be a nonnegative solution of equation (1) with α ∈ (0, 1). Then e^T x = 1 or e^T x = c_α, where c_α is the other positive solution (besides z = 1) of the equation g(z) = 0, with g(z) := αz^m − z + (1 − α), and it satisfies

    c_α > 1 if α < 1/m,   c_α = 1 if α = 1/m,   c_α < 1 if α > 1/m.   (3)

Proof.
Let x be a nonnegative solution of (1). Multiplying on the left by e^T, we get

    e^T x = αe^T R(x^{⊗m}) + (1 − α)e^T v = α(e^T)^{⊗m} x^{⊗m} + (1 − α) = α(e^T x)^m + (1 − α).

Hence g(e^T x) = 0. It remains to prove that the only positive solutions of g(z) = 0 are 1 and another value c_α satisfying (3).

By studying the sign of g'(z), one sees that g(z) has a global minimum at

    z* = (1/(αm))^{1/(m−1)},

for which z* > 1 if α < 1/m, z* = 1 if α = 1/m, and z* < 1 if α > 1/m. In particular, g(z*) ≤ g(1) = 0. The function g(z) is decreasing in (0, z*) and increasing in (z*, ∞), hence (unless z* = 1) it has two intersections with the horizontal axis, one in (0, z*) and one in (z*, ∞).
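A quick numeric check of this proof (a sketch; the chosen m and the three values of α are arbitrary): g always has the root z = 1, and its other positive root c_α falls on the side of 1 predicted by (3).

```python
import numpy as np

m = 3
for alpha in (0.2, 1.0 / m, 0.8):
    # coefficients of g(z) = alpha*z^m - z + (1 - alpha), highest degree first
    coeffs = np.zeros(m + 1)
    coeffs[0], coeffs[-2], coeffs[-1] = alpha, -1.0, 1.0 - alpha
    roots = np.roots(coeffs)
    # keep the (numerically) real positive roots; the loose tolerance absorbs
    # the rounding that np.roots introduces at the double root alpha = 1/m
    pos = sorted(r.real for r in roots if abs(r.imag) < 1e-6 and r.real > 0)
    print(f"alpha = {alpha:.3f}: positive roots of g are {np.round(pos, 4)}")
# output: {c_alpha, 1} with c_alpha > 1, = 1, < 1 for alpha <, =, > 1/m
```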
One can show that solutions with e^T x = 1 and e^T x = c_α do indeed exist. For any c ≥ 0, let

    Z_c := {x ∈ ℝ^n : e^T x = c, x ≥ 0},   (4)

the set of non-negative vectors with fixed entry sum c (in particular, Z_1 is the set of stochastic vectors).

Theorem 2. Equation (1) has at least one solution in Z_{c_α}, and one in Z_1.

Proof. The set Z_{c_α} is a simplex, and in particular it is compact and convex. Moreover, f_α(Z_{c_α}) ⊂ Z_{c_α}; indeed, for all y ∈ Z_{c_α},

    e^T f_α(y) = αe^T R(y^{⊗m}) + (1 − α)e^T v = α(e^T y)^m + 1 − α = αc_α^m + 1 − α = c_α,

where we have used the fact that g(c_α) = 0 in the last equality. Hence by the Brouwer fixed-point theorem [8] we can conclude that f_α has a fixed point in Z_{c_α}, and this fixed point is a solution of (1). The same proof works after replacing c_α with 1, since the only property that we have used is that g(c_α) = 0.

Moreover, there is a unique solution which achieves the smaller of the two possible values of e^T x. This unique solution is called the minimal solution in [11] and in other literature on similar matrix equations.

Theorem 3.
For each fixed α ∈ [0, 1), Equation (1) has a unique solution in Z_{min(1, c_α)}.

Proof. Consider the iteration

    x_0 = 0,   x_{k+1} = f_α(x_k),   k = 0, 1, 2, ....   (5)

We can show by induction that x_{k+1} ≥ x_k: the base step k = 0 is obvious, and

    x_{k+1} − x_k = f_α(x_k) − f_α(x_{k−1}) = αR(x_k^{⊗m} − x_{k−1}^{⊗m}) ≥ 0,

since x ↦ x^{⊗m} is entrywise monotone on nonnegative vectors. Let y be a solution of (1) with e^T y = min(1, c_α). Similarly, we can show by induction that x_k ≤ y: the base step k = 0 is again obvious, and

    y − x_{k+1} = f_α(y) − f_α(x_k) = αR(y^{⊗m} − x_k^{⊗m}) ≥ 0.

So the sequence is weakly increasing and bounded, hence it converges to a limit x_∞ := lim_{k→∞} x_k. Passing to the limit, we see that x_∞ is a solution of (1), hence

    e^T x_∞ ≥ min(1, c_α) = e^T y.

Again passing to the limit, we see that x_∞ ≤ y. These two properties together imply that y = x_∞. Thus we have proved that y = x_∞ for each solution y ∈ Z_{min(1, c_α)}.
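The iteration (5) is easy to try out; here is a sketch for m = 2 with made-up data. Note that α = 0.9 > 1/m, so stochastic solutions need not be unique, but the minimal solution is still well defined and the iterates converge to it monotonically.

```python
import numpy as np

n, alpha = 3, 0.9
rng = np.random.default_rng(1)
R = rng.random((n, n * n))
R /= R.sum(axis=0)                  # e^T R = (e^T)^{(x)2}
v = np.full(n, 1.0 / n)

x = np.zeros(n)                     # x_0 = 0
for _ in range(10000):
    x_new = alpha * R @ np.kron(x, x) + (1 - alpha) * v
    assert np.all(x_new >= x - 1e-15)       # iterates increase entrywise
    if np.linalg.norm(x_new - x) < 1e-14:
        break
    x = x_new
# e^T x_inf = min(1, c_alpha); for m = 2, alpha = 0.9 one gets c_alpha = 1/9
print(x.sum())
```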
The following observation seems to be novel.

Theorem 4. Let v be a strictly positive vector and let 0 ≤ α < 1. Then any nonnegative solution of (1) is strictly positive.

Proof. We have

    x_i = e_i^T x = αe_i^T R(x^{⊗m}) + (1 − α)e_i^T v = αR_i(x^{⊗m}) + (1 − α)v_i > 0,

since αR_i(x^{⊗m}) ≥ 0 and (1 − α)v_i > 0.

Corollary 1. In the previous setting, x ∈ (0, 1)^n.

Proof.
Since every component of x is strictly positive and the vector is stochastic, every component must also be less than 1.

The curve of solutions

In this section, we argue that the stochastic solutions of (1) form a smooth curve (under some regularity assumptions). We denote with P_x ∈ ℝ^{n×n} the Jacobian matrix of Rx^{⊗m} with respect to x, i.e.,

    P_x = R ∂x^{⊗m}/∂x = R(I ⊗ x ⊗ ··· ⊗ x + x ⊗ I ⊗ x ⊗ ··· ⊗ x + ··· + x ⊗ ··· ⊗ x ⊗ I).   (6)

The Jacobian matrix of H(x, α) is

    J_H[x, α] = [ αP_x − I   R(x^{⊗m}) − v ] ∈ ℝ^{n×(n+1)},   (7)

where the first n × n block column ∂H/∂x [x, α] = αP_x − I contains derivatives with respect to x, and the second contains derivatives with respect to α. We let

    Δ_c := {x ∈ ℝ^n : e^T x = c}.

Note that this definition differs from that of Z_c in that we do not require that x ≥ 0.
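The following sketch builds P_x and J_H[x, α] for m = 2 according to (6)-(7) and checks them against finite differences; R, v, x, and α are arbitrary test data.

```python
import numpy as np

n, alpha = 4, 0.7
rng = np.random.default_rng(2)
R = rng.random((n, n * n)); R /= R.sum(axis=0)
v = np.full(n, 1.0 / n)
I = np.eye(n)

def H(x, a):            # H(x, alpha) = alpha*R*(x (x) x) + (1 - alpha)*v - x
    return a * R @ np.kron(x, x) + (1 - a) * v - x

def J(x, a):            # J_H[x, alpha] = [ alpha*P_x - I | R*(x (x) x) - v ]
    P = R @ (np.kron(I, x[:, None]) + np.kron(x[:, None], I))   # P_x as in (6)
    return np.hstack([a * P - I, (R @ np.kron(x, x) - v)[:, None]])

x = rng.random(n); x /= x.sum()
Jm, h, err = J(x, alpha), 1e-7, 0.0
for i in range(n + 1):  # finite-difference check, column by column
    dx, da = np.zeros(n), 0.0
    if i < n:
        dx[i] = h
    else:
        da = h
    fd = (H(x + dx, alpha + da) - H(x, alpha)) / h
    err = max(err, np.linalg.norm(fd - Jm[:, i]))
print("max column error:", err)   # ~1e-7, consistent with (6)-(7)
```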
We start from a result proving the surjectivity of ∂H/∂x [x, α] under some assumptions.

Lemma 5. Let (x, α) be such that H(x, α) = 0 and x ∈ Z_1.

1. For α < 1/m, the matrix ∂H/∂x [x, α] = αP_x − I has full rank n.
2. For α = 1/m, the matrix ∂H/∂x [x, α] = αP_x − I has rank n − 1, and e^T is the unique row vector (up to multiples) such that e^T ∂H/∂x [x, α] = 0.
3. For α = 1/m, the matrix ∂H/∂x [x, α] = αP_x − I is surjective as a linear map from Δ_0 to Δ_0.

Proof. We prove the three points.
1. Note that the matrix P_x is nonnegative, and that

    e^T P_x = (e^T)^{⊗m} (I ⊗ x ⊗ ··· ⊗ x + x ⊗ I ⊗ x ⊗ ··· ⊗ x + ··· + x ⊗ ··· ⊗ x ⊗ I)
            = e^T ⊗ (e^T x) ⊗ ··· ⊗ (e^T x) + (e^T x) ⊗ e^T ⊗ (e^T x) ⊗ ··· ⊗ (e^T x) + ··· + (e^T x) ⊗ ··· ⊗ (e^T x) ⊗ e^T
            = m(e^T x)^{m−1} e^T.   (8)

Hence e^T (αP_x) < e^T when α < 1/m and e^T x = 1. Thus αP_x is a sub-stochastic matrix, and αP_x − I is invertible by the Perron–Frobenius theorem [7, Section 10.4].

2. Again thanks to the relation (8), the matrix αP_x is (column-)stochastic when α = 1/m and e^T x = 1; hence e^T (αP_x − I) = 0. It remains to prove that the rank of (1/m)P_x − I is exactly n − 1. Suppose, on the contrary, that it is n − 2 or lower. Then the eigenvalue 1 has geometric multiplicity at least 2 in the column-stochastic matrix (1/m)P_x. It follows from the Perron–Frobenius theory of stochastic matrices (see e.g. [7, Section 10.4, Fact 1(g)]) that (1/m)P_x has at least two distinct ergodic classes: that is, there exist two disjoint subsets of indices E_1, E_2 such that one can write (reordering the matrix entries and setting E_3 = (E_1 ∪ E_2)^c)

            E_1          E_2          E_3
    P_x = [ P_{E_1E_1}   0            P_{E_1E_3} ]  E_1
          [ 0            P_{E_2E_2}   P_{E_2E_3} ]  E_2
          [ 0            0            P_{E_3E_3} ]  E_3

with (1/m)P_{E_1E_1} and (1/m)P_{E_2E_2} irreducible and stochastic.

Expanding the formula (6), one sees that the entries of P_x are computed according to

    P_{i,j} = Σ_{k_1,...,k_{m−1}=1}^{n} ( R_{i,(j,k_1,...,k_{m−1})} + R_{i,(k_1,j,k_2,...,k_{m−1})} + ··· + R_{i,(k_1,...,k_{m−1},j)} ) x_{k_1} x_{k_2} ··· x_{k_{m−1}},

where we have used tuples (j_1, j_2, ..., j_m) ∈ {1, 2, ..., n}^m as indices for the columns of R. Let us consider a tuple (j_1, j_2, ..., j_m) such that j_1 ∈ E_1 and j_2 ∈ E_2. Since x > 0 and R ≥ 0, we must have R_{i,(j_1,j_2,...,j_m)} = 0 for each i ∈ E_2 ∪ E_3, because otherwise the entry P_{i,j_1} in P_{E_2E_1} or P_{E_3E_1} would be strictly positive. Analogously, R_{i,(j_1,j_2,...,j_m)} = 0 for each i ∈ E_1 ∪ E_3, since otherwise the entry P_{i,j_2} in P_{E_1E_2} or P_{E_3E_2} would be strictly positive. It follows that R_{i,(j_1,j_2,...,j_m)} = 0 for all i. However, this implies

    (e^T R)_{(j_1,j_2,...,j_m)} = Σ_{i=1}^{n} R_{i,(j_1,j_2,...,j_m)} = 0,

which contradicts e^T R = (e^T)^{⊗m}.

3. It is sufficient to prove that the map w ↦ (αP_x − I)w has trivial kernel in Δ_0, i.e., (αP_x − I)w ≠ 0 for all w ∈ Δ_0 \ {0}. By the Perron–Frobenius theorem, there is a nonnegative vector z ≠ 0 such that αP_x z = z, i.e., (αP_x − I)z = 0. Since we know that αP_x − I has rank n − 1, it follows that ker(αP_x − I) = span(z). Since z is nonnegative and nonzero, span(z) ∩ Δ_0 = {0}, hence αP_x − I has trivial kernel in Δ_0.

Theorem 6.
Let S = {(x, α) | H(x, α) = 0, x ∈ Z_1} and S_I = {(x, α) ∈ S | α ∈ I}.

1. S_{[0, 1/m]} is a smooth curve (with boundary).

If moreover J_H[x, α] is surjective as a linear map from Δ_0 × ℝ to Δ_0 for each point (x, α) ∈ S with α ≤ 1, then we have that

2. For some ε > 0, S_{(−ε, 1+ε)} is a union of disjoint smooth curves (with boundary).
3. The curve S* containing S_{[0, 1/m]} reaches the hyperplane {α = 1}.

Proof. We prove the three points.

1. From Lemma 5 we know that ∂H/∂x [x, α] is surjective for 0 ≤ α ≤ 1/m; since surjectivity of the Jacobian is an open condition, there exists ε > 0 such that J_H[x, α] is surjective for each point (x, α) with −ε < α < 1/m + ε. This allows us to apply the Regular Level Set Theorem [15, Section 9.3] to

    H|_{Δ_1 × (−ε, 1/m + ε)} : Δ_1 × (−ε, 1/m + ε) → Δ_0,

obtaining that H^{−1}|_{Δ_1 × [0, 1/m]}(0) is a union of smooth curves with boundary. By Theorem 3, for each α ∈ [0, 1/m] there is only one solution, hence there is only one curve in the union and it coincides with S_{[0, 1/m]}.

2. We argue as in the previous point: since surjectivity of the Jacobian is an open condition, there exists ε > 0 such that J_H[x, α] is surjective for each point (x, α) with α < 1 + ε. We can now apply the Regular Level Set Theorem to the whole Δ_1 × (−ε, 1 + ε).

3. Let C denote the hypercube {(x, α) ∈ ℝ^n × [0, 1] | 0 ≤ x_i ≤ 1}. The intersection C ∩ S* is a connected compact 1-manifold with boundary ∂C ∩ S*; such a manifold is diffeomorphic to S^1 or to a closed segment. It cannot be diffeomorphic to S^1, since otherwise there would be two different branches of S* starting from (v, 0) (the solution at α = 0), contradicting the uniqueness of the solution for α < 1/m. Being diffeomorphic to a segment, the intersection has two boundary points, which must have α = 0 or α = 1 by Corollary 1. Since there is only one solution with α = 0, there must be one with α = 1.

Remark.
We could not find an explicit example in which J_H[x, α] fails to be surjective as a linear map from Δ_0 × ℝ to Δ_0 at some point (x, α) ∈ S with α ≤ 1. We conjecture that this condition always holds, but we do not have a proof.

Numerical continuation methods

Numerical continuation methods [1] are a family of numerical methods to trace solution curves defined as the solution set of a system of nonlinear equations of codimension 1, i.e., G(y) = 0, where G : ℝ^{n+1} → ℝ^n.

These methods are often used to compute solutions in a precise region of the curve by following the implicitly defined curve numerically: one computes an initial solution in a part of the curve where it is easier, and then uses it as an initial step to compute iteratively a sequence of points on the curve, each close to the previous one, with the goal of arriving at a solution in a different part of the curve.

We focus on predictor-corrector methods [1, Chapter 2], which are a class of continuation methods structured in two parts:

1. (the predictor step) Given a starting point y_k on the curve and a step-size h_k, we compute a new point ŷ_k which is close to the curve (for instance, on its tangent line) and at distance h_k from y_k.
2. (the corrector step) We seek a point y_{k+1} on the curve near ŷ_k through an iterative method.
Fig. 2: Illustration of the predictor step (red dashed lines) and corrector step (blue lines).

In order to start the iteration, it is necessary to compute an initial point y_0 such that G(y_0) = 0 and then to alternate the two steps above.

A first example is the strategy suggested in [11], which can be interpreted in this framework by setting

    G : [x; α] ↦ H(x, α).

As initial step, we choose an initial value α^{(0)} that is close to 1/m, and compute a corresponding x^{(0)} such that H(x^{(0)}, α^{(0)}) = 0 via either the fixed-point iteration (5), or a more efficient method (for instance, the Newton method). Then, at each step k,

1. (predictor) we choose α^{(k)} = α^{(k−1)} + h_k, for a certain step-size h_k (hence α^{(0)} < α^{(1)} < ··· < α^{(k−1)} < α^{(k)} always form an increasing sequence), and use x^{(0)}, ..., x^{(k−1)} to compute a vector x̂ such that H(x̂, α^{(k)}) ≈ 0. Several variants were tested in [11], for instance, linear extrapolation, or a Taylor expansion around x^{(k−1)}.
2. (corrector) we use x̂ as an initial value for a fixed-point iteration such as (5) or the Newton method to compute x^{(k)} such that H(x^{(k)}, α^{(k)}) = 0 (see the sketch below).
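This strategy is easy to express in code; the sketch below (m = 2, made-up test data, uniform steps h_k = h, so that the linear-extrapolation predictor reads x̂ = 2x^{(k−1)} − x^{(k−2)}) implements exactly the two steps above, with Newton's method (2) as the corrector. It is the baseline that can fail near folds such as the one in Figure 1, not the algorithm of Section 5.

```python
import numpy as np

n, h, alpha_target = 3, 0.05, 0.99
rng = np.random.default_rng(3)
R = rng.random((n, n * n)); R /= R.sum(axis=0)
v = np.full(n, 1.0 / n)
I = np.eye(n)

def newton_fixed_alpha(x, a, tol=1e-12):        # corrector: Newton's method (2)
    for _ in range(100):
        res = a * R @ np.kron(x, x) + (1 - a) * v - x
        if np.linalg.norm(res) < tol:
            break
        P = R @ (np.kron(I, x[:, None]) + np.kron(x[:, None], I))
        x = x - np.linalg.solve(a * P - I, res)
    return x

alphas = np.append(np.arange(0.5, alpha_target, h), alpha_target)  # easy regime first
xs = [newton_fixed_alpha(v.copy(), alphas[0])]
for a in alphas[1:]:
    # predictor: linear extrapolation from the last two computed solutions
    x_hat = 2 * xs[-1] - xs[-2] if len(xs) >= 2 else xs[-1]
    xs.append(newton_fixed_alpha(x_hat, a))
print(xs[-1], xs[-1].sum())                     # stochastic solution at alpha_target
```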
In this setting, at each step we fix a value of α^{(k)} arbitrarily, and then compute the corresponding x^{(k)}. Hence essentially we use the variable α to parametrize the curve S*. This strategy works well in most cases, but runs into trouble when the curve cannot be written locally as the graph of a function x(α), i.e., when ∂H(x, α)/∂x is singular. Figure 1 is a typical example of this pathological behavior. Hence, we replace the two steps with more general versions that arise from considering S* as a generic curve in ℝ^{n+1}, without singling out α as a preferred parametrization coordinate.

Continuation methods for this problem

We now present the Predictor-Corrector-Newton method, the actual algorithm that we use to solve the multilinear PageRank problem.

Following [1], we use the tangent line to S* as a predictor. Given a starting point (x^{(k)}, α^{(k)}) and a step size τ_k, we look for a point (x̂, α̂) on the tangent line to the solution curve in (x^{(k)}, α^{(k)}) at distance τ_k from the starting point. By the implicit function theorem (see e.g. [15, Appendix B]), the tangent direction is the kernel of the Jacobian matrix J_H[x^{(k)}, α^{(k)}]. We compute it through a QR decomposition

    J_H[x^{(k)}, α^{(k)}]^T = [Q  q_{n+1}] [R; 0].

Notice that q_{n+1} ∈ ℝ^{n+1} spans the kernel of J_H[x^{(k)}, α^{(k)}] (assuming that J_H[x^{(k)}, α^{(k)}] has full rank). We also need to make sure that we move on the tangent line in the correct direction, towards new portions of the curve: to do this, we change the sign of q_{n+1}, if needed, so that

    ⟨ [x^{(k)}; α^{(k)}] − [x^{(k−1)}; α^{(k−1)}], q_{n+1} ⟩ ≥ 0.

Then we can set

    [x̂; α̂] = [x^{(k)}; α^{(k)}] + q_{n+1} τ_k.

We point out that the QR factorization can be prohibitively expensive when n is large; in this case, it may be preferable to replace this strategy with linear extrapolation, that is,

    (x̂, α̂) = (x^{(k)}, α^{(k)}) + τ_k [ (x^{(k)}, α^{(k)}) − (x^{(k−1)}, α^{(k−1)}) ] / ‖ (x^{(k)}, α^{(k)}) − (x^{(k−1)}, α^{(k−1)}) ‖.   (9)

The bulk of our algorithm is the corrector step, where we use Newton's method for underdetermined systems,

    (x_{j+1}, α_{j+1}) = (x_j, α_j) − J_H[x_j, α_j]^+ H(x_j, α_j),   j = 0, 1, ...,   (10)

to compute a solution of H(x, α) = 0 close to (x̂, α̂).
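Both steps take only a few lines of numpy; the sketch below (m = 2, made-up R and v) computes the tangent through a full QR factorization of J_H^T, orients it against the previous tangent, and runs the corrector (10) with the Moore-Penrose pseudoinverse.

```python
import numpy as np

n = 3
rng = np.random.default_rng(4)
R = rng.random((n, n * n)); R /= R.sum(axis=0)
v = np.full(n, 1.0 / n)
I = np.eye(n)

def H(y):
    x, a = y[:n], y[n]
    return a * R @ np.kron(x, x) + (1 - a) * v - x

def J(y):                        # J_H as in (7), for m = 2
    x, a = y[:n], y[n]
    P = R @ (np.kron(I, x[:, None]) + np.kron(x[:, None], I))
    return np.hstack([a * P - I, (R @ np.kron(x, x) - v)[:, None]])

def tangent(y, t_prev):
    Q, _ = np.linalg.qr(J(y).T, mode="complete")   # J^T = [Q1 q_{n+1}] [R; 0]
    t = Q[:, n]                                    # q_{n+1} spans ker J(y)
    return t if t @ t_prev >= 0 else -t            # keep it pointing "forward"

def corrector(y, tol=1e-12):                       # Newton for underdetermined systems (10)
    for _ in range(50):
        if np.linalg.norm(H(y)) < tol:
            break
        y = y - np.linalg.pinv(J(y)) @ H(y)
    return y

y = np.append(v, 0.0)            # the known solution (v, 0) at alpha = 0
t = np.eye(n + 1)[:, n]          # initial direction e_{n+1}
tau = 0.1
for _ in range(5):               # five predictor-corrector steps along the curve
    t = tangent(y, t)
    y = corrector(y + tau * t)   # predictor, then corrector
print(y, np.linalg.norm(H(y)))
```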
We expect this finalround to have very fast convergence, since we have identified a suitable startingpoint near to the solution curve.Full pseudocode for the algorithm is presented as Algorithm 1. Continuation methods for this problem Algorithm 1
Algorithm 1: Predictor-Corrector-Newton method
Parameters: tol, a tolerance; α_0, the initial α for curve-following; τ_0, the initial step-size (adaptive); δ_0, the nominal distance to the curve.
Subroutines: newton(x_0, α), Newton's method (2) with fixed α.
Input: R, ᾱ, v as described above.
Output: x, a stochastic solution to (1) with α = ᾱ.

x_0 ← newton(v, α_0);
y_0 ← (x_0, α_0); t_0 ← e_{n+1};                       (initial direction for curve-following)
k ← 0;
while α_k < ᾱ
    t_{k+1} ← kernel of J_H[y_k] (normalized s.t. ‖t_{k+1}‖ = 1);
    if ⟨t_k, t_{k+1}⟩ < 0
        t_{k+1} ← −t_{k+1};                            (ensures t_{k+1} "points forward")
    end
    ŷ_{k+1} ← y_k + t_{k+1} τ_k;                       (predictor step)
    (x̂_{k+1}, α̂_{k+1}) ← ŷ_{k+1};
    j ← 0;
    while ‖H(ŷ_{k+1})‖ > tol                           (corrector loop)
        j ← j + 1;
        d ← −J_H[ŷ_{k+1}]^+ H(ŷ_{k+1});
        if j = 1
            f ← sqrt(‖d‖ / δ_0);                       (deceleration factor as in [1, Sec. 6.1])
            if f > 2                                   (predictor step too large)
                τ_k ← τ_k / 2;
                go back to the predictor step;
            end
        end
        ŷ_{k+1} ← ŷ_{k+1} + d;
    end
    y_{k+1} ← ŷ_{k+1}; (x_{k+1}, α_{k+1}) ← y_{k+1};
    f ← max(f, 1/2); τ_{k+1} ← τ_k / f;                (choose next step-size)
    τ_{k+1} ← min(τ_{k+1}, τ_0);
    k ← k + 1;
end
η ← (ᾱ − α_{k−1}) / (α_k − α_{k−1});                   (linear interpolation)
ŷ ← y_{k−1} + η (y_k − y_{k−1});
(x̂, α̂) ← ŷ;                                           (it must hold that α̂ = ᾱ)
x ← newton(x̂, ᾱ).
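To make the pseudocode concrete, here is a compact self-contained numpy sketch of Algorithm 1 for m = 2. The test matrix R and the parameter values (α_0, τ_0, δ_0, tol) are illustrative assumptions, not the settings used in the experiments below.

```python
import numpy as np

n, alpha_bar = 3, 0.99
rng = np.random.default_rng(5)
R = rng.random((n, n * n)); R /= R.sum(axis=0)
v = np.full(n, 1.0 / n)
I = np.eye(n)
tol, alpha0, tau0, delta0 = 1e-12, 0.3, 0.1, 1e-2

def H(y):
    x, a = y[:n], y[n]
    return a * R @ np.kron(x, x) + (1 - a) * v - x

def J(y):
    x, a = y[:n], y[n]
    P = R @ (np.kron(I, x[:, None]) + np.kron(x[:, None], I))
    return np.hstack([a * P - I, (R @ np.kron(x, x) - v)[:, None]])

def newton(x, a):                            # Newton's method (2) at fixed alpha
    y = np.append(x, a)
    for _ in range(100):
        r = H(y)
        if np.linalg.norm(r) < tol:
            break
        y[:n] -= np.linalg.solve(J(y)[:, :n], r)
    return y[:n]

ys = [np.append(newton(v, alpha0), alpha0)]
t, tau = np.eye(n + 1)[:, n], tau0           # initial direction e_{n+1}
while ys[-1][n] < alpha_bar:
    Q, _ = np.linalg.qr(J(ys[-1]).T, mode="complete")
    t_new = Q[:, n]                          # tangent: spans ker J_H
    if t_new @ t < 0:
        t_new = -t_new                       # make the tangent point forward
    while True:                              # predictor step, retried if too large
        y_hat, f, j = ys[-1] + tau * t_new, 1.0, 0
        while np.linalg.norm(H(y_hat)) > tol:        # corrector loop (10)
            j += 1
            d = -np.linalg.pinv(J(y_hat)) @ H(y_hat)
            if j == 1:
                f = np.sqrt(np.linalg.norm(d) / delta0)  # deceleration factor
                if f > 2:                    # predicted point too far off the curve
                    tau /= 2
                    break
            y_hat = y_hat + d
        else:
            break                            # corrector converged: accept the step
    ys.append(y_hat)
    t = t_new
    tau = min(tau / max(f, 0.5), tau0)       # next step-size, within a factor 2

eta = (alpha_bar - ys[-2][n]) / (ys[-1][n] - ys[-2][n])  # linear interpolation
x = newton((ys[-2] + eta * (ys[-1] - ys[-2]))[:n], alpha_bar)
print(x, x.sum())                            # stochastic solution at alpha_bar
```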
Numerical experiments

We compare the Predictor-Corrector-Newton algorithm (PC-N, Algorithm 1) with two other methods: the Newton method (N) with x_0 = (1 − α)v, and the Perron-Newton method (PN-IMP). These algorithms are implemented as described in [5] and [11] respectively, and have been chosen because they were the best performers among the algorithms considered in those two papers.

We investigate their performance on two sets of problems: the benchmark set used in [5], and a set of 100 000 random matrices R ∈ ℝ^{6×36}. Each of the latter has been created starting from the zero matrix and then, for each column j, setting R_{ij} = 1, where i ∈ {1, 2, 3, 4, 5, 6} is chosen at random (uniformly and independently); a sketch of this generator is given below.

To better compare iteration counts for algorithms that are based on nested iterations, in the following we define one 'iteration' to be an inner iteration, that is, anything that requires a matrix factorization on a dense linear algebra subproblem:

    Algorithm   What is counted as one iteration
    N           a Newton iteration;
    PN-IMP      a Newton or a Perron-Newton iteration;
    PC-N        a Newton iteration, or a predictor step, or a corrector step.

In all the experiments we have set the tolerance tol = √u, where u is the machine precision, the maximum number of iterations maxit = 10 000, and v = (1/n)e; in PC-N, fixed values of τ_0 and δ_0 were used. The experiments have been run on Matlab R2020b on a computer equipped with a 64-bit Intel Core i7-4700MQ processor.

Note that neither the number of iterations nor the CPU time is particularly indicative, alone, of the true performance to expect from the algorithms on large tensors: indeed, the iterations in the different methods amount to different quantities of work, and the CPU time may scale differently for each algorithm when one switches to linear algebra primitives better suited to large-scale problems. Rather, we focus here on the reliability of the algorithms, i.e., the number of problems that they can solve successfully.

In the first experiment, we ran each algorithm on every tensor of the benchmark set, checking the number of iterations and the CPU time in seconds required for the solution of the problem. Performance profiles (see e.g. [6, Section 22.4] for an introduction to this type of plot) are reported in Figures 3–5. The performance profiles show that the Newton method is always the fastest method, when it works, but it fails on various problems (6 out of 29 for α = 0.99). The new method is about a factor 5 slower, but has much higher reliability, failing on none of the problems. PN-IMP fails on one of the problems, and it is slower in terms of CPU time: this is to be expected, since eigenvalue computations are generally slower than QR factorizations and linear system solutions.

In the second experiment, we counted the number of failures of each of the three algorithms for different values of α. The results are reported in Table 1.
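A sketch of the random test-matrix generator described above (n = 6 and m = 2 are taken from the description; the column-indexing convention is the one from the introduction):

```python
import numpy as np

def random_sparse_R(n=6, m=2, rng=None):
    rng = rng or np.random.default_rng()
    R = np.zeros((n, n ** m))
    for j in range(n ** m):          # for each column j ...
        i = rng.integers(n)          # ... pick a row i uniformly at random
        R[i, j] = 1.0                # ... and set R_ij = 1
    return R

R = random_sparse_R(rng=np.random.default_rng(6))
assert np.allclose(R.sum(axis=0), 1.0)   # e^T R = (e^T)^{(x)m} by construction
```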
Fig. 3: Performance profiles for the 29 benchmark tensors and α = 0.90.
Fig. 4: Performance profiles for the 29 benchmark tensors and α = 0.95.
Fig. 5: Performance profiles for the 29 benchmark tensors and α = 0.99.

    α     |   N  | PN-IMP | PC-N
    0.90  |  102 |    22  |   0
    0.95  |  317 |    47  |   0
    0.99  |  693 |   136  |   0
Tab. 1: Number of failures recorded on 100k random sparse tensors of size 6.
Conclusions

The experimental results confirm that the suggested method, based on the combination of predictor-corrector continuation methods and Newton's method for underdetermined systems, is an effective and reliable way to solve multilinear Pagerank problems, even in cases that are problematic for most other methods. The theoretical analysis performed confirms that the solution curve always reaches a valid solution, at least in the case where the solution curve has no singular points. It remains to consider how this method scales to larger problems.
Acknowledgments
The authors are grateful to Beatrice Meini for several useful discussions on this topic.
References

[1] E. L. Allgower and K. Georg. Introduction to Numerical Continuation Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2003.
[2] A. R. Benson, D. F. Gleich, and L.-H. Lim. The spacey random walk: a stochastic process for higher-order data. SIAM Rev., 59(2):321–345, 2017.
[3] S. Cipolla, M. Redivo-Zaglia, and F. Tudisco. Extrapolation methods for fixed-point multilinear PageRank computations. Numer. Linear Algebra Appl., 27(2):e2280, 2020.
[4] D. Fasino and F. Tudisco. Higher-order ergodicity coefficients for stochastic tensors. Preprint, 2019.
[5] D. F. Gleich, L.-H. Lim, and Y. Yu. Multilinear PageRank. SIAM J. Matrix Anal. Appl., 36(4):1507–1541, 2015.
[6] D. J. Higham and N. J. Higham. MATLAB Guide. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, third edition, 2017.
[7] L. Hogben, editor. Handbook of Linear Algebra. Discrete Mathematics and its Applications (Boca Raton). CRC Press, Boca Raton, FL, second edition, 2014.
[8] R. B. Kellogg, T. Y. Li, and J. Yorke. A constructive proof of the Brouwer fixed-point theorem and computational results. SIAM J. Numer. Anal., 13(4):473–483, 1976.
[9] W. Li, D. Liu, M. K. Ng, and S.-W. Vong. The uniqueness of multilinear PageRank vectors. Numer. Linear Algebra Appl., 24(6):e2107, 2017.
[10] L.-H. Lim. Singular values and eigenvalues of tensors: a variational approach. In Proceedings of the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2005), pages 129–132. IEEE, 2005.
[11] B. Meini and F. Poloni. Perron-based algorithms for the multilinear PageRank. Numer. Linear Algebra Appl., 25(6):e2177, 2018.
[12] M. Ng, L. Qi, and G. Zhou. Finding the largest eigenvalue of a nonnegative tensor. SIAM J. Matrix Anal. Appl., 31(3):1090–1099, 2009.
[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. In Proceedings of the 7th International World Wide Web Conference, pages 161–172, Brisbane, Australia, 1998.
[14] L. Qi. Eigenvalues and invariants of tensors. J. Math. Anal. Appl., 325(2):1363–1377, 2007.
[15] L. W. Tu. An Introduction to Manifolds. Universitext. Springer, New York, second edition, 2011.