Scalable parallel algorithm for solving non-stationary systems of linear inequalities
L. B. Sokolinsky 1,* and I. M. Sokolinskaya 1,**

(Submitted by E. E. Tyrtyshnikov)

1 South Ural State University (National Research University), Lenin prospekt, 76, Chelyabinsk, 454080 Russia

Received March 10, 2020; revised March 23, 2020; accepted March 31, 2020
Abstract—In this paper, a scalable iterative projection-type algorithm for solving non-stationary systems of linear inequalities is considered. A non-stationary system is understood as a large-scale system of inequalities in which the coefficients and constant terms can change during the calculation process. The proposed parallel algorithm uses the concept of pseudo-projection, which generalizes the notion of orthogonal projection. The parallel pseudo-projection algorithm is implemented using the parallel BSF-skeleton. An analytical estimation of the algorithm scalability boundary is obtained on the basis of the BSF cost metric. Large-scale computational experiments were performed on a cluster computing system. The obtained results confirm the efficiency of the proposed approach.

2010 Mathematical Subject Classification:
Keywords and phrases: non-stationary system of linear inequalities, feasibility problem, iterative method, parallel algorithm, approximate solution, cluster computing system
1. INTRODUCTION

The problem of finding a feasible solution to a linear inequality system, also known as a feasibility problem, is often encountered in the practice of mathematical modeling. As examples, we can mention linear programming [1, 2], image reconstruction from projections [3], tomography image reconstruction [4], and intensity-modulated radiation therapy [5]. In many cases, the linear inequality systems arising in such contexts involve up to tens of millions of inequalities and up to hundreds of millions of variables [2]. Moreover, in mathematical economic models, systems of linear inequalities are often non-stationary. This means that the coefficients of the system and its constant terms change during the process of solving the problem, and the period of change of the source data can be within hundredths of a second.

At the present time, there are many methods for solving systems of linear inequalities. Among these methods, we can distinguish a class of "self-correcting" iterative methods that can be parallelized efficiently. Pioneering works here are the papers [6, 7], which propose the Agmon–Motzkin–Schoenberg relaxation method for solving systems of linear inequalities. This method uses the orthogonal projection onto a hyperplane in Euclidean space. Censor and Elfving in [3, 8] proposed a modification of the Cimmino method [9–11] for solving systems of linear inequalities in Euclidean space R^n. A similar method of pseudo-projections based on Fejér approximations was proposed by the authors in [12]. In the article [2], the pseudo-projection method was used to solve the problem of finding a feasible solution to a non-stationary linear inequality system. The convergence theorem for this method was proved by the authors for the case when the change of the feasible set is a translation.
We have constructed a parallel implementation of the pseudo-projection method and executed large-scale computational experiments on a cluster computing system by varying the displacement rate of the polytope M bounding the feasible region. The performed evaluation showed that the parallel pseudo-projection algorithm converges only at a very low rate of displacement of the polytope M.

* E-mail: [email protected]
** E-mail: [email protected]

The aims of this article are the following: analyzing the low efficiency of the parallel pseudo-projection algorithm in the non-stationary case, modifying the algorithm to resolve this issue, evaluating the modified algorithm, and conducting large-scale computational experiments on a cluster computing system to examine the efficiency of the proposed solution.

The paper is organized as follows. In Section 2, we provide a formal definition of a non-stationary system of linear inequalities and describe a modified pseudo-projection algorithm ModAP that calculates a feasible solution for such systems under dynamic changes of the source data. Section 3 describes the ModAPL algorithm, which is a representation of the ModAP algorithm in the form of operations on lists using the higher-order functions Map and Reduce, and presents a parallel implementation of the ModAPL algorithm. Section 4 is devoted to an analytical evaluation of the scalability of the ModAPL parallel algorithm by using the cost metric of the BSF parallel computation model. Section 5 provides information about the implementation of the ModAPL parallel algorithm and describes the results of large-scale computational experiments on a cluster computing system that confirm the efficiency of the proposed approach. Section 6 summarizes the obtained results and concludes that the scalability of the algorithm depends on the number of dynamically changing parameters of the source linear inequality system.

2. NON-STATIONARY PROBLEM AND PSEUDO-PROJECTION ALGORITHM

Let the following feasible system of linear inequalities be given in R^n:

$$A(t)\,x \le b(t), \qquad (1)$$

where the matrix A(t) has m rows. The non-stationarity of the system (1) is understood in the sense that the entries of the matrix A(t) and the elements of the column b(t) depend on the time t ∈ R_{≥0}. Let M(t) be the polytope bounding the feasible region of the system (1) at the instant of time t. Such a polytope is always a closed convex set. We will also assume that the polytope M(t) is a bounded set. Let us define the distance from a point x ∈ R^n to the polytope M(t) as follows:

$$d\bigl(x, M(t)\bigr) = \inf_{y \in M(t)} \|x - y\|,$$

where ||·|| signifies the Euclidean norm. Let us denote the i-th row of the matrix A(t) as a_i^{(t)}. We assume from now on that a_i^{(t)} is not equal to the zero vector for all i = 1, …, m. Let P_i^{(t)} be the half-space representing the set of feasible points for the i-th inequality of the system (1):

$$P_i^{(t)} = \bigl\{\, x \in \mathbb{R}^n \mid \langle a_i^{(t)}, x\rangle \le b_i^{(t)} \,\bigr\}.$$

Then, M(t) = ∩_{i=1}^{m} P_i^{(t)}. Each equation ⟨a_i^{(t)}, x⟩ = b_i^{(t)} defines the corresponding hyperplane H_i^{(t)}:

$$H_i^{(t)} = \bigl\{\, x \in \mathbb{R}^n \mid \langle a_i^{(t)}, x\rangle = b_i^{(t)} \,\bigr\}.$$
We define the reflection vector ρ_{H_i^{(t)}}(x) of the point x with respect to the hyperplane H_i^{(t)} as follows:

$$\rho_{H_i^{(t)}}(x) = \frac{\langle a_i^{(t)}, x\rangle - b_i^{(t)}}{\bigl\|a_i^{(t)}\bigr\|^2}\, a_i^{(t)}.$$

Then, the orthogonal projection π_{H_i^{(t)}}(x) of the point x onto the hyperplane H_i^{(t)} is calculated by the equation

$$\pi_{H_i^{(t)}}(x) = x - \rho_{H_i^{(t)}}(x).$$

LOBACHEVSKII JOURNAL OF MATHEMATICS
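The reflection vector and the orthogonal projection defined above can be sketched in a few lines of NumPy (the function names and the example data are ours, not the paper's):

```python
import numpy as np

def reflection(a_i, b_i, x):
    """Reflection vector: ((<a_i, x> - b_i) / ||a_i||^2) * a_i."""
    return ((a_i @ x - b_i) / (a_i @ a_i)) * a_i

def projection(a_i, b_i, x):
    """Orthogonal projection of x onto the hyperplane <a_i, x> = b_i."""
    return x - reflection(a_i, b_i, x)

# Example: project the point (3, 4) onto the hyperplane x1 + x2 = 1.
a = np.array([1.0, 1.0])
p = projection(a, 1.0, np.array([3.0, 4.0]))
# p lies on the hyperplane, i.e. <a, p> = 1.
```

Note that the projection formula needs the squared norm in the denominator; dividing by ||a_i|| only would give a vector of the wrong length.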
Algorithm 1.
1. k := 0
2. x_0 := 0
3. x_{k+1} := x_k − φ^{(t_k)}(x_k)
4. if x_{k+1} ∈_ε M(t_k) go to 7
5. k := k + 1
6. go to 3
7. x̃ := x_{k+1}
8. stop
Figure 1. Pseudo-projection algorithm for non-stationary systems of linear inequalities.
Define the vector ρ⁺_{H_i^{(t)}}(x) as the positive slice of the reflection vector:

$$\rho^{+}_{H_i^{(t)}}(x) = \frac{\max\bigl\{\langle a_i^{(t)}, x\rangle - b_i^{(t)},\, 0\bigr\}}{\bigl\|a_i^{(t)}\bigr\|^2}\, a_i^{(t)}. \qquad (2)$$

The vector ρ⁺_{H_i^{(t)}}(x) is non-zero if and only if the point x does not satisfy the i-th inequality of the system (1). Let us designate

$$\varphi^{(t)}(x) = \frac{1}{h} \sum_{i=1}^{m} \rho^{+}_{H_i^{(t)}}(x), \qquad (3)$$

where h is the number of non-zero terms in the sum $\sum_{i=1}^{m} \rho^{+}_{H_i^{(t)}}(x)$.

Define the half-space P_i^{(t)} (i = 1, …, m) as follows: P_i^{(t)} = { x ∈ R^n | ⟨a_i^{(t)}, x⟩ ≤ b_i^{(t)} }. We will say that the point x̃ belongs to the half-space P_i^{(t)} with precision ε > 0, and denote it as x̃ ∈_ε P_i^{(t)}, if the following condition holds:

$$\forall i \in \{1, \ldots, m\}\quad \tilde{x} \in P_i^{(t)} \;\lor\; \frac{\bigl|\langle a_i^{(t)}, \tilde{x}\rangle - b_i^{(t)}\bigr|}{\bigl\|a_i^{(t)}\bigr\|} < \varepsilon. \qquad (4)$$

This means that, for any i = 1, …, m, either the point x̃ belongs to the half-space P_i^{(t)}, or the distance between the point x̃ and this half-space is less than ε. We assume that the point x̃ belongs to the polytope M(t) with precision ε, and denote it as x̃ ∈_ε M(t), if the following condition holds:

$$\forall i \in \{1, \ldots, m\}\quad \bigl(\tilde{x} \in_\varepsilon P_i^{(t)}\bigr).$$

In other words, for any i = 1, …, m, either the point x̃ belongs to the polytope M(t), or the distance between the point x̃ and the polytope M(t) is less than ε.

The pseudo-projection algorithm for the non-stationary case is shown in Fig. 1. Here, ε is a small positive value that is a parameter of the algorithm. This algorithm calculates a point x̃ that is an approximate solution of the non-stationary system of linear inequalities (1) in the sense that the point x̃ belongs to the polytope M(t) with precision ε (Step 4).
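Equations (2)–(4) translate directly into code. The following sketch (our names, not the paper's) computes the positive slice, the averaged direction φ, and the ε-membership test:

```python
import numpy as np

def positive_slice(a_i, b_i, x):
    """rho^+: nonzero exactly when x violates <a_i, x> <= b_i  (eq. (2))."""
    return (max(a_i @ x - b_i, 0.0) / (a_i @ a_i)) * a_i

def phi(A, b, x):
    """phi(x) = (1/h) * sum of positive slices, h = number of nonzero terms (eq. (3))."""
    slices = [positive_slice(a_i, b_i, x) for a_i, b_i in zip(A, b)]
    h = sum(1 for s in slices if np.any(s != 0.0))
    return sum(slices) / max(h, 1)   # h = 0 means x is already feasible

def in_polytope_eps(A, b, x, eps):
    """x in_eps M: each inequality holds, or its hyperplane distance is < eps (eq. (4))."""
    return all(a_i @ x <= b_i or abs(a_i @ x - b_i) / np.linalg.norm(a_i) < eps
               for a_i, b_i in zip(A, b))

# Unit box 0 <= x1, x2 <= 1; the point (2, 0.5) violates only x1 <= 1.
A = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
     np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
b = [1.0, 1.0, 0.0, 0.0]
step = phi(A, b, np.array([2.0, 0.5]))   # only the violated inequality contributes
```

In this example h = 1, so φ reduces to the single reflection vector of the violated constraint, and x − φ(x) lands exactly on the boundary of the box.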
If the polytope M does not change over time, Algorithm 1 finishes in a finite number of steps for any ε > 0. This follows from the fact that the mapping α^{(t)}(x) = x − φ^{(t)}(x) used in Step 3 is, for fixed t, a single-valued continuous M(t)-Fejér mapping [13], and consequently the iterative process implemented in Algorithm 1 converges to a point belonging to the polytope M(t) [14]. Another proof of the convergence of Algorithm 1 can be found in [3]. However, both proofs are valid only for stationary problems. The non-stationary case was considered by us in [2] under the assumption that the change of the initial data is a translation of the polytope M(t). For this case, we have proved a theorem giving a sufficient condition for the convergence of the iterative process implemented by Algorithm 1.

[Figure 2 shows successive approximations x_0, x_1, x_2 approaching the ε-vicinity of a vertex of the polygon M bounded by the hyperplanes H_1, …, H_4.]

Figure 2. The case when the rate of convergence slows down.

We have performed a parallel implementation of Algorithm 1 using the BSF-skeleton [15] and have executed large-scale computational experiments on the "Tornado SUSU" computing cluster [16]. We simulated the non-stationarity of the original inequality system by displacing the polytope M(t) along a straight line at a given rate. The obtained results show that Algorithm 1 demonstrates good scalability of up to 200 processor nodes when solving a non-stationary system involving 108002 inequalities and 54000 variables. However, in some cases, the maximum rate of the polytope displacement at which the algorithm converged did not exceed two units per second, which is insufficient for solving practical problems. This result can be explained by the simple example shown in Fig. 2. A peculiarity of the Fejér mapping α^{(t)}(x) = x − φ^{(t)}(x) used in Step 3 is that each new approximation is closer to the polytope than the previous one. But, in the case shown in Fig. 2, the Fejér process never reaches the polygon M: the most we can achieve in a finite number of iterations is to get a point inside the vicinity of the polygon vertex. Assume that, at time t_0, Algorithm 1 starts to perform the iterative process (Step 3), giving an approximate solution x̃ whose distance to the polytope M(t_0) is less than ε (we used ε = 10⁻ in the experiments). Let the calculation be stopped at time t_1.
During the elapsed time, the polytope is displaced to a new position whose distance from the point x̃ is more than ε. Therefore, we have to start the calculation again. In such a way, we will never obtain a solution with the given precision.

In order to fix this problem, we replaced the mapping φ^{(t)} used in Step 3 of Algorithm 1 with the following mapping ψ^{(t)}:

$$\psi^{(t)}(x) = \lambda\, \frac{\varphi^{(t)}(x)}{\bigl\|\varphi^{(t)}(x)\bigr\|}, \qquad (5)$$

where λ is a positive constant that is a parameter of the algorithm. A peculiarity of the mapping ψ^{(t)} is that its result is always a vector of fixed length λ. The modified algorithm, named ModAP, is presented in Fig. 3.

3. PARALLEL VERSION OF THE MODAP ALGORITHM

To construct a parallel version of the ModAP algorithm, we use the BSF (Bulk Synchronous Farm) model of parallel computations [17]. In accordance with the technique proposed in [18], we represent the algorithm in the form of operations on lists using the higher-order functions
Algorithm 2.
1. k := 0
2. x_0 := 0
3. x_{k+1} := x_k − ψ^{(t_k)}(x_k)
4. if x_{k+1} ∈_ε M(t_k) go to 7
5. k := k + 1
6. go to 3
7. x̃ := x_{k+1}
8. stop
Figure 3. Modified pseudo-projection algorithm (ModAP).
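For a stationary system, the ModAP loop of Fig. 3 can be sketched as follows. This is our illustrative Python code, not the paper's implementation; λ, ε, and the iteration limit are assumed values:

```python
import numpy as np

def pos_slice(a, bi, x):
    """Positive slice of the reflection vector (eq. (2))."""
    return (max(a @ x - bi, 0.0) / (a @ a)) * a

def mod_ap(A, b, lam=0.05, eps=1e-6, max_iter=100_000):
    """Sketch of the ModAP loop for a *stationary* system A x <= b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    row_norms = np.linalg.norm(A, axis=1)
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        if np.all(A @ x - b <= eps * row_norms):   # x in_eps M (eq. (4))
            return x
        phi = sum(pos_slice(a, bi, x) for a, bi in zip(A, b))
        x = x - lam * phi / np.linalg.norm(phi)    # psi(x): fixed-length step (eq. (5))
    return x

# Find a feasible point of { x1 >= 2, x2 >= 2 }, written as -x1 <= -2, -x2 <= -2.
x = mod_ap([[-1.0, 0.0], [0.0, -1.0]], [-2.0, -2.0])
```

The normalization by h in eq. (3) is omitted here because ψ rescales φ to length λ anyway, so only the direction of the sum matters; note also that each step has the same length λ, which is exactly the modification that lets the iterate keep pace with a moving polytope.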
Map and Reduce defined in the Bird–Meertens formalism [19]. Let us define the list A = [(a_1, b_1), …, (a_m, b_m)], where a_i is the i-th row of the matrix A, and b_i is the i-th element of the column b (i = 1, …, m). Here, we omit the upper time index (t), assuming the moment of time to be fixed.

Let us define the parameterized function F_x : R^n × R → R^n as follows:

$$F_x(a_i, b_i) = \rho^{+}_{H_i}(x). \qquad (6)$$

This function maps the pair (a_i, b_i) to the vector y_i = ρ⁺_{H_i}(x), which is the positive slice of the reflection vector of the point x relative to the hyperplane H_i = { x ∈ R^n | ⟨a_i, x⟩ = b_i }. The higher-order function Map(F_x, A) applies the function F_x to each element of the list A, converting it to the list

$$B = [y_1, \ldots, y_m] = [F_x(a_1, b_1), \ldots, F_x(a_m, b_m)],$$

where y_i ∈ R^n (i = 1, …, m). Let the symbol + denote the vector addition operation. Then, the higher-order function Reduce(+, B) calculates the vector y, which is the sum of the vectors of the list B: y = y_1 + … + y_m. Obviously, according to the equation (3), we have φ(x) = y/h. Taking (5) into account, we obtain from this

$$\psi(x) = \lambda\, \frac{y}{\|y\|}.$$

In such a way, we obtain the sequential version of the ModAP algorithm on lists presented in Fig. 4. In Step 1, the list A containing the initial values of the matrix A and the column b of the inequality system (1) is input. In Step 2, the zero vector is assigned to x as an initial approximation. In Step 3, the Map function calculates the list B of positive slices of the reflection vectors of the point x relative to the hyperplanes bounding the polytope M_A, which is the feasible region of the inequality system (1). In Step 4, the vector y is calculated as the sum of all positive slices of the reflection vectors. Step 5 calculates the next approximation x. In Step 6, the original data of the inequality system (1) is updated.
Step 7 checks whether the obtained approximation belongs to the polytope M_A with precision ε. If so, we go to Step 9, where x is output as the approximate solution, after which the iterative process is stopped. Otherwise, we go from Step 8 to Step 3, and the iterative process continues.

The parallel version of the modified pseudo-projection algorithm on lists (ModAPL) is presented in Fig. 5. We used the master–slave paradigm [20]. It is assumed that the computing system includes one master and K slaves (K ≥ 1). In Step 1, the master and all slaves load the source data presented as the list A. Step 3 starts the iterative process: the master sends the current approximation x to all slaves. In Step 4, the slaves independently calculate their parts of the list B. In Step 5, the slaves independently sum up the elements of their sublists of the list B. In Step 6, the calculated vectors y^{(1)}, …, y^{(K)} are sent to the master. In Step 7, the master calculates the resulting vector y. Step 8 calculates the next approximation x. In Step 9, the original data of the inequality system (1) is updated by the master and all slaves. Step 10 checks the stopping criterion and assigns the appropriate value to the variable exit. In Step 11, the master sends the value of the
Algorithm 3.
1. input A
2. x := 0
3. B := Map(F_x, A)
4. y := Reduce(+, B)
5. x := x − λ y/||y||
6. update A
7. if x ∈_ε M_A go to 9
8. go to 3
9. output x
10. stop
Figure 4. ModAP algorithm on lists.
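One iteration of Algorithm 3 maps directly onto Python's built-in `map` and `functools.reduce`; the sketch below (our names, assumed parameter values) mirrors Steps 3–5 of Fig. 4:

```python
from functools import reduce
import numpy as np

def F(x):
    """Parameterized function F_x of eq. (6): maps (a_i, b_i) to rho^+_{H_i}(x)."""
    def f(pair):
        a, b = pair
        return (max(a @ x - b, 0.0) / (a @ a)) * a
    return f

def modap_iteration(A_list, x, lam):
    """One iteration of Algorithm 3 (Steps 3-5)."""
    B = list(map(F(x), A_list))        # Step 3: B := Map(F_x, A)
    y = reduce(np.add, B)              # Step 4: y := Reduce(+, B)
    norm = np.linalg.norm(y)
    # Step 5; y = 0 means x is already feasible, so leave it unchanged.
    return x if norm == 0.0 else x - lam * y / norm

# One step toward the feasible region of { x1 >= 2, x2 >= 2 }.
A_list = [(np.array([-1.0, 0.0]), -2.0), (np.array([0.0, -1.0]), -2.0)]
x1 = modap_iteration(A_list, np.zeros(2), lam=0.05)
```

Because the Reduce operation is a plain associative sum, the list B can be split into sublists processed by independent workers, which is exactly what the parallel version in Fig. 5 exploits.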
Algorithm 4.

Master:
1. input A
2. x := 0
3. SendToAllWorkers(x)
4. —
5. —
6. RecvFromAllWorkers(j, y^{(j)})
7. y := Reduce(+, [y^{(1)}, …, y^{(K)}])
8. x := x − λ y/||y||
9. update A
10. if x ∈_ε M_A then exit := true else exit := false
11. SendToAllWorkers(exit)
12. if exit = true go to 15
13. SendToAllWorkers(A)
14. go to 3
15. output x
16. stop

j-th Slave (j = 1, …, K):
1. input A
2. —
3. RecvFromMaster(x)
4. B[j] := Map(F_x, A[j])
5. y^{(j)} := Reduce(+, B[j])
6. SendToMaster(y^{(j)})
7. —
8. —
9. —
10. —
11. RecvFromMaster(exit)
12. if exit = true go to 16
13. RecvFromMaster(A)
14. go to 3
15. —
16. stop
Figure 5. ModAPL parallel algorithm on lists.

variable exit to all slaves. Depending on this value, we either go to the next iteration or terminate the iterative process (Steps 12–16).

4. ANALYTICAL STUDY OF THE PARALLEL ALGORITHM

To obtain an analytical estimation of the scalability of the parallel version of the ModAPL algorithm (Fig. 5), we use the cost metric of the BSF model [18]. It includes the following parameters:

- K: number of slave nodes;
- t_s: time spent by the master node to send the current approximation to one slave node (excluding latency);
- t_Map: time spent by a single slave node to process the entire list A;
- t_p: time spent by the master node to process the results received from the slave nodes and to check the stopping criterion;
- t_r: time spent by the master node to receive the result from one slave node (excluding latency);
- t_a: time spent by a node (master or slave) to perform an addition of two vectors;
- l: length of the list A (the same as the length of the list B);
- L: latency (the time of transferring a one-byte message node-to-node).
For the ModAPL algorithm, we have

$$l = m. \qquad (7)$$

Within one iteration, we introduce the following notation:

- c_s: number of float values sent from the master node to one slave node;
- c_Map: number of arithmetic operations needed to execute the higher-order function Map for the entire list A;
- c_a: number of arithmetic operations needed to add two vectors in n-dimensional space;
- c_r: number of float values sent from one slave node to the master node;
- c_p: number of arithmetic operations performed by the master in Steps 8 and 10 of Algorithm 4;
- c_u: number of float values sent from the master node to one slave node in Step 13 of Algorithm 4.

Let us calculate the specified values. At the beginning of each iteration, in Step 3, the master sends the vector x containing n float values to every slave. Therefore, c_s = n. In the Map step (Step 3 of Algorithm 3), the function F_x defined by the equation (6) is applied to all elements of the list A having the length m. In view of (2), this implies c_Map = (5n + 1)m. To add two vectors of dimension n, one must execute n arithmetic operations. Therefore, c_a = n. In Step 6 of Algorithm 4, the j-th slave node sends the vector y^{(j)} of dimension n to the master node. Hence, it follows that c_r = n. Assume that the square root is calculated by using the first four terms of the Taylor series; this takes 9 arithmetic operations. Then, the execution of Step 8 of Algorithm 4 requires 5n + 8 arithmetic operations. According to the equation (4), the execution of Step 10 requires (6n + 11)m arithmetic operations. Hence, it follows that c_p = (6n + 11)m + 5n + 8. If only one value changes in the source data during one iteration, then one real number must be sent in Step 13; in this case, we have c_u = 1. If all values change in the source data during one iteration, then (n + 1)m real numbers must be sent in Step 13; in this case, we have c_u = (n + 1)m.

Let τ_op denote the time that a processor node spends to execute one arithmetic operation (or one comparison operation), and let τ_tr denote the time that a processor node spends to send one float value to another processor node excluding latency. Then, for c_u = 1, we get the following values of the cost parameters of the ModAPL parallel algorithm:

$$t_s = (c_s + c_u)\,\tau_{tr} = (n + 1)\,\tau_{tr}; \qquad (8)$$
$$t_{Map} = c_{Map}\,\tau_{op} = (5n + 1)m\,\tau_{op}; \qquad (9)$$
$$t_r = c_r\,\tau_{tr} = n\,\tau_{tr}; \qquad (10)$$
$$t_a = c_a\,\tau_{op} = n\,\tau_{op}; \qquad (11)$$
$$t_p = c_p\,\tau_{op} = \bigl((6n + 11)m + 5n + 8\bigr)\,\tau_{op}. \qquad (12)$$

Using the scalability boundary equation from [18], and taking into account (7), we obtain from (8)–(12) the following estimation:

$$K_{MAX} = \sqrt{\frac{t_{Map} + l\,t_a}{L + t_s + t_r + t_a}} = \sqrt{\frac{(5n+1)m\,\tau_{op} + nm\,\tau_{op}}{L + (n+1)\,\tau_{tr} + n\,\tau_{tr} + n\,\tau_{op}}}. \qquad (13)$$

Assuming m = kn for some k ∈ N and n ≫ 1, it is easy to obtain from (13) the following estimation:

$$K_{MAX} = O\bigl(\sqrt{n}\bigr), \qquad (14)$$

where K_MAX is the number of processor nodes at which the maximal speedup is achieved.
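The behavior of the bound (13) can be checked numerically. In the sketch below, the values of τ_op, τ_tr, and the latency are illustrative assumptions, not measurements from the paper; only the formula itself comes from (13) and (15):

```python
import math

def k_max(n, m, tau_op, tau_tr, latency, c_u):
    """Scalability bound: K_MAX = sqrt((t_Map + l*t_a) / (L + t_s + t_r + t_a)), l = m."""
    t_map = (5 * n + 1) * m * tau_op
    t_a = n * tau_op
    t_s = (n + c_u) * tau_tr      # c_u = 1 or (n+1)*m, depending on how much data changes
    t_r = n * tau_tr
    return math.sqrt((t_map + m * t_a) / (latency + t_s + t_r + t_a))

# With c_u = 1 the bound grows like sqrt(n); with c_u = (n+1)*m it stays bounded.
for n in (1000, 4000, 16000):
    m = 2 * n + 2
    print(round(k_max(n, m, 1e-9, 1e-8, 1e-6, 1)),
          round(k_max(n, m, 1e-9, 1e-8, 1e-6, (n + 1) * m)))
```

Quadrupling n roughly doubles the bound in the first column, while the second column stays essentially constant, matching the asymptotic estimations (14) and (16).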
[Figure 6 presents the scalable test system of dimension n with m = 2n + 2 inequalities: bounds of the form x_j ≤ 200 and x_j ≥ 0 for each coordinate, and two inequalities on sums of the coordinates.]
Figure 6. Test problem.
When c_u = (n + 1)m, we have t_s = (c_s + c_u)τ_tr = (n + (n + 1)m)τ_tr. From this, we obtain the following estimation:

$$K_{MAX} = \sqrt{\frac{t_{Map} + l\,t_a}{L + t_s + t_r + t_a}} = \sqrt{\frac{(5n+1)m\,\tau_{op} + nm\,\tau_{op}}{L + \bigl(n + (n+1)m\bigr)\,\tau_{tr} + n\,\tau_{tr} + n\,\tau_{op}}}. \qquad (15)$$

Assuming m = kn for some k ∈ N and n ≫ 1, it is easy to obtain from (15) the following estimation:

$$K_{MAX} = O(1). \qquad (16)$$

Thus, we can conclude that the ModAPL parallel algorithm demonstrates limited scalability when a small part of the source data changes dynamically: in this case, according to the equation (14), the scalability bound increases proportionally to the square root of n, where n is the space dimension. If a large part of the source data changes dynamically, then, according to the equation (16), the ModAPL parallel algorithm becomes inefficient due to the lack of speedup.

5. COMPUTATIONAL EXPERIMENTS

To verify the efficiency of the ModAPL parallel algorithm, we implemented it in C++ using the parallel BSF-skeleton [15] based on the MPI parallel programming library. All source codes are freely available at https://github.com/leonid-sokolinsky/NSLP-Quest . As a test problem, we used a scalable inequality system of dimension n from [21]. This system is presented in Fig. 6. The number of inequalities in this system is m = 2n + 2. The non-stationarity of the problem was simulated by a translation of the polytope M(t). The computational experiments were conducted on the "Tornado SUSU" computing cluster [16], whose specifications are shown in Table 1. The results of the experiments are presented in Fig. 7. The diagrams show the dependence of the running time of the ModAPL parallel algorithm on the displacement rate of the polytope M(t). The values of the coordinate-wise displacement indicate how many units per second are added to each coordinate of the polytope M before the next iteration.

In the first experiment, we used the system in Fig. 6 with dimension n = 32000 and m = 64002 inequalities. The calculations were conducted on hardware configurations with 50, 75, and 100 processor nodes. The results of this experiment are presented in Fig. 7 (a); in the diagram, K denotes the number of processor nodes in the hardware configuration. For K = 50, the highest rate of the coordinate-wise displacement at which the iterative process could "catch up with the polytope" was approximately 10 units per second. Adding up the displacement vectors over all coordinates, we get a total rate vector of length 10·√32000 ≈ 1789 units per second. For K = 75, the highest total displacement rate at which the algorithm converged increases to 2683 units per second, and for K = 100, the convergence of the algorithm was observed at total displacement rates of up to 3220 units per second. For K > 100, the parallel efficiency for the problem of dimension n = 32000 began to fall off.
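The relation between the coordinate-wise rate and the total displacement rate described above is simple arithmetic; the helper below (our name, not the paper's) makes it explicit:

```python
import math

def total_rate(coord_rate, n):
    """Length of the total displacement vector when each of the n coordinates
    of the polytope moves at coord_rate units per second."""
    return coord_rate * math.sqrt(n)

# For n = 32000 and a coordinate-wise rate of 10 units/sec:
r = total_rate(10, 32000)   # about 1789 units per second
```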
Table 1. Specifications of “Tornado SUSU” computing cluster.
Number of processor nodes: 480
Processor: Intel Xeon X5680 (6 cores, 3.33 GHz)
Processors per node: 2
Memory per node: 24 GB DDR3
Interconnect: InfiniBand QDR (40 Gbit/s)
Operating system: Linux CentOS
Figure 7. Experiment results (K is the number of processor nodes): (a) n = 32000 and m = 64002; (b) n = 54000 and m = 108002.

In the second experiment, we used the system in Fig. 6 with dimension n = 54000 and m = 108002 inequalities. The calculations were conducted on hardware configurations with 75, 100, and 150 processor nodes. The results of this experiment are presented in Fig. 7 (b). In this case, the maximum displacement rates of the polytope at which the algorithm converged were 697, 930, and 1162 units per second, respectively. For K > 150, the parallel efficiency for the problem of dimension n = 54000 also began to fall off.

Thus, the conducted experiments show that the proposed modified pseudo-projection algorithm admits efficient parallelization on cluster computing systems and is able to find feasible points for non-stationary systems of linear inequalities that change dynamically, in a certain way, at a high rate.

6. CONCLUSION

In this article, we presented a modified parallel iterative pseudo-projection algorithm that can find feasible points for non-stationary systems of linear inequalities of large dimensions on cluster computing systems. This algorithm was presented as the ModAPL algorithm on lists using the higher-order functions Map and Reduce. For the ModAPL algorithm, we obtained an estimation of the scalability bound of its parallel version using the cost metric of the BSF parallel computing model [18]. If a small part of the source data changes dynamically during one iteration, then the scalability bound is equal to O(√n), where n is the problem dimension. If a large part of the source data changes dynamically during one iteration, then the scalability bound is equal to O(1), which means there is no speedup at all. We implemented the ModAPL parallel algorithm in C++ using the parallel BSF-skeleton [15]. All source codes are freely available at https://github.com/leonid-sokolinsky/NSLP-Quest . Using this implementation, we conducted large-scale computational experiments on a computing cluster. We simulated the non-stationarity of the problem by translating the polytope bounding the feasible region by a certain number of units during each iteration. The conducted experiments show that the ModAPL parallel algorithm is able to find feasible points for non-stationary systems of linear inequalities with 54000 variables and 108002 inequalities on a cluster computing system with 200 processor nodes.

Acknowledgments. This research was partially supported by the Russian Foundation for Basic Research (project No. 20-07-00092-a), by the Ministry of Science and Higher Education of the Russian Federation (gov. order FENU-2020-0022), and by the Government of the Russian Federation according to Act 211 (contract No. 02.A03.21.0011).

REFERENCES
1. R. W. Cottle, J.-S. Pang, and R. E. Stone, The Linear Complementarity Problem (Society for Industrial and Applied Mathematics, 2009). doi 10.1137/1.9780898719000
2. I. Sokolinskaya and L. B. Sokolinsky, "On the Solution of Linear Programming Problems in the Age of Big Data," Parallel Computational Technologies. PCT 2017. Communications in Computer and Information Science 753, 86–100 (2017). doi 10.1007/978-3-319-67035-5_7
3. Y. Censor, T. Elfving, G. T. Herman, and T. Nikazad, "New Methods for Linear Inequalities," SIAM J. on Scientific Computing 30 (1), 473–504 (2008). doi 10.1137/050639399
4. J. Zhu and X. Li, "The Block Diagonally-Relaxed Orthogonal Projection Algorithm for Compressed Sensing Based Tomography," in 2011 Symposium on Photonics and Optoelectronics (SOPO), 1–4 (2011). doi 10.1109/SOPO.2011.5780660
5. Y. Censor, "Mathematical optimization for the inverse problem of intensity-modulated radiation therapy," in Intensity-Modulated Radiation Therapy: The State of the Art, J. R. Palta and T. R. Mackie, Eds. (Medical Physics Publishing, Madison, WI, 2003), 25–49. doi 10.1118/1.1628279
6. S. Agmon, "The relaxation method for linear inequalities," Canadian J. of Mathematics 6, 382–392 (1954). doi 10.4153/CJM-1954-037-2
7. T. S. Motzkin and I. J. Schoenberg, "The relaxation method for linear inequalities," Canadian J. of Mathematics 6, 393–404 (1954). doi 10.4153/CJM-1954-038-x
8. Y. Censor and T. Elfving, "New methods for linear inequalities," Linear Algebra and its Applications 42, 199–211 (1982). doi 10.1016/0024-3795(82)90149-5
9. G. Cimmino, "Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari," La Ricerca Scientifica XVI, Series II, Anno IX, 1, 326–333 (1938).
10. A. Galantai, Projectors and Projection Methods. Advances in Mathematics 6 (Springer US, Boston, MA, 2004). doi 10.1007/978-1-4419-9180-5
11. I. M. Sokolinskaya and L. B. Sokolinsky, "Scalability Evaluation of Cimmino Algorithm for Solving Linear Inequality Systems on Multiprocessors with Distributed Memory," Supercomputing Frontiers and Innovations 5 (2), 11–22 (2018). doi 10.14529/jsfi180202
12. I. Sokolinskaya, "Parallel Method of Pseudoprojection for Linear Inequalities," in Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science 910 (Springer, Cham, 2018), 216–231. doi 10.1007/978-3-319-99673-8_16
13. V. V. Vasin and I. I. Eremin, Operators and Iterative Processes of Fejer Type. Theory and Applications (Walter de Gruyter, Berlin, New York, 2009). doi 10.1515/9783110218190
14. I. I. Eremin, "Methods of Fejer's approximations in convex programming," Mathematical Notes of the Academy of Sciences of the USSR 3 (2), 139–149 (1968). doi 10.1007/BF01094336
15. L. B. Sokolinsky, "BSF-skeleton" [Online]. Available: https://github.com/leonid-sokolinsky/BSF-skeleton. Accessed 2020.
16. P. Kostenetskiy and P. Semenikhina, "SUSU Supercomputer Resources for Industry and Fundamental Science," IEEE (2018). doi 10.1109/GloSIC.2018.8570068
17. L. B. Sokolinsky, "Analytical Estimation of the Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors," Lobachevskii J. of Mathematics 39 (4), 571–575 (2018). doi 10.1134/S1995080218040121
18. N. A. Ezhova and L. B. Sokolinsky, "Scalability Evaluation of Iterative Algorithms Used for Supercomputer Simulation of Physical Processes," IEEE (2018). doi 10.1109/GloSIC.2018.8570131
19. R. S. Bird, "Lectures on Constructive Functional Programming," in Constructive Methods in Computing Science. NATO ASI Series F: Computer and Systems Sciences 55, M. Broy, Ed. (Springer, Berlin, Heidelberg, 1988), 151–216.
20. S. Sahni and G. Vairaktarakis, "The master-slave paradigm in parallel computer and industrial settings," J. of Global Optimization 9 (3–4), 357–377 (1996). doi 10.1007/BF00121679
21. I. Sokolinskaya and L. Sokolinsky, "Revised Pursuit Algorithm for Solving Non-stationary Linear Programming Problems on Modern Computing Clusters with Manycore Accelerators," Supercomputing. RuSCDays 2016. Communications in Computer and Information Science 687, 212–223 (2016). doi 10.1007/978-3-319-55669-7_17