A generalization of the steepest-edge rule and its number of simplex iterations for a nondegenerate LP
Masaya Tano^a,*, Ryuhei Miyashiro^b, Tomonari Kitahara^c

^a Graduate School of Engineering, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
^b Institute of Engineering, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
^c Department of Industrial Engineering and Economics, Tokyo Institute of Technology, 2-12-1-W9-62 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
Abstract
In this paper, we propose a $p$-norm rule, which is a generalization of the steepest-edge rule, as a pivoting rule for the simplex method. For a nondegenerate linear programming problem, we show upper bounds for the number of iterations of the simplex method with the steepest-edge and $p$-norm rules. One of the upper bounds is given by a function of the number of variables, that of constraints, and the minimum and maximum positive elements in all basic feasible solutions.

Keywords:
Linear programming, Simplex method, Steepest-edge rule, $p$-norm rule, Simplex iteration
1. Introduction
The simplex method, an algorithm for linear programming problems (LPs), has been improved since Dantzig [1] developed it. The primal simplex method starts from an initial basic feasible solution (BFS) and repeatedly exchanges a basic variable with a nonbasic variable until the solution satisfies the optimality condition. A variable changed from a nonbasic to a basic variable is called an entering variable, while one changed in the other direction is called a leaving variable. A pivoting rule is a criterion to choose entering and leaving variables. The oldest rule is the most negative coefficient rule, which was developed by Dantzig. It is well known that the simplex method with the most negative coefficient rule is not a polynomial-time algorithm [8]. Various pivoting rules have been proposed so far (cf. [12]), and they have great influence on the number of simplex iterations.

∗ Corresponding author. Tel.: +81-42-388-7451.
Email address: [email protected] (Masaya Tano)
Preprint submitted to a Journal, October 19, 2018

Nowadays there are pivoting rules that spend fewer simplex iterations in practice than the most negative coefficient rule. The steepest-edge rule [9, 13] is such a pivoting rule. However, even for the steepest-edge rule, there is an LP in which the number of iterations is exponential with respect to the number of variables [4]. It is a long-standing open question in linear programming whether there exists a pivoting rule or a variant of the simplex method that can solve any LP in polynomial time.

Some recent studies analyzed the simplex method from a different perspective. Kitahara and Mizuno [6] showed upper bounds for the number of different BFSs generated by the simplex method with the most negative coefficient rule. These bounds are given as follows:
$$\left\lceil \frac{m\gamma}{\delta} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil \quad \text{and} \quad (n-m) \left\lceil \frac{m\gamma}{\delta} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil,$$
where $m$ is the number of constraints, $n$ is that of variables, $x^0$ is an initial BFS, $\bar{x}$ is a BFS with the second smallest objective value, $z^*$ is the optimal value, and $\gamma$ and $\delta$ are the maximum and minimum positive elements in all BFSs, respectively. On the assumption that an LP is nondegenerate, these bounds yield upper bounds for the number of iterations of the simplex method with the most negative coefficient rule and that with the best improvement rule.

The contribution of this paper is to expand the research in [6] to the steepest-edge rule. We first propose a $p$-norm rule as a pivoting rule for the simplex method. This rule is a generalized steepest-edge rule and coincides with the steepest-edge rule when $p = 2$. Next we prove that the simplex method with the $p$-norm rule finds an optimal solution in at most
$$\left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil$$
iterations for a nondegenerate LP. We also show that an upper bound for the number of iterations can be expressed in a form independent of the objective value,
$$(n-m) \left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil.$$
Finally, we prove that, for an LP formulation of the discounted Markov decision problem with a fixed discount factor, the simplex method with the $p$-norm rule is a strongly polynomial-time algorithm.

The rest of this paper is organized as follows. Section 2 explains some preliminaries and previous research. Section 3 proposes the $p$-norm rule as a generalization of the steepest-edge rule. In Section 4, we show two upper bounds for the number of iterations of the simplex method with the $p$-norm rule. Section 5 presents the application of our results to the discounted Markov decision problem. In Section 6, we summarize the result of the analysis and discuss our future work.

2. Preliminaries and previous research

In this section, we first review linear programming and the simplex method. Next, we define the most negative coefficient rule as a pivoting rule for the simplex method. Finally, we explain some previous studies about the number of iterations of the simplex method with the most negative coefficient rule.

2.1. Linear programming and the simplex method
We consider an LP with $n$ nonnegative variables and $m$ constraints:
$$\text{minimize } c^\top x \quad \text{subject to } Ax = b, \; x \geq 0, \qquad (1)$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$ are given data and $x \in \mathbb{R}^n$ is a variable vector. The dual problem of (1) is expressed as follows:
$$\text{maximize } b^\top y \quad \text{subject to } A^\top y + s = c, \; s \geq 0, \qquad (2)$$
where $y \in \mathbb{R}^m$ and $s \in \mathbb{R}^n$ are variable vectors. The problem (1) is called the primal problem with respect to the dual problem (2).

The duality theorem is a well-known relationship between a primal and a dual problem.

Theorem 1 (Duality Theorem). If the primal problem (1) has an optimal solution, so does the dual problem (2) and these optimal values are equal. In other words, if $x^*$ is a primal optimal solution, there is a dual optimal solution $(y^*, s^*)$ such that $c^\top x^* = b^\top y^*$.

In addition, between a primal feasible solution $x$ and a dual feasible solution $(y, s)$, the following relationship holds:
$$c^\top x - b^\top y = x^\top s. \qquad (3)$$

Next, we define a dictionary. Let $\{B, N\}$ be a partition of the index set of variables $\{1, 2, \ldots, n\}$. We can split $A$, $c$, and $x$ corresponding to $\{B, N\}$ as follows:
$$A = \begin{bmatrix} A_B & A_N \end{bmatrix}, \quad c = \begin{bmatrix} c_B \\ c_N \end{bmatrix}, \quad x = \begin{bmatrix} x_B \\ x_N \end{bmatrix}.$$
By these splits, the problem (1) is written as
$$\text{minimize } c_B^\top x_B + c_N^\top x_N \quad \text{subject to } A_B x_B + A_N x_N = b, \; x = (x_B, x_N) \geq 0. \qquad (4)$$
If $|B| = m$ and $A_B$ is nonsingular, $B$ and $N = \{1, 2, \ldots, n\} \setminus B$ are called a basis and a nonbasis, respectively.

The problem (4) is transformed by multiplying the equality constraint by $A_B^{-1}$ from the left:
$$\text{minimize } z_0 + \bar{c}_N^\top x_N \quad \text{subject to } x_B = \bar{b} - \bar{A}_N x_N, \; x = (x_B, x_N) \geq 0, \qquad (5)$$
where $z_0 = c_B^\top A_B^{-1} b$, $\bar{c}_N = c_N - A_N^\top (A_B^\top)^{-1} c_B$, $\bar{b} = A_B^{-1} b$, and $\bar{A}_N = A_B^{-1} A_N$. The form (5) is called a dictionary for a basis $B$. The vector $\bar{c}_N$ is called a reduced cost vector and $\bar{A}_N$ is called a nonbasic matrix. A basic solution is a solution $x$ such that $(x_B, x_N) = (\bar{b}, 0)$ in the dictionary (5).
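The dictionary quantities $z_0$, $\bar{c}_N$, and $\bar{b}$ can be computed directly from a basis. The following pure-Python sketch is our own illustration, not code from the paper; the tiny LP, the basis choice, and the helper names are assumptions made for the example. It prices out the basis via the dual vector $y = (A_B^\top)^{-1} c_B$, the standard way to obtain the reduced costs.

```python
# Minimal sketch (not from the paper): computing the dictionary (5)
# quantities z0, c_bar, b_bar for a small LP in standard form.

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def dictionary(A, b, c, B):
    """Return (z0, reduced costs c_bar, b_bar) for a basis B (column indices)."""
    m = len(A)
    N = [j for j in range(len(c)) if j not in B]
    AB = [[A[i][j] for j in B] for i in range(m)]
    b_bar = solve(AB, b)                       # b_bar = A_B^{-1} b
    ABt = [[AB[i][j] for i in range(m)] for j in range(m)]
    y = solve(ABt, [c[j] for j in B])          # y = (A_B^T)^{-1} c_B
    c_bar = {j: c[j] - sum(y[i] * A[i][j] for i in range(m)) for j in N}
    z0 = sum(c[j] * v for j, v in zip(B, b_bar))
    return z0, c_bar, b_bar

# Hypothetical LP: minimize -x1 - x2
# subject to x1 + 2 x2 + x3 = 4, 3 x1 + x2 + x4 = 6 (x3, x4 slacks)
A = [[1, 2, 1, 0], [3, 1, 0, 1]]
b, c = [4, 6], [-1, -1, 0, 0]
z0, c_bar, b_bar = dictionary(A, b, c, B=[2, 3])  # slack basis: A_B = I
print(z0, c_bar, b_bar)  # 0.0 {0: -1.0, 1: -1.0} [4.0, 6.0]
```

At the slack basis the reduced costs equal $c_N$ and $\bar{b} = b$, as the output confirms; for a nontrivial basis the same two `solve` calls do the work of $A_B^{-1}$.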
The elements of $x_B$ are basic variables and those of $x_N$ are nonbasic variables. A basic solution $x$ is called a BFS (basic feasible solution) if $x_B \geq 0$. A basis $B$ and a nonbasis $N$ are respectively called a feasible basis and a feasible nonbasis if the corresponding basic solution is feasible.

For a given BFS in the dictionary (5), if $\bar{c}_N \geq 0$, then the BFS is optimal; otherwise, increasing the value of a nonbasic variable with a negative reduced cost decreases the objective value. We explain this procedure, i.e., the simplex method.

We use the following notations unless otherwise stated:
$$x_B = \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_m} \end{bmatrix}, \quad x_N = \begin{bmatrix} x_{j_1} \\ x_{j_2} \\ \vdots \\ x_{j_\ell} \end{bmatrix}, \quad \bar{b} = \begin{bmatrix} \bar{b}_1 \\ \bar{b}_2 \\ \vdots \\ \bar{b}_m \end{bmatrix}, \quad \bar{c}_N = \begin{bmatrix} \bar{c}_1 \\ \bar{c}_2 \\ \vdots \\ \bar{c}_\ell \end{bmatrix}, \quad \bar{A}_N = \begin{bmatrix} \bar{a}_{11} & \bar{a}_{12} & \cdots & \bar{a}_{1\ell} \\ \bar{a}_{21} & \bar{a}_{22} & \cdots & \bar{a}_{2\ell} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{a}_{m1} & \bar{a}_{m2} & \cdots & \bar{a}_{m\ell} \end{bmatrix},$$
where $\ell = n - m$. By using this notation, the dictionary (5) is expressed as follows:
$$\begin{aligned}
\text{minimize } \; & z_0 + \bar{c}_1 x_{j_1} + \cdots + \bar{c}_s x_{j_s} + \cdots + \bar{c}_\ell x_{j_\ell} \\
\text{subject to } \; & x_{i_1} = \bar{b}_1 - \bar{a}_{11} x_{j_1} - \cdots - \bar{a}_{1s} x_{j_s} - \cdots - \bar{a}_{1\ell} x_{j_\ell}, \\
& \qquad \vdots \\
& x_{i_r} = \bar{b}_r - \bar{a}_{r1} x_{j_1} - \cdots - \bar{a}_{rs} x_{j_s} - \cdots - \bar{a}_{r\ell} x_{j_\ell}, \\
& \qquad \vdots \\
& x_{i_m} = \bar{b}_m - \bar{a}_{m1} x_{j_1} - \cdots - \bar{a}_{ms} x_{j_s} - \cdots - \bar{a}_{m\ell} x_{j_\ell}, \\
& x_{i_1}, x_{i_2}, \ldots, x_{i_m}, x_{j_1}, x_{j_2}, \ldots, x_{j_\ell} \geq 0.
\end{aligned}$$

Let $x_{j_s}$ be a nonbasic variable having a negative reduced cost $\bar{c}_s$. By increasing the value of $x_{j_s}$ from 0 to $\theta_s > 0$, the objective value decreases by $-\bar{c}_s \theta_s > 0$, while each basic variable $x_{i_k}$ changes from $\bar{b}_k$ to $\bar{b}_k - \bar{a}_{ks} \theta_s$. Set $\theta_s$ as follows:
$$\theta_s = \min \left\{ \left. \frac{\bar{b}_k}{\bar{a}_{ks}} \;\right|\; \bar{a}_{ks} > 0, \; k = 1, 2, \ldots, m \right\}. \qquad (6)$$
Let $r$ be an index $k \in \{1, 2, \ldots, m\}$ attaining the minimum in the equality (6). By the definition of $\theta_s$ and $r$, the equality $x_{i_r} = \bar{b}_r - \bar{a}_{rs} \theta_s = 0$ holds. The basic variable $x_{i_r}$ therefore leaves the basis (enters the nonbasis), and the nonbasic variable $x_{j_s}$ accordingly enters the basis. The variable changed from a basis to a nonbasis is called a leaving variable, and the one changed from a nonbasis to a basis is called an entering variable. In such a way, a basic variable is exchanged with a nonbasic variable in each iteration of the simplex method.

2.2. Pivoting rules

A rule to choose an entering variable from nonbasic variables is called a pivoting rule. Many pivoting rules have been proposed because they have great influence on the number of iterations of the simplex method. In this section, we describe the most negative coefficient rule, i.e., Dantzig's original one, and the best improvement rule.

The most negative coefficient rule chooses a nonbasic variable with the smallest reduced cost as an entering variable. That is, this rule focuses on a variable that maximizes the decrease in the objective value per unit increase. In other words, for a given feasible dictionary (5), the rule chooses an entering variable $x_{j_d}$ such that
$$d \in \arg\min_k \{ \bar{c}_k \mid k = 1, 2, \ldots, \ell \}.$$

The best improvement rule is to choose, in each iteration, a nonbasic variable that decreases the objective value most. The simplex method with the best improvement rule needs fewer simplex iterations than that with the most negative coefficient rule in practice [11]; however, the simplex method with the best improvement rule is an exponential-time algorithm in the worst case [5], as is that with the most negative coefficient rule [8].
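The entering/leaving mechanics above fit in a few dozen lines. The toy tableau implementation below is our own illustration, not code from the paper: it applies the most negative coefficient rule together with the ratio test (6) to a small hypothetical LP, assuming a feasible slack basis and a bounded problem.

```python
# Toy tableau simplex sketch (ours, not from the paper): most negative
# coefficient rule plus ratio test (6). Assumes a feasible slack basis
# with zero basic costs, and a bounded LP (no unboundedness check).

def simplex_dantzig(A, b, c, basis):
    m, n = len(A), len(c)
    T = [[b[i]] + A[i][:] for i in range(m)]   # rows: [b_bar | A_bar]
    cost = [0.0] + c[:]                        # reduced-cost row, cost[0] = -z
    while True:
        # entering variable: most negative reduced cost
        s = min(range(n), key=lambda j: cost[1 + j])
        if cost[1 + s] >= -1e-9:
            break                              # optimal: c_bar >= 0
        # ratio test (6): smallest b_bar_k / a_bar_ks over positive a_bar_ks
        ratios = [(T[i][0] / T[i][1 + s], i) for i in range(m) if T[i][1 + s] > 1e-9]
        _, r = min(ratios)                     # leaving row r
        piv = T[r][1 + s]
        T[r] = [v / piv for v in T[r]]         # pivot: x_{j_s} enters the basis
        for i in range(m):
            if i != r and T[i][1 + s]:
                T[i] = [v - T[i][1 + s] * w for v, w in zip(T[i], T[r])]
        cost = [v - cost[1 + s] * w for v, w in zip(cost, T[r])]
        basis[r] = s
    x = [0.0] * n
    for i, bi in enumerate(basis):
        x[bi] = T[i][0]
    return x, -cost[0]

# minimize -x1 - x2  s.t.  x1 + 2 x2 <= 4,  3 x1 + x2 <= 6  (slacks x3, x4)
A = [[1.0, 2.0, 1.0, 0.0], [3.0, 1.0, 0.0, 1.0]]
x, z = simplex_dantzig(A, [4.0, 6.0], [-1.0, -1.0, 0.0, 0.0], basis=[2, 3])
print(x, z)  # x ~ [1.6, 1.2, 0, 0], z ~ -2.8
```

Two pivots reach the vertex where both structural constraints are tight, which matches the hand computation for this instance.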
2.3. The number of different BFSs generated by the simplex method

Although a number of pivoting rules have been proposed, it is still an open problem whether there exists a pivoting rule that can solve any LP in polynomial time. Recently, some research paid attention to the number of different BFSs generated by the simplex method to consider the complexity.

Kitahara and Mizuno [6] showed an upper bound for the number of different BFSs generated by the simplex method with the most negative coefficient rule. They proved that the number of different BFSs generated by the simplex method is at most
$$\left\lceil \frac{m\gamma}{\delta} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil \qquad (7)$$
for a standard LP with $n$ variables and $m$ constraints (see Section 1 or Table 1 in Section 4.1 for the notations in (7) and (8)). Moreover, another upper bound independent of the objective value is given by
$$(n-m) \left\lceil \frac{m\gamma}{\delta} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil. \qquad (8)$$
Note that, for a nondegenerate LP, the upper bounds (7) and (8) for the number of different BFSs can be regarded as upper bounds for the number of iterations of the simplex method with the most negative coefficient rule. In addition, they showed that, for a nondegenerate LP, the bounds (7) and (8) are also upper bounds for the number of iterations of the simplex method with the best improvement rule.

Kitahara and Mizuno [7] also studied any pivoting rule that does not increase the objective value in each iteration. Using such a pivoting rule, the simplex method generates at most
$$\left\lceil \min\{m, n-m\} \, \frac{\gamma_P \gamma'_D}{\delta_P \delta'_D} \right\rceil \qquad (9)$$
different BFSs, where $\gamma_P$ and $\delta_P$ are respectively the maximum and minimum positive elements in all BFSs, and $\gamma'_D$ and $\delta'_D$ are respectively the maximum and minimum absolute values of negative reduced costs in all BFSs. On the nondegeneracy assumption of LPs, the upper bound (9) for the number of different BFSs can be regarded as that for the number of simplex iterations.
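For a sense of scale, the bounds (7) and (8) are straightforward to evaluate. The snippet below is our own illustration with hypothetical values of $m$, $n$, $\gamma$, $\delta$, and the two objective gaps; it is not data from the paper.

```python
import math

# Hypothetical instance parameters (not from the paper).
m, n = 10, 30                  # constraints and variables
gamma, delta = 5.0, 0.5        # max/min positive elements over all BFSs
gap0, gap1 = 100.0, 1.0        # c'x^0 - z*  and  c'x_bar - z*

# Bound (7): ceil( m*gamma/delta * log(gap0/gap1) )
bound7 = math.ceil(m * gamma / delta * math.log(gap0 / gap1))
# Bound (8): (n - m) * ceil( m*gamma/delta * log(m*gamma/delta) )
bound8 = (n - m) * math.ceil(m * gamma / delta * math.log(m * gamma / delta))
print(bound7, bound8)  # 461 9220
```

Both bounds grow linearly in $m\gamma/\delta$, so instances whose BFS entries span a wide range ($\gamma/\delta$ large) get correspondingly weaker guarantees.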
3. $p$-norm rule: generalization of steepest-edge

We explain the steepest-edge rule in Section 3.1, and then propose a $p$-norm rule in Section 3.2, which is a generalization of the steepest-edge rule.

3.1. Steepest-edge rule

According to Forrest and Goldfarb [2], the efficiency of the steepest-edge rule was reported for the first time in the computational experiments by Kuhn and Quandt [9] and by Wolfe and Cutler [13]; several computational experiments showed that the steepest-edge rule needs fewer simplex iterations than the most negative coefficient rule (e.g., [3, 11]) and than the best improvement rule (e.g., [11]).

The steepest-edge rule considers the amount of the change of all variables, in contrast to the most negative coefficient rule. That is, this rule focuses on the difference vector of basic solutions and the decrease in the objective value. When a nonbasic variable $x_{j_k}$ increases by $\theta_k > 0$, the objective value decreases by $-\bar{c}_k \theta_k > 0$ and each basic variable $x_{i_u}$ changes from $\bar{b}_u$ to $\bar{b}_u - \bar{a}_{uk} \theta_k$. Hence, the norm of the difference vector of the two solutions before and after increasing $x_{j_k}$ is
$$\sqrt{\theta_k^2 + \sum_{i=1}^m (\bar{a}_{ik} \theta_k)^2} = \theta_k \sqrt{1 + \sum_{i=1}^m \bar{a}_{ik}^2}.$$
Therefore the decrease in the objective value per unit length of the difference vector is expressed as
$$\frac{-\bar{c}_k \theta_k}{\theta_k \sqrt{1 + \sum_{i=1}^m \bar{a}_{ik}^2}} = \frac{-\bar{c}_k}{\sqrt{1 + \sum_{i=1}^m \bar{a}_{ik}^2}}.$$
Under the steepest-edge rule, $x_{j_s}$ is chosen as an entering variable such that the following is satisfied:
$$s \in \arg\min_k \left\{ \left. \frac{\bar{c}_k}{\sqrt{1 + \sum_{i=1}^m \bar{a}_{ik}^2}} \;\right|\; k = 1, 2, \ldots, \ell \right\}.$$

3.2. $p$-norm rule

Let $(v_N)_k$ be the $k$-th column vector of the $n \times \ell$ matrix $V_N$, where
$$V_N = \begin{bmatrix} -\bar{A}_N \\ I \end{bmatrix}$$
and $I$ is the $\ell \times \ell$ identity matrix. By this notation, the condition that an entering variable $x_{j_s}$ must satisfy for the steepest-edge rule is written as
$$s \in \arg\min_k \left\{ \left. \frac{\bar{c}_k}{\| (v_N)_k \|_2} \;\right|\; k = 1, 2, \ldots, \ell \right\}. \qquad (10)$$

Here we propose a $p$-norm rule as a pivoting rule for the simplex method. This rule is a generalized steepest-edge rule where the norm is changed from the 2-norm to the $p$-norm in (10).
The $p$-norm rule selects a nonbasic variable $x_{j_s}$ that satisfies
$$s \in \arg\min_k \left\{ \left. \frac{\bar{c}_k}{\| (v_N)_k \|_p} \;\right|\; k = 1, 2, \ldots, \ell \right\}$$
as an entering variable.
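The selection above is easy to implement once $\bar{c}_N$ and $\bar{A}_N$ are at hand, since $\| (v_N)_k \|_p = \left(1 + \sum_i |\bar{a}_{ik}|^p\right)^{1/p}$ by the block structure of $V_N$. The sketch below is our own illustration on made-up data, not code from the paper.

```python
# Sketch (ours, not from the paper): entering-variable selection under
# the p-norm rule. ||(v_N)_k||_p = (1 + sum_i |a_bar[i][k]|**p)**(1/p),
# since column k of V_N stacks -A_bar_N over an identity block.

def pnorm_entering(c_bar, A_bar, p):
    """Return index k minimizing c_bar[k] / ||(v_N)_k||_p, or None if optimal."""
    best, best_val = None, 0.0
    for k in range(len(c_bar)):
        norm = (1.0 + sum(abs(row[k]) ** p for row in A_bar)) ** (1.0 / p)
        val = c_bar[k] / norm
        if val < best_val:          # only negative reduced costs qualify
            best, best_val = k, val
    return best

# Hypothetical dictionary data: two tied Dantzig candidates (c_bar = -1),
# but column 1 moves the solution much less per unit decrease.
c_bar = [-1.0, -1.0, 2.0]
A_bar = [[4.0, 0.5, 1.0], [3.0, 0.5, 1.0]]
print(pnorm_entering(c_bar, A_bar, p=2))  # 1 (steepest edge prefers column 1)
print(pnorm_entering(c_bar, A_bar, p=1))  # 1
```

With $p = 2$ this reproduces the steepest-edge choice (10); the most negative coefficient rule cannot break the tie between columns 0 and 1, while any $p$-norm rule prefers the column with the shorter edge.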
4. Analysis of the number of iterations
In this section, we show upper bounds for the number of iterations of the simplex method with the $p$-norm rule. When $p = 2$, these upper bounds are ones for the steepest-edge rule.

4.1. Assumptions and notations

For our analysis, we assume the following:
• rank $A = m$;
• An initial BFS $x^0$ is available;
• The problems (1) and (2) have optimal basic solutions, denoted by $x^*$ and $(y^*, s^*)$, respectively, and these optimal values are $z^*$;
• An initial BFS $x^0$ is not optimal, i.e., its objective value is larger than $z^*$;
• The problem (1) is nondegenerate, i.e., each basic variable has a positive value in any BFS.

The first three assumptions are the same as in the previous study [6]. Moreover, the fourth assumption was also imposed in [6] implicitly.

In addition to the notations defined earlier, let $\mathcal{B}$ and $\mathcal{N}$ be the set of all feasible bases and all feasible nonbases, respectively. We summarize the notations in Table 1.

Table 1: Notations

$m$ : the number of constraints
$n$ : the number of variables
$\ell$ : the constant equal to $n - m$
$\gamma$ : the maximum positive element in all BFSs
$\delta$ : the minimum positive element in all BFSs
$z^*$ : the optimal value
$x^0$ : an initial BFS
$\bar{x}$ : a second optimal BFS, i.e., a BFS with the second smallest objective value
$(v_N)_k$ : the $k$-th column vector of the $n \times \ell$ matrix $V_N = \begin{bmatrix} -\bar{A}_N \\ I \end{bmatrix}$
$\mathcal{B}$ : the set consisting of all feasible bases
$\mathcal{N}$ : the set consisting of all feasible nonbases, i.e., $\mathcal{N} = \{ N \mid B \in \mathcal{B},\; N = \{1, 2, \ldots, n\} \setminus B \}$

4.2. Upper bound dependent on objective value

We now start our analysis of the $p$-norm rule from the following lemma about a lower bound for the optimal value.

Lemma 2 (Kitahara and Mizuno [6]). Let $x^t$ be the $t$-th solution generated by the simplex method with the most negative coefficient rule and let $B^t$ and $N^t$ be the basis and nonbasis corresponding to $x^t$, respectively. Moreover, set $\Delta^t_d = -\min \{ \bar{c}_j \mid j = 1, 2, \ldots, \ell \}$. Then we have
$$z^* \geq c^\top x^t - m\gamma \Delta^t_d. \qquad (11)$$

Proof.
Let $x^*$ be a basic optimal solution of the problem (1). Then we obtain
$$z^* = c^\top x^* = c_{B^t}^\top A_{B^t}^{-1} b + \bar{c}_{N^t}^\top x^*_{N^t} \geq c^\top x^t - \Delta^t_d \, e^\top x^*_{N^t} \geq c^\top x^t - m\gamma \Delta^t_d.$$
The second inequality holds since $x^*$ has $m$ positive elements and each element is bounded above by $\gamma$. Thus, we have the inequality (11).

This lemma was proven by Kitahara and Mizuno [6]. In their paper, they analyzed the simplex method with the most negative coefficient rule. However, as shown above, the proof is not based on a specific property of any pivoting rule. Accordingly, Lemma 2 also holds for the simplex method with the $p$-norm rule.

Assume that a feasible dictionary (5) is given at the $t$-th iteration of the simplex method. Let $x_{j_s}$ and $x_{j_d}$ be the entering variables chosen by the $p$-norm and most negative coefficient rules, respectively. Then, by the definition of the $p$-norm rule, we have
$$\frac{\bar{c}_s}{\| (v_N)_s \|_p} \leq \frac{\bar{c}_d}{\| (v_N)_d \|_p}. \qquad (12)$$
Due to the inequality (12), we obtain
$$\Delta_s \geq \Delta_d \, \frac{\| (v_N)_s \|_p}{\| (v_N)_d \|_p},$$
where $\Delta_s = -\bar{c}_s$ and $\Delta_d = -\bar{c}_d$. We represent $\| (v_N)_s \|_p / \| (v_N)_d \|_p$ as $q_N$ and set $q = \min \{ q_N \mid N \in \mathcal{N} \}$. Then,
$$\Delta_s \geq q \Delta_d \qquad (13)$$
holds for all feasible nonbases. We will analyze the detail of $q$ in Section 4.4.

Next, we show that, in each iteration of the simplex method with the $p$-norm rule, the difference between the objective value and the optimal value decreases at a constant ratio or more.

Lemma 3.
Let $x^t$ and $x^{t+1}$ be the $t$-th and $(t+1)$-th solutions of the simplex method with the $p$-norm rule, respectively. Then the following inequality holds:
$$c^\top x^{t+1} - z^* \leq \left( 1 - \frac{q\delta}{m\gamma} \right) \left( c^\top x^t - z^* \right). \qquad (14)$$

Proof.
The objective value decreases by $\Delta_s x^{t+1}_{j_s}$ at the iteration. Moreover, by the definition of $\delta$ and $\gamma$,
$$x_j > 0 \;\Rightarrow\; \delta \leq x_j \leq \gamma \quad (j = 1, 2, \ldots, n)$$
holds for any BFS $x$. Thus, we obtain the following inequality:
$$c^\top x^t - c^\top x^{t+1} = \Delta_s x^{t+1}_{j_s} \geq \Delta_s \delta.$$
By the inequalities (11) and (13), we have
$$\Delta_s \delta \geq q\delta \Delta_d \geq q\delta \cdot \frac{c^\top x^t - z^*}{m\gamma}.$$
Hence, we obtain
$$c^\top x^t - c^\top x^{t+1} \geq \frac{q\delta}{m\gamma} \left( c^\top x^t - z^* \right),$$
which leads to the inequality (14).

Applying the inequality (14) for $t = 0, 1, 2, \ldots$ in order, we have
$$c^\top x^t - z^* \leq \left( 1 - \frac{q\delta}{m\gamma} \right)^t \left( c^\top x^0 - z^* \right). \qquad (15)$$
Let $\bar{x}$ be a second optimal BFS, that is, one whose objective value is the smallest except for that of the optimal solution. If the $t$-th solution $x^t$ satisfies the following inequality, it is optimal:
$$c^\top x^t - z^* < c^\top \bar{x} - z^*.$$
The simplex method with the $p$-norm rule therefore finds an optimal solution and terminates after $T$ iterations starting from an initial BFS $x^0$, where $T$ is the smallest integer $t$ such that the right-hand side value in the inequality (15) is less than $c^\top \bar{x} - z^*$. As discussed above, we have the following theorem.

Theorem 4.
Let $\bar{x}$ be a BFS of the problem (1) whose objective value is the second smallest. The simplex method with the $p$-norm rule finds an optimal solution in at most
$$\left\lceil \frac{m\gamma}{q\delta} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil$$
iterations for the problem (1).

Proof.
As mentioned earlier, the smallest integer $t$ satisfying the following inequality is an upper bound for the number of iterations:
$$\left( 1 - \frac{q\delta}{m\gamma} \right)^t \left( c^\top x^0 - z^* \right) < c^\top \bar{x} - z^*.$$
Solving this inequality for $t$, we have
$$t > \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \cdot \left( - \frac{1}{\log\left( 1 - \frac{q\delta}{m\gamma} \right)} \right).$$
From $\frac{1}{x} > -\frac{1}{\log(1-x)}$ holding for $0 < x < 1$, we obtain
$$\frac{m\gamma}{q\delta} > - \frac{1}{\log\left( 1 - \frac{q\delta}{m\gamma} \right)},$$
which leads to the theorem.

4.3. Upper bound independent of objective value

The upper bound in Section 4.2 depends on the objective value. We turn to obtaining another upper bound independent of the objective value. First we consider the following lemma.
Lemma 5 (Kitahara and Mizuno [6]). Let $x^t$ and $B^t$ be the $t$-th solution of the simplex method and the basis corresponding to $x^t$, respectively. If $x^t$ is not optimal, there exists $\bar{j} \in B^t$ that satisfies the following conditions:
$$x^t_{\bar{j}} > 0 \quad \text{and} \quad s^*_{\bar{j}} \geq \frac{1}{m x^t_{\bar{j}}} \left( c^\top x^t - z^* \right), \qquad (16)$$
where $s^*$ is the slack vector of an optimal basic solution of the dual problem (2). Furthermore, the $k$-th solution $x^k$ satisfies
$$x^k_{\bar{j}} \leq m x^t_{\bar{j}} \, \frac{c^\top x^k - z^*}{c^\top x^t - z^*}$$
for an arbitrary positive integer $k$.

Proof.
We first prove the former. From the equation (3), we have
$$c^\top x^t - z^* = c^\top x^t - b^\top y^* = (x^t)^\top s^* = \sum_{j \in B^t} x^t_j s^*_j.$$
Since $x^t \geq 0$, $s^* \geq 0$, and $|B^t| = m$, there exists $\bar{j} \in B^t$ such that
$$x^t_{\bar{j}} s^*_{\bar{j}} \geq \frac{1}{m} \left( c^\top x^t - z^* \right).$$
Here $x^t$ is not optimal and thus the right-hand side is positive. Hence, $x^t_{\bar{j}} > 0$ holds and we obtain the conditions (16).

Next we prove the latter. For an arbitrary positive integer $k$,
$$c^\top x^k - z^* = (x^k)^\top s^* = \sum_{j=1}^n x^k_j s^*_j.$$
In addition, $x^k_j \geq 0$ and $s^*_j \geq 0$ $(j = 1, 2, \ldots, n)$ hold, and thus we obtain
$$c^\top x^k - z^* \geq x^k_{\bar{j}} s^*_{\bar{j}}.$$
Using this and the inequality (16), we have
$$x^k_{\bar{j}} \leq \frac{c^\top x^k - z^*}{s^*_{\bar{j}}} \leq m x^t_{\bar{j}} \, \frac{c^\top x^k - z^*}{c^\top x^t - z^*}.$$

This proof does not depend on a specific pivoting rule either, and thus Lemma 5 also holds for the simplex method with the $p$-norm rule. We now have an upper bound independent of the objective value.

Theorem 6.
When applying the simplex method with the $p$-norm rule to the problem (1), the number of iterations is at most
$$(n-m) \left\lceil \frac{m\gamma}{q\delta} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil.$$

Proof.
Let $r$ be an integer that is greater than or equal to 1. Moreover, let $x^t$ and $x^{t+r}$ be the $t$-th and $(t+r)$-th solutions of the simplex method, respectively. In addition, let $B^t$ be the basis corresponding to $x^t$. Then, by Lemmas 3 and 5, and the definition of $\gamma$, there exists $\bar{j} \in B^t$ such that
$$x^{t+r}_{\bar{j}} \leq m x^t_{\bar{j}} \left( 1 - \frac{q\delta}{m\gamma} \right)^r \leq m\gamma \left( 1 - \frac{q\delta}{m\gamma} \right)^r.$$
Thus, when $r \geq (m\gamma)/(q\delta) \cdot \log(m\gamma/\delta)$, the rightmost term is less than $\delta$, and $x^{t+r}_{\bar{j}}$ is fixed to 0 by the definition of $\delta$.

If an optimal solution is not obtained after $r$ iterations satisfying the above inequality, we can apply the same procedure again. The number of variables that can be chosen as a basic variable decreases by one through each process. Due to the nondegeneracy assumption, the number of positive elements in any basic feasible solution is $m$, and thus this process occurs at most $n - m$ times. Hence, we have the desired result.

4.4. Lower bound for $q$

The ratio $q$ is contained in the upper bounds in Theorems 4 and 6. In this section, we analyze a lower bound for $q$ to make these upper bounds clearer. The definition of $q$ is as follows:
$$q = \min \{ q_N \mid N \in \mathcal{N} \}, \quad q_N = \frac{\| (v_N)_s \|_p}{\| (v_N)_d \|_p}.$$
Hence, a lower bound for $q$ can be derived from a lower bound of $\| (v_N)_s \|_p$ divided by an upper bound of $\| (v_N)_d \|_p$.

Let $x_{j_k}$ and $x_{i_r}$ be the entering and leaving variables when the solution $x^t$ changes to $x^{t+1}$ at the $t$-th iteration of the simplex method, respectively. Set $w = x^{t+1} - x^t$. Since the nonbasic variables except $x_{j_k}$ are unchanged through the iteration, we have
$$\| w \|_p^p = |w_{i_1}|^p + |w_{i_2}|^p + \cdots + |w_{i_m}|^p + |w_{j_k}|^p.$$
By the definition of $\gamma$ and the nonnegativity constraints on the variables, any element of $w$ is in the range $-\gamma$ to $\gamma$. That is, $|w_j| \leq \gamma$ $(j = 1, 2, \ldots, n)$ holds. Thus, we obtain the following inequality:
$$|w_{j_k}|^p + |w_{i_r}|^p \leq \| w \|_p^p \leq |w_{j_k}|^p + |w_{i_r}|^p + (m-1)\gamma^p.$$
Since $\delta \leq |w_{i_r}| \leq \gamma$ holds due to the nondegeneracy assumption, we have
$$|w_{j_k}|^p + \delta^p \leq \| w \|_p^p \leq |w_{j_k}|^p + m\gamma^p. \qquad (17)$$
As mentioned in Section 3.1, each basic variable changes by $-\bar{a}_{ik} \theta_k$ $(i = 1, 2, \ldots, m)$ when $x_{j_k}$ increases by $\theta_k = |w_{j_k}|$. Thus, the $p$-norm of the difference vector $w$ is
$$\| w \|_p = \left( |w_{j_k}|^p + \sum_{i=1}^m |{-\bar{a}_{ik} w_{j_k}}|^p \right)^{1/p} = |w_{j_k}| \left( 1 + \sum_{i=1}^m |\bar{a}_{ik}|^p \right)^{1/p}.$$
Using the notation introduced in Section 3.2, this equality can be expressed as
$$\| w \|_p = |w_{j_k}| \cdot \| (v_N)_k \|_p.$$
Dividing both sides of the inequality (17) by $|w_{j_k}|^p > 0$, we obtain
$$1 + \frac{\delta^p}{|w_{j_k}|^p} \leq \| (v_N)_k \|_p^p \leq 1 + \frac{m\gamma^p}{|w_{j_k}|^p}.$$
The relationship $\delta \leq |w_{j_k}| \leq \gamma$ gives upper and lower bounds for $\| (v_N)_k \|_p^p$:
$$1 + \frac{\delta^p}{\gamma^p} \leq \| (v_N)_k \|_p^p \leq 1 + \frac{m\gamma^p}{\delta^p}. \qquad (18)$$
Let $x_{j_d}$ and $x_{j_s}$ be the nonbasic variables chosen by the most negative coefficient and $p$-norm rules, respectively. From the inequality (18), we have
$$\frac{\| (v_N)_s \|_p^p}{\| (v_N)_d \|_p^p} \geq \frac{1 + (\delta/\gamma)^p}{1 + m(\gamma/\delta)^p} \geq \frac{1 + (\delta/\gamma)^p}{m + m(\gamma/\delta)^p} \geq \frac{\delta^p}{m\gamma^p}.$$
Therefore $(\delta/\gamma) \, m^{-1/p}$ is a lower bound for $q$. Applying this bound to Theorems 4 and 6, the following theorems are immediately obtained.

Theorem 7.
The simplex method with the $p$-norm rule finds an optimal solution in at most
$$\left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil$$
iterations for the problem (1).

Theorem 8.
The simplex method with the $p$-norm rule finds an optimal solution in at most
$$(n-m) \left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil$$
iterations for the problem (1).

5. Application to Markov decision problem

Ye [15] proved that the simplex method with the most negative coefficient rule is a strongly polynomial-time algorithm for solving the discounted Markov decision problem (DMDP) with a fixed discount factor; this result motivated the research on the number of different BFSs by Kitahara and Mizuno [6, 7], which we mentioned in Section 2.3. On the other hand, the simplex method with the smallest index rule takes exponential time to solve the DMDP regardless of discount factors [10].

These results imply that pivoting rules are crucial for the complexity of solving the DMDP. Since the DMDP satisfies the assumptions in Section 4.1, we can apply the upper bound in Theorem 8 to analyze the DMDP. Here we prove that the simplex method with the $p$-norm rule is also a strongly polynomial-time algorithm for the DMDP with a fixed discount factor. The LP formulation and some properties of the DMDP are based on those given by Ye [15].

The DMDP can be formulated as the following LP:
$$\text{minimize } c^\top x \quad \text{subject to } (E - \theta P) x = e, \; x \geq 0, \qquad (19)$$
where $\theta \in [0, 1)$ is the discount factor, $e \in \mathbb{R}^m$ is the vector of all ones, $E \in \mathbb{R}^{m \times n}$ is a 0-1 matrix that indicates whether the action $j$ can be chosen in the state $i$, $c \in \mathbb{R}^n$ is a vector calculated from immediate costs, and $P \in \mathbb{R}^{m \times n}$ consists of transition probabilities.

Let $x$ be a BFS of the problem (19). Ye [14] proved that if $x_j$ is a basic variable, then $x_j$ satisfies
$$1 \leq x_j \leq \frac{m}{1-\theta}.$$
This inequality implies that the minimum and maximum values of all the positive elements of BFSs are not less than 1 and not more than $m/(1-\theta)$, respectively; in other words, $\delta \geq 1$ and $\gamma \leq m/(1-\theta)$ hold. Moreover, the problem (19) is nondegenerate. We therefore obtain the following theorem.

Theorem 9.
Consider the DMDP with $m$ states, $n$ actions, and a discount factor $\theta$, formulated as the problem (19). The simplex method with the $p$-norm rule takes at most
$$(n-m) \left\lceil \frac{m^{3+\frac{1}{p}}}{(1-\theta)^2} \log\left( \frac{m^2}{1-\theta} \right) \right\rceil$$
iterations to solve the problem (19), and is a strongly polynomial-time algorithm for the DMDP with a fixed discount factor.

6. Discussion and future work

In this paper, we proposed the $p$-norm rule as a pivoting rule for the simplex method, which is a generalization of the steepest-edge rule. In addition, we showed two upper bounds for the number of iterations taken by the simplex method with the $p$-norm rule for a nondegenerate LP. One of the upper bounds is expressed as
$$\left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{c^\top x^0 - z^*}{c^\top \bar{x} - z^*} \right) \right\rceil,$$
which depends on the second optimal solution (Theorem 7); the other is
$$(n-m) \left\lceil m^{1+\frac{1}{p}} \frac{\gamma^2}{\delta^2} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil,$$
which is independent of the objective value (Theorem 8).

The discounted Markov decision problem satisfies the assumptions we need for the analysis; our results proved that the simplex method with the $p$-norm rule is a strongly polynomial-time algorithm for solving the problem with a fixed discount factor.

There are some further directions for this study. Firstly, computational experiments have shown that the steepest-edge rule takes fewer iterations than the most negative coefficient rule and than the best improvement rule. However, the upper bounds obtained in this paper are larger than the bound for these rules,
$$(n-m) \left\lceil \frac{m\gamma}{\delta} \log\left( \frac{m\gamma}{\delta} \right) \right\rceil,$$
obtained by Kitahara and Mizuno [6]. Thus, to find a better upper bound is one further direction. Secondly, there remains a matter to be discussed: whether the simplex method with the $p$-norm rule is strongly polynomial-time for solving the discounted Markov decision problem regardless of discount factors. Yet another future work is to remove the nondegeneracy assumption of LPs for the $p$-norm rule.

Acknowledgement
This work was partially supported by JSPS KAKENHI Grant Numbers JP17K01246 and JP15K15941.