Calculus, constrained minimization and Lagrange multipliers: Is the optimal critical point a local minimizer?
Ademir Alves Ribeiro§, José Renato Ramos Barbosa§

April 11, 2019
Abstract
In this short note, we discuss how the optimality conditions for the problem of minimizing a multivariate function subject to equality constraints have been dealt with in undergraduate Calculus. We are particularly interested in the 2- and 3-dimensional cases, which are the most common in Calculus courses. Besides giving sufficient conditions for a critical point to be a local minimizer, we also present and discuss counterexamples to some statements encountered in the undergraduate literature on Lagrange multipliers, such as “among the critical points, the ones which have the smallest image (under the function) are minimizers” or “a single critical point (which is a local minimizer) is a global minimizer”.
Keywords.
Calculus, constrained minimization, Lagrange multipliers, critical point, local minimizer, global minimizer.
In spite of being a strategy for finding the local maxima and/or minima of a function subject to constraints, the Lagrange Multiplier Method (LMM), particularly when it is used for solving undergraduate Optimization problems, is typically used as a systematic procedure for identifying global extrema. In doing so, some undergraduate textbooks on the subject state the results related to the LMM (and/or the worked problems based on it) without complete hypotheses, or carelessly written in an imprecise manner and with oversimplification (as an attempt to make the method more palatable). Specifically, the validity of the LMM, when one is looking for global extrema, depends on the existence of those extrema. This basic assumption has to be satisfied beforehand. Otherwise, even if one succeeds in obtaining local extrema from the critical points determined by the method, it might not be possible to get the global extrema from the local ones. However, it is not unusual to find books or academic homepages where, right after the local extrema are found, they are promptly evaluated and the ones with the smallest (respectively, the greatest) image under the function are elected the global minima (respectively, maxima). This is done even without the existence of the global extrema having been previously established. The problem gets worse when there is just one critical point and one has to determine the maximum/minimum value from it without having a local criterion to be used in conjunction with the LMM. We present such a criterion here and propose its adoption in Calculus courses. At the very least, such a procedure makes the student completely sure that the optimal local values can be determined from the critical points.

∗ This short note is for those who study or teach calculus.
§ Department of Mathematics, Federal University of Paraná, Brazil (email: [email protected], [email protected]). The first author is supported by CNPq, Brazil, Grant 309437/2016-4.
Concerning the global aspect, well, this is a whole different story. Many times, without compactness, the proof that there are global extrema is out of the scope of Calculus courses and depends on the specific problem that we are dealing with. On the other hand, even in the very few books which correctly state the LMM, such as the excellent [3], emphasizing only the local aspect of the method, when working on problems with lack of compactness, the existence of, for instance, “a box of largest possible volume” among all rectangular boxes with a fixed surface area is simply assumed. Then, right after finding a unique critical point via the LMM, the solution ends with a conclusion like this one: “This (cubical) shape must therefore maximize the volume, assuming there is a box of maximum volume.” In this paper we work on a similar problem, showing the existence of the global extrema via a reasoning that is nontrivial for Calculus courses. Furthermore, we emphasize that a criterion to determine whether a critical point (obtained by the LMM) is a local maximum/minimum would help enormously in problems like those.

We consider here the equality constrained optimization problem of the form

minimize f(x)
subject to g(x) = 0,     (1)

where f: D → R and g: D → Rᵐ are twice continuously differentiable functions defined on the open set D ⊂ Rⁿ. The set

Ω = { x ∈ D | g(x) = 0 }     (2)

is called the feasible set of the problem (1). Recall the well known definition of a minimizer:

Definition 2.1
A point x* ∈ Ω is said to be a local minimizer of the problem (1) if there exists δ > 0 such that f(x*) ≤ f(x) for all x ∈ Ω ∩ B(x*, δ). If f(x*) ≤ f(x) for all x ∈ Ω, the point x* is called a global minimizer.

Remark 2.2
There is no loss of generality in considering only minimization problems since, if we want to maximize a function f, we can equivalently minimize −f. So, the definitions and results can be easily rewritten. In the following example we can directly verify that a point is a minimizer. Nevertheless, this is not always the case and we normally need other tools to find minimizers. It should be pointed out that, in spite of focusing on constrained problems, some examples can be better understood and/or visualized if we disregard the constraint. In fact, we can transform an unconstrained problem into an equivalent constrained one by introducing an artificial variable:

minimize f(x)   is equivalent to   minimize ϕ(x, y) := f(x)
                                   subject to g(x, y) := y = 0.

Example 2.3
Consider the function f: R² → R given by f(x) = x₁² + x₂²(1 − x₁)³. The point x* = (0, 0) is a local minimizer since f(x*) = 0 ≤ f(x) for all x ∈ B(x*, 1). Moreover, this point is not a global minimizer because f(4, 1) = −11. See Figure 1.

Figure 1: Graph of f, given in Example 2.3, showing that the local minimizer is not a global minimizer.

A well known condition that ensures the existence of a global minimizer is the compactness of the set.

Theorem 2.4
Let L ⊂ Ω be a nonempty compact set. Then f has a global minimizer in L, that is, there exists x* ∈ L such that f(x*) ≤ f(x) for all x ∈ L.

Unfortunately, there are many situations where the above result cannot be directly applied since the underlying set might not be compact. Even so, we may still guarantee the existence of a global minimizer, as discussed in the upcoming Example 3.3.
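The interplay between Theorem 2.4 and Example 2.3 can be explored numerically. A minimal sketch, assuming f(x₁, x₂) = x₁² + x₂²(1 − x₁)³ as in Example 2.3: on the compact ball B(0, 1) a global minimizer must exist, while on the whole plane the point (4, 1) already beats the local minimizer at the origin.

```python
import numpy as np

# f from Example 2.3 (an assumption of this sketch): a local minimizer at the
# origin that is not global.
def f(x1, x2):
    return x1**2 + x2**2 * (1 - x1)**3

# Sampling the closed ball B(0, 1), where Theorem 2.4 guarantees a global
# minimizer; the sampled minimum is 0, attained at the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 181)
r = np.linspace(0.0, 1.0, 101)
R, T = np.meshgrid(r, theta)
vals = f(R * np.cos(T), R * np.sin(T))
print(vals.min())      # 0.0: the sampled minimum on the ball
print(f(4.0, 1.0))     # -11.0: the origin is not a global minimizer on R^2
```

Note that on B(0, 1) one has x₁ ≤ 1, so (1 − x₁)³ ≥ 0 and f ≥ 0 there, which is why the sampled minimum is exactly 0.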
In this section we present the necessary conditions that must be satisfied by every local minimizer.We also point out two misunderstandings that sometimes arise in calculus courses.
Definition 3.1
A point x* ∈ Rⁿ is said to be critical (or stationary) for the problem (1) if there exists a vector λ* ∈ Rᵐ such that

∇f(x*) + λ₁*∇g₁(x*) + ⋯ + λₘ*∇gₘ(x*) = 0,     (3a)
g(x*) = 0.     (3b)

The components of λ* are the Lagrange multipliers associated with the constraints.

The next result is a classical one and it is used to find possible candidates for the optimal solutions. (See, for instance, [1] for a version of such a theorem.)
Theorem 3.2
Suppose that x* ∈ Rⁿ is a local minimizer for the problem (1) and that the gradients ∇gᵢ(x*), i = 1, …, m, are linearly independent. Then x* is a critical point for this problem.
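In practice, the system (3a)–(3b) can be handed to a numerical root finder. A minimal sketch on a toy instance (not from this note), minimizing x₁ + x₂ on the unit circle, assuming SciPy is available:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy instance of (3a)-(3b): minimize f(x) = x1 + x2 subject to
# g(x) = x1^2 + x2^2 - 1 = 0. The unknowns are (x1, x2, lambda).
def lagrange_system(z):
    x1, x2, lam = z
    return [1.0 + 2.0 * lam * x1,    # df/dx1 + lambda * dg/dx1 = 0
            1.0 + 2.0 * lam * x2,    # df/dx2 + lambda * dg/dx2 = 0
            x1**2 + x2**2 - 1.0]     # g(x) = 0

x1, x2, lam = fsolve(lagrange_system, [-0.5, -0.5, 0.5])
print(x1, x2, lam)   # ≈ -0.7071 -0.7071 0.7071: the critical point (-1/√2, -1/√2)
```

Theorem 3.2 only certifies that such a point is a candidate; deciding whether it is actually a minimizer is precisely the question addressed in the remainder of the note.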
As previously mentioned, a very common exercise in undergraduate Calculus is the problem of minimizing the area of a box, without a lid, subject to a constant volume. The issue here lies in the fact that, normally, the solution is not accompanied by a mathematical argument showing that the box with minimum area does exist and/or by a justification as to why the critical point obtained (via the equations (3a)–(3b)) is the global minimizer of the problem. Let us discuss these issues more precisely in the next example.
Example 3.3
Consider D = { x ∈ R³ | x₁ > 0, x₂ > 0, x₃ > 0 } and the functions f, g: D → R defined by f(x) = x₁x₂ + 2x₁x₃ + 2x₂x₃ and g(x) = x₁x₂x₃ − 1. Show that the problem (1) has a (unique) global solution and find it using the Lagrange method.

Resolution. As defined before, let Ω = { x ∈ D | g(x) = 0 } be the feasible set of the problem. We claim that if x ∈ Ω and f(x) ≤ 5, then 1/5 ≤ xᵢ ≤ 5 for i = 1, 2, 3. Indeed, since x₁x₂x₃ = 1, we have x₂x₃ = 1/x₁, x₁x₃ = 1/x₂ and x₁x₂ = 1/x₃. Hence, if x₁ < 1/5 or x₂ < 1/5, then

f(x) = x₁x₂ + 2x₁x₃ + 2/x₁ > 2/x₁ > 10   or   f(x) = x₁x₂ + 2/x₂ + 2x₂x₃ > 2/x₂ > 10,

respectively. If x₃ < 1/5, then f(x) = 1/x₃ + 2x₁x₃ + 2x₂x₃ > 1/x₃ > 5. Furthermore, if x₁ > 5, then

f(x) ≥ x₁x₂ + 2x₁x₃ = x₁x₂ + 2/x₂ > 5x₂ + 2/x₂ ≥ 2√10 > 5,

since the minimum of t ↦ 5t + 2/t on (0, ∞) is 2√10. By symmetry, the same holds if x₂ > 5. If x₃ > 5, then, using x₁ + x₂ ≥ 2√(x₁x₂) = 2/√x₃, we get

f(x) ≥ 2(x₁ + x₂)x₃ ≥ 4x₃/√x₃ = 4√x₃ > 4√5 > 5,

proving the claim. Now, consider the set L = { x ∈ Ω | f(x) ≤ 5 }, which is nonempty because (1, 1, 1) ∈ L. If (xᵏ) ⊂ L is such that xᵏ → x̄, then x̄ ∈ D (by the claim, each x̄ᵢ ≥ 1/5 > 0), g(x̄) = 0 and f(x̄) ≤ 5, which means that L is closed. It is also bounded in view of the claim. Thus, Theorem 2.4 ensures that there exists x* ∈ L such that f(x*) ≤ f(x) for all x ∈ L. This point is in fact a global minimizer in Ω, because if x ∈ Ω \ L, we have f(x) > 5 ≥ f(x*). Now, applying Theorem 3.2, we conclude that x* must be a solution of the equations

(x₂ + 2x₃, x₁ + 2x₃, 2x₁ + 2x₂) + λ (x₂x₃, x₁x₃, x₁x₂) = (0, 0, 0)   and   x₁x₂x₃ = 1.

Since this system has a unique solution, namely x* = (∛2, ∛2, ∛2/2) with λ* = −2∛4, it follows that x* is exactly this point.

Concerning the previous example, it is worth mentioning that, on the one hand, its reasoning is beyond the scope of undergraduate Calculus textbooks. On the other hand, it is also a reminder that, sometimes, it is not trivial to establish the existence of a global minimum. The next example is a reformulation of the unconstrained problem given in Example 2.3 as an equivalent constrained problem, obtained by introducing an artificial variable.

Example 3.4
Consider the functions f, g: R³ → R given by f(x) = x₁² + x₂²(1 − x₁)³ and g(x) = x₃. It is easy to verify that the point x* = (0, 0, 0) and the multiplier λ* = 0 satisfy the conditions (3a)–(3b). In fact, it can be proved that this point is the only critical point of this example. Moreover, x* is a local minimizer since f(x*) = 0 ≤ f(x) for all x ∈ B(x*, 1). However, this point is not a global minimizer because f(4, 1, 0) = −11.

Remark 3.5
Note that the previous example also answers negatively the question “If a function has a single critical point which is a local minimizer, is this point a global minimizer?”, which sometimes arises (in some Calculus courses and academic homepages) and is incorrectly answered with a “yes”. This probably occurs because, for functions of one variable, the result does hold, as the next theorem states.
Theorem 3.6
Let f: (a, b) ⊂ R → R be a differentiable function with a single critical point x* ∈ (a, b). If x* is a local minimizer, then it is a global minimizer.

Proof. Assume by contradiction that there exists x̄ ∈ (a, b), say x̄ > x*, such that f(x̄) < f(x*). Since x* is a local minimizer, there exists δ > 0 such that f(x*) ≤ f(x) for all x ∈ (x* − δ, x* + δ). In fact, we have f(x*) < f(x) for all x ∈ (x* − δ, x* + δ) \ {x*}, because otherwise there would be another critical point of f, in view of Rolle's theorem. Consider then x̃ ∈ (x*, x̄) with f(x*) < f(x̃). So, the intermediate value theorem guarantees the existence of x̂ ∈ (x̃, x̄) with f(x̂) = f(x*). Therefore, by Rolle's theorem, we conclude that there exists a critical point x** ∈ (x*, x̂), contradicting the hypothesis. Figure 2 illustrates this proof.

Figure 2: Illustration of the proof of Theorem 3.6.

It is well known that the converse of Theorem 3.2 is not necessarily true. That is, the optimality conditions (3a)–(3b) are not sufficient to ensure that the point is a local minimizer. Indeed, these conditions are also satisfied at a maximizer. When dealing with unconstrained minimization in two variables, we have the famous sufficient condition, present in almost all textbooks on the subject, to ensure that a critical point x* is a local minimizer, namely, the second derivative test

∂²f/∂x₁²(x*) > 0   and   ∂²f/∂x₁²(x*) ∂²f/∂x₂²(x*) − (∂²f/∂x₁∂x₂(x*))² > 0.

However, it is not so common to discuss such a test for constrained optimization. In the next remark we address another issue related to this subject.

Remark 3.7
In the context of problem (1), the following question is also typical: “Among the critical points, is the one with the smallest image (under the function) a local minimizer?”.
Again, the answer is no and the example below shows why.
Example 3.8
Consider the problem (1) with the functions f, g: R² → R defined by

f(x) = (1/7)x₁⁷ − (17/12)x₁⁶ + (51/10)x₁⁵ − (63/8)x₁⁴ + (9/2)x₁³   and   g(x) = x₂.

Find the critical points, their images, and say which one is a minimizer.

Resolution. The condition (3a) in this case is

( (1/2)x₁²(x₁ − 1)(2x₁ − 3)(x₁ − 3)², 0 ) + λ (0, 1) = (0, 0),

yielding λ* = 0 and x₁ ∈ {0, 1, 3/2, 3}. So, we have four critical points

x̄ = (0, 0),   x̂ = (1, 0),   x* = (3/2, 0)   and   x̃ = (3, 0).

By restricting the objective function to the feasible set, that is, to the points of the form x = (t, 0) with t ∈ R, we obtain f(x) = ϕ(t), where

ϕ(t) = (1/7)t⁷ − (17/12)t⁶ + (51/10)t⁵ − (63/8)t⁴ + (9/2)t³.     (4)

Since ϕ′(t) = t²(t − 1)(t − 3/2)(t − 3)², the derivative does not change sign at t = 0, so x̄ is neither a maximizer nor a minimizer, but a saddle point. The same is true for x̃. On the other hand, x̂ is a local maximizer and x* is a local minimizer. Finally, the critical values are

f(x̄) = 0,   f(x̂) ≈ 0.451,   f(x*) ≈ 0.353   and   f(x̃) ≈ 2.604.

It should be noted that the smallest critical value does not correspond to a local minimizer and that the greatest critical value does not correspond to a local maximizer. Figure 3 illustrates this example. The next section is devoted to discussing sufficient conditions ensuring optimality for constrained optimization problems.
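The computations of Example 3.8 are easy to double-check numerically; a minimal sketch, assuming the coefficients of ϕ in (4) read as above:

```python
import numpy as np

# phi(t) = (1/7)t^7 - (17/12)t^6 + (51/10)t^5 - (63/8)t^4 + (9/2)t^3,
# with the coefficients listed from the constant term upward.
phi = np.polynomial.Polynomial([0, 0, 0, 9 / 2, -63 / 8, 51 / 10, -17 / 12, 1 / 7])
dphi = phi.deriv()

crit = [0.0, 1.0, 1.5, 3.0]
print([float(dphi(t)) for t in crit])   # all ≈ 0: the four critical points
print([float(phi(t)) for t in crit])    # ≈ [0.0, 0.451, 0.353, 2.604]

# The sign of phi' just left and right of each critical point classifies it.
for t in crit:
    left, right = int(np.sign(dphi(t - 1e-3))), int(np.sign(dphi(t + 1e-3)))
    kind = {(-1, 1): "local min", (1, -1): "local max"}.get((left, right), "saddle")
    print(t, kind)   # 0.0 saddle, 1.0 local max, 1.5 local min, 3.0 saddle
```

The sign table confirms the discussion above: the smallest critical value (0, at the saddle x̄) does not belong to a local minimizer.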
In this section we present a criterion, based on second derivatives, for attesting that a critical point is a local minimizer. For completeness, we first present a general result, well known in the optimization community. Then we particularize the test to the specific cases studied in Calculus. We stress that the criteria we will present are only local conditions and do not say anything about global minimization without additional assumptions or specific situations. To simplify the presentation, consider the Lagrangian function associated with the problem (1),

(x, λ) ∈ Rⁿ × Rᵐ ↦ ℓ(x, λ) = f(x) + λᵀg(x).

Figure 3: Illustration of Example 3.8. The left picture shows the graph of f and the curve on it corresponding to the feasible set. On the right, we have the graph of the auxiliary function ϕ, defined in (4). Note that there are four critical points, two of them being local extrema. Besides, the smallest critical value does not correspond to a local minimizer.

The Lagrangian Hessian, that is, the matrix of the second derivatives of ℓ with respect to x, is denoted by

∇²ₓₓℓ(x, λ) = ∇²f(x) + λ₁∇²g₁(x) + ⋯ + λₘ∇²gₘ(x).

The result below can be found in many optimization books. See, for example, [2, 4].
Theorem 4.1
Let x* ∈ Rⁿ be a critical point for the problem (1) and λ* ∈ Rᵐ a corresponding multiplier vector, according to Definition 3.1. Suppose that

dᵀ ∇²ₓₓℓ(x*, λ*) d > 0

for all nonzero vectors d ∈ Rⁿ satisfying ∇gᵢ(x*)ᵀd = 0, i = 1, …, m. Then there exist δ > 0 and a neighborhood V of x* such that

f(x) − f(x*) ≥ δ ‖x − x*‖²   for all x ∈ V with g(x) = 0.

In particular, x* is a strict local minimizer of (1).

Despite the existence of this condition for general dimensions, we consider here the particular 2- and 3-dimensional cases with one or two constraints, which are the most common cases in Calculus courses. In these situations, the Hessian matrix of a function ϕ is

∇²ϕ = [ ∂²ϕ/∂x₁²     ∂²ϕ/∂x₁∂x₂ ]
      [ ∂²ϕ/∂x₂∂x₁   ∂²ϕ/∂x₂²   ]

for n = 2, with the analogous 3 × 3 matrix of second partial derivatives for n = 3.

Consider the problem (1) with n = 2 and m = 1. That is, the problem of minimizing a function of two variables subject to a single equality constraint. The next theorem follows immediately from the previous one.

Theorem 4.2
Let x* ∈ R² be a critical point for the problem (1) and let λ* ∈ R be the corresponding Lagrange multiplier. Define H = ∇²f(x*) + λ*∇²g(x*), assume that ∇g(x*) ≠ 0 and take a nonzero vector v ⊥ ∇g(x*). If

vᵀHv > 0,     (5)

then x* is a local minimizer for the problem.

Now, let us see a straightforward application of the previous theorem.
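The test (5) is entirely mechanical; as a quick sanity check, a minimal sketch on a toy instance (not from this note), minimizing x₁² + x₂² subject to x₁ + x₂ − 1 = 0, whose critical point is x* = (1/2, 1/2) with λ* = −1:

```python
import numpy as np

def second_order_test_2d(H, grad_g):
    """Theorem 4.2 (n = 2, m = 1): the sign of v^T H v for v perpendicular to grad g."""
    v = np.array([-grad_g[1], grad_g[0]])   # grad g rotated by 90 degrees
    return float(v @ H @ v)

# Toy problem: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0.
# The critical point is x* = (1/2, 1/2) with lambda* = -1, so
# H = Hess f + lambda* Hess g = 2 I + 0.
H = 2.0 * np.eye(2)
grad_g = np.array([1.0, 1.0])
print(second_order_test_2d(H, grad_g))   # 4.0 > 0: x* is a local minimizer
```

Rotating ∇g(x*) by 90 degrees is the simplest way to produce the required vector v in the plane; by the remark at the end of the note, the sign of vᵀHv does not depend on which such v is chosen.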
Example 4.3
Discuss the problem (1) with f, g: R² → R defined by f(x) = x₁ − x₁³ + x₂ and g(x) = x₂ − x₁².

Resolution. The condition (3a) in this case is

(1 − 3x₁², 1) + λ (−2x₁, 1) = (0, 0),

giving λ* = −1 and 1 − 3x₁² + 2x₁ = 0, that is, x₁ = −1/3 or x₁ = 1. So, we have two critical points

x* = (1/9)(−3, 1)   and   x̄ = (1, 1).

Moreover,

H = ∇²f(x*) + λ*∇²g(x*) = [ 2 − 6x₁   0 ]
                           [ 0         0 ]

and v = (1, 2x₁) ⊥ ∇g(x). Thus, x* is a local minimizer since vᵀHv = 2 − 6x₁* = 4 > 0. Note that this point is not a global minimizer because, for instance, the feasible point (2, 4) satisfies f(2, 4) = −2 < −5/27 = f(x*). Figure 4 illustrates this example.

Remark 4.4
In the context of Theorem 4.2 we also have the maximization condition: if vᵀHv < 0, then x* is a local maximizer of f(x) subject to g(x) = 0.

Consider now the problem (1) with n = 3 and m = 1, that is, the problem of minimizing a function of three variables subject to a single equality constraint. Here comes another simple application of the general theorem:
Figure 4: On the left, the level curves of f, given by x₂ = x₁³ − x₁ + constant, the feasible set, given by x₂ = x₁², and the two critical points x* and x̄. On the right, the graph of f and the points on it corresponding to the feasible set.
Let x* ∈ R³ be a critical point for the problem (1) with n = 3 and m = 1. Consider λ* ∈ R the corresponding Lagrange multiplier and define H = ∇²f(x*) + λ*∇²g(x*). Suppose that ∇g(x*) ≠ 0 and take vectors v₁, v₂ ∈ R³ such that span{v₁, v₂} = ∇g(x*)⊥. Consider the matrices V = (v₁ v₂) ∈ R^{3×2} and A = VᵀHV ∈ R^{2×2}. If a₁₁ > 0 and det(A) > 0, then x* is a local minimizer for the problem.

Let us now see a straightforward application of the previous theorem.
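The basis v₁, v₂ required by Theorem 4.5 can be produced automatically as a null-space basis. A minimal sketch on a toy instance (not from this note): minimizing f(x) = x₃ on the unit sphere x₁² + x₂² + x₃² − 1 = 0, where the critical point x* = (0, 0, −1) has λ* = 1/2 and H = ∇²f + λ*∇²g = 0 + (1/2)(2I) = I.

```python
import numpy as np
from scipy.linalg import null_space

def projected_hessian(H, grad_g):
    """Theorem 4.5: A = V^T H V, with the columns of V spanning grad g's
    orthogonal complement (here an orthonormal null-space basis)."""
    V = null_space(np.asarray(grad_g).reshape(1, -1))
    return V.T @ H @ V

# Toy problem: minimize f(x) = x3 on the unit sphere. At x* = (0, 0, -1),
# grad g(x*) = (0, 0, -2), lambda* = 1/2 and H = I.
A = projected_hessian(np.eye(3), np.array([0.0, 0.0, -2.0]))
print(A[0, 0] > 0 and np.linalg.det(A) > 0)   # True: the south pole is a local minimizer
```

Here `null_space` returns an orthonormal basis, so for this toy problem A is just the 2 × 2 identity, and both sign conditions of the theorem hold.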
Example 4.6
Let us revisit Example 3.3. Suppose we had not proved the existence of a global minimizer. Then we are able to establish (only) the local condition.

Resolution. We have the critical point x* = (∛2, ∛2, ∛2/2) with multiplier λ* = −2∛4. Thus,

H = ∇²f(x*) + λ*∇²g(x*) = [  0  −1  −2 ]
                           [ −1   0  −2 ]
                           [ −2  −2   0 ]

and ∇g(x*) = (∛4/2)(1, 1, 2). We can consider, for example, v₁ = (1, −1, 0) and v₂ = (1, 1, −1), so that span{v₁, v₂} = ∇g(x*)⊥, yielding

A = VᵀHV = [ 2  0 ]
           [ 0  6 ].

Thus, x* is a local minimizer since a₁₁ = 2 > 0 and det(A) = 12 > 0.

Consider now the problem of minimizing a function of three variables subject to two equality constraints. Finally, let us present our last application of the general theorem.
Theorem 4.7
Let x* ∈ R³ be a critical point for the problem (1) with n = 3 and m = 2. Consider λ* ∈ R² the vector of Lagrange multipliers and define H = ∇²f(x*) + λ₁*∇²g₁(x*) + λ₂*∇²g₂(x*). Suppose that ∇g₁(x*) and ∇g₂(x*) are linearly independent and take a nonzero vector v ∈ R³ such that v ⊥ ∇g₁(x*) and v ⊥ ∇g₂(x*). If

vᵀHv > 0,     (6)

then x* is a local minimizer for the problem.

Here comes our last example:
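Theorem 4.7 can be scripted in the same way; a minimal sketch using the data of Example 4.8 below (critical point x* = (1, 0, 1), λ* = 1/4 and μ* = −1/2):

```python
import numpy as np
from scipy.linalg import null_space

# Data of Example 4.8: x* = (1, 0, 1), lambda* = 1/4, mu* = -1/2.
# Hess f = 0 and Hess g2 = 0, so only g1(x) = x1^2 + x2^2 - x3^2 contributes to H.
lam = 1 / 4
H = lam * np.diag([2.0, 2.0, -2.0])
grad_g1 = np.array([2.0, 0.0, -2.0])   # gradient of g1 at x*
grad_g2 = np.array([1.0, 0.0, 1.0])    # gradient of g2 at x*

# A vector spanning the intersection of the two tangent planes:
v = null_space(np.vstack([grad_g1, grad_g2])).ravel()   # ±(0, 1, 0)
print(float(v @ H @ v))   # 0.5 > 0, so x* is a local minimizer by Theorem 4.7
```

With two constraints in R³ the orthogonal complement is one-dimensional, so the null-space computation returns a single unit vector and the test reduces to one scalar sign check.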
Example 4.8
Consider f, g₁, g₂: R³ → R defined by f(x) = x₃, g₁(x) = x₁² + x₂² − x₃² and g₂(x) = x₁ + x₃ − 2. Solve the problem (1) for these functions.

Resolution. The condition (3a) in this case is

(0, 0, 1) + λ (2x₁, 2x₂, −2x₃) + μ (1, 0, 1) = (0, 0, 0),

which immediately implies that λ* ≠ 0 (if λ = 0, the first component gives μ = 0 while the third gives μ = −1) and hence, x₂* = 0. Using the constraints, we conclude that x₁* = x₃* = 1. This in turn implies that λ* = 1/4 and μ* = −1/2. So,

H = (1/4) [ 2  0   0 ]
          [ 0  2   0 ]
          [ 0  0  −2 ],

∇g₁(x*) = (2, 0, −2) and ∇g₂(x*) = (1, 0, 1). We can consider, for example, v = (0, 1, 0) ∈ {∇g₁(x*), ∇g₂(x*)}⊥ and see that vᵀHv = 1/2 > 0, proving then that x* is a local minimizer for the problem. In fact, we can prove that this point is a global minimizer. To see this, note that

x₃² = x₁² + x₂² = (2 − x₃)² + x₂² = 4 − 4x₃ + x₃² + x₂²,

which gives 4(x₃ − 1) = x₂² ≥ 0. So, any feasible point x satisfies f(x) = x₃ ≥ 1 = f(x*). Figure 5 illustrates the feasible set of this example.

Remark 4.9
It is easy to see that the condition (6) does not depend on the particular choice of v: if vᵀHv > 0 for some nonzero v ∈ {∇g₁(x*), ∇g₂(x*)}⊥, then v̄ᵀHv̄ > 0 for every nonzero v̄ ∈ {∇g₁(x*), ∇g₂(x*)}⊥. Indeed, in this case, v̄ = αv for some α ∈ R \ {0}. The same reasoning holds for condition (5). It can also be proved that the conditions in Theorem 4.5 do not depend on the choice of the vectors v₁, v₂ such that span{v₁, v₂} = ∇g(x*)⊥.

Figure 5: Illustration of Example 4.8: the cone given by x₁² + x₂² − x₃² = 0 and the plane defined by x₁ + x₃ − 2 = 0.

In this paper, we have pointed out that, in some examples (of some undergraduate Calculus textbooks) related to obtaining global minimizers via the Lagrange Multiplier Method (LMM), a certain amount of imprecision has been typical, particularly when dealing with worked problems. One way to mitigate that would be the use of a criterion to guarantee that a critical point (obtained by the LMM) is a local minimizer. So, we have proposed such a criterion, which, by the way, has been absent from Calculus textbooks. On the other hand, for those professors who jump into the ‘global’ aspects of the LMM, in spite of it being a strictly local result, based on what we discussed here we also propose the following way to state the LMM:

For continuously differentiable functions f and g, in order to determine the minimum value of f subject to the constraint g = k with k constant, assuming that this global minimum value is attained on the interior of the domain shared by f and g, but not on its boundary, and that ∇g ≠ 0 holds on that domain, do the following:

1. Determine each point x and, if necessary, also λ, satisfying the following system:
   (a) ∇f = λ∇g;
   (b) g = k.

2. Evaluate f at the points obtained in the previous item: the smallest value of f is its global minimum.

References

[1] P. M. Fitzpatrick.
Advanced Calculus. Thomson Brooks/Cole, Belmont, USA, 2nd edition, 2006.

[2] D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, New York, 3rd edition, 2008.

[3] J. E. Marsden and A. Tromba. Vector Calculus. W. H. Freeman and Company, New York, 6th edition, 2012.

[4] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 2nd edition, 2006.