Towards the theory of strong minimum. A view from variational analysis
A.D. Ioffe ∗ June 26, 2019
Abstract. The key element of the approach to the theory of necessary conditions in optimal control discussed in the paper is reduction of the original constrained problem to unconstrained minimization, with subsequent application of a suitable mechanism of local analysis to characterize minima of the (necessarily nonsmooth) functionals that appear after reduction. Using unconstrained minimization at the crucial step of obtaining necessary conditions facilitates the study of new phenomena and allows one to obtain more transparent and technically simple proofs of known results. In the paper we offer a new proof of the maximum principle for a nonsmooth optimal control problem (in the standard Pontryagin form) with state constraints, and then prove a new second order condition for a strong minimum in the same problem but with data differentiable in the control and state variables. The role of variational analysis is twofold. Conceptually, the main considerations behind the reduction are connected with metric regularity and Ekeland's principle. On the other hand, the subdifferential calculus offers the main technical instrument for the proofs of first order conditions.
60 years ago the appearance of the book by Pontryagin, Boltyanskii, Gamkrelidze and Mishchenko [22] stimulated a series of studies aimed at finding general approaches to the analysis of necessary conditions in constrained optimization. I just mention two basic ideas that played a fundamental role in subsequent developments. According to the first, proposed by Dubovitskii and Milyutin [7], under suitable conditions the cones of variations of the constraint sets and of the cost functional must have empty intersection. The second idea, expressed in the clearest form by Gamkrelidze [9], was that the image of the solution under a mapping naturally associated with the problem must be a boundary point of the image set.

∗ Mathematics, the Technion.

On the contrary, the boundary point approach (with variational techniques replaced by subdifferential calculus and nonsmooth controllability criteria) was successfully used to extend the maximum principle to optimal control problems with nonsmooth data (see e.g. [5, 6, 10]) but does not seem to work well with higher order conditions.

In this paper we shall discuss (mainly in connection with optimal control) a totally different, non-variational approach to the study of necessary conditions in constrained optimization. It takes its origin in the metric regularity (or rather metric subregularity) property, which is one of the most fundamental concepts of today's variational analysis. The key element of the approach is reduction of the original constrained problem to unconstrained minimization, with subsequent application of a suitable mechanism of local analysis to characterize minima of the (necessarily nonsmooth) functionals that appear after reduction. The possibility to work with unconstrained minimization at the crucial step of obtaining necessary conditions is the principal advantage of the new theory. In an almost obvious way, it opens doors both to the study of second order conditions and to work with first order conditions for nonsmooth problems, sometimes with substantial simplification of arguments.
We also hope and believe that the approach will be equally applicable to other problems of the theory of optimal control not considered in this paper.

The power of the theory was already demonstrated in several earlier publications devoted to the maximum principle for optimal control of systems governed by differential inclusions [11, 13, 24].¹ ² Here we consider the optimal control problem

(OC)  minimize $\ell(x(0),x(T))$, s.t. $\dot x = f(t,x,u)$, $u\in U(t)$, $g(t,x(t))\le 0$, $\Phi(x(0),x(T))\in S$.

¹ Nonconvex subdifferential calculus, and in particular the "extremal principle" of Kruger-Mordukhovich, which can be viewed as a nonconvex extension of the separation theorem, needs closed sets and lsc functions. But the set of trajectories of a control system is typically not closed in the topology of uniform convergence, and its closure contains all relaxed trajectories. Therefore proofs based on nonconvex separation give the maximum principle only for relaxed systems; see e.g. [18].

² The only work known to me where the controllability approach is used to get second order conditions is [3]. But it is assumed in that paper that the optimal control takes values in the interior of the set of admissible controls for all $t$.

The principal results of the paper include reduction theorems (Theorems 3.1 and 3.2), the maximum principle (Theorem 4.2) and a second order condition for a strong minimum (Theorem 5.5). Recall that a feasible control process $(\bar x(\cdot),\bar u(\cdot))$ is called a strong local minimum in (OC) if for some $\varepsilon>0$ the inequality $\ell(\bar x(0),\bar x(T))\le\ell(x(0),x(T))$ holds for all feasible pairs $(x(\cdot),u(\cdot))$ satisfying $\|x(t)-\bar x(t)\|<\varepsilon$ for all $t$, no matter how far $u(t)$ may be from $\bar u(t)$. In this case $\bar u(\cdot)$ is usually called an optimal control and $\bar x(t)$ an optimal trajectory in the problem.
It should be said, however, that in the proofs we shall actually be dealing with a weaker type of minimum, close to what is called a Pontryagin minimum in [17].

The assumptions on the components of the problem differ, of course, for the first and second order conditions and will be stated in the corresponding parts of the paper. Here we just mention that $x$ and $u$ are elements of some Euclidean spaces, say $\mathbb R^n$ and $\mathbb R^m$. Recall also that a pair $(x(t),u(t))$, with $u(t)\in U(t)$, satisfying the differential equation is called a control process, with $u(t)$ being a control function (or just control) and $x(t)$ the corresponding trajectory. Control functions will always be assumed uniformly bounded (that is, belonging to $L^\infty$), which of course does not affect the generality of the presentation too much. A control process is feasible if $x(\cdot)$ satisfies the end point and state constraints.

The reduction theorems to be used here are applied to certain subproblems of (OC) obtained by replacement of $U(t)$ by smaller and better structured sets. The unconstrained problems that appear as a result of the reduction theorems resemble the classical Bolza problem, with the functionals to be minimized looking approximately as follows:
$$\varphi(x(\cdot)) + \int_0^T \Big\|\dot x - \psi(t,x,u) - \sum_{i=1}^k \alpha_i\psi_i(t,x,u)\Big\|\,dt,$$
where the $\alpha_i$ are nonnegative numbers and the $u(t)$ are taken from the mentioned better structured subset of $U(t)$. The off-integral term $\varphi(\cdot)$ is in general a Lipschitz function on the space of continuous functions. But in the absence of state constraints it is a function of the end points $(x(0),x(T))$.

The efficiency of the unconstrained reduction technique is then demonstrated by a proof of the maximum principle for (OC) in Section 4 (Theorem 4.2). Formally, the theorem is equivalent to the maximum principle proved in [24]. So it could be obtained from the maximum principle for problems with differential inclusions (as was done in [24] and earlier in [11] for problems without state constraints).
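The reduced functional just described can be illustrated by a small discrete computation. Everything below (the dynamics, the controls, the endpoint term standing in for $\varphi$) is a toy choice made for illustration, not data from the paper; the point is only that the integral penalty vanishes precisely on trajectories of the convexified dynamics.

```python
# Discrete sketch of the reduced functional
#   J(x) = phi(x) + int_0^T || xdot - psi(t,x,u) - sum_i alpha_i psi_i(t,x,u) || dt
# All data below (dynamics, controls, endpoint cost) are illustrative toy choices.
import numpy as np

T, n = 1.0, 10001
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

u_bar, u1 = np.ones(n), -np.ones(n)            # reference and alternative controls
alpha1 = 0.3
phi = lambda x: abs(x[0]) + abs(x[-1] - 0.4)   # Lipschitz endpoint cost (toy)

def J(x):
    xdot = np.gradient(x, dt)
    drift = u_bar + alpha1 * (u1 - u_bar)      # psi + alpha_1 * psi_1, here = 0.4
    return phi(x) + np.sum(np.abs(xdot - drift)) * dt

x_good = 0.4 * t    # solves xdot = 0.4 with x(0) = 0: the penalty term vanishes
x_bad = 0.5 * t     # not a trajectory of the convexified dynamics
print(round(J(x_good), 6), round(J(x_bad), 6))   # ~0.0 vs ~0.2
```

The trajectory solving the convexified equation makes the whole functional zero, while any other curve pays both in the endpoint term and in the integral penalty.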
But a direct proof based on the reduction theorems is substantially simpler.

The final main result, Theorem 5.5, proved in the last section, gives a new second order necessary optimality condition for a strong minimum. In Subsection 4.3 we discuss in sufficient detail the connection of our theorem with two earlier second order conditions for a strong minimum: a very recent result of Frankowska and Osmolovskii [8] and an earlier result of Pales and Zeidan [20]. In particular we shall see that (up to some difference in assumptions with [20]) the condition provided by Theorem 5.5 is strictly stronger. (Note also that here, as in both papers mentioned above, the optimal control is assumed only measurable, not piecewise continuous as in many other publications dealing with second order conditions. We refer to [8] also for a brief account of the literature on second order conditions in optimal control.)

But before we turn to optimal control problems, we introduce in the next section the necessary information from variational analysis, including the penalization principles, central for the unconstrained reduction, and the list of the subdifferential calculus rules used at the last step of the proof of Theorem 4.2.

Notation
For $x\in\mathbb R^n$ we denote by $\|x\|$ the standard Euclidean norm. $C([0,T])$ is the space of continuous functions with the standard norm $\|x(\cdot)\|_C$ equal to the maximum of $\|x(t)\|$ on $[0,T]$. We use the same notation for the space of real-valued functions and for $\mathbb R^n$-valued functions, and do the same for all other functional spaces introduced below. We hope no confusion will be caused and that each time the specifics of the space will be clear from the context.

$W^{1,p}([0,T])$, $1\le p\le\infty$, is the Banach space of all absolutely continuous mappings defined on $[0,T]$ with the norm $\|x(\cdot)\|_{1,p} = \|x(0)\| + \|\dot x(\cdot)\|_p$, where $\|\cdot\|_p$ stands for the $L^p$-norm, $1\le p\le\infty$. In what follows we shall use a simpler notation for the space and write just $W^{1,p}$.

By $\langle\cdot,\cdot\rangle$ we denote the inner product in $\mathbb R^n$ and the canonical bilinear form in Banach spaces. Again, we hope this will not be a cause of any confusion. We shall also use the notation $y^*\circ F$ for the composition of a mapping into a Banach space $Y$ and the action of the functional $y^*$. Finally, by $B(x,r)$ we denote the ball of radius $r$ around $x$. The unit ball around the origin will be denoted simply by $B$.

Below we offer some information needed for the subsequent discussions. Details can be found in [12].
1. Three basic principles. We start with the three basic principles. The first is the well known Ekeland principle which is, by far, one of the most fundamental facts of variational analysis, in particular the key element in proofs of many existence theorems. Its statement and proofs can be found in many books, see e.g. [2, 12, 24].

Proposition 2.1 (Ekeland's principle). Let $X$ be a complete metric space and $f$ a lower semicontinuous function on $X$ bounded from below. Let further $f(\bar x)\le \inf f+\varepsilon$. Then for any $\lambda>0$ there is a $u\in X$ such that $d(\bar x,u)\le\lambda$, $f(u)\le f(\bar x)$ and the function $g(x) = f(x) + (\varepsilon/\lambda)d(x,u)$ has a unique global minimum at $u$.

The other two principles deal with exact (and necessarily nonsmooth) penalization. The first is a simple observation made in 1976 by Clarke (see e.g. [6] for the proof).
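Ekeland's principle (Proposition 2.1) can be tested numerically on a toy example. The function $f(x)=x^2$, the near-minimizer $x_0$ and the parameters $\varepsilon$, $\lambda$ below are illustrative choices; the script searches a grid for a point $u$ satisfying all three conclusions of the proposition.

```python
# Numerical check of Ekeland's principle (Proposition 2.1) for f(x) = x^2 on R.
# The point x0 and the parameters eps, lam are illustrative toy choices.
import numpy as np

f = lambda x: x ** 2
xs = np.linspace(-2.0, 2.0, 20001)       # grid standing in for the space X
x0, lam = 0.1, 0.1
eps = f(x0) - f(xs).min() + 1e-12        # guarantees f(x0) <= inf f + eps

def is_ekeland_point(u):
    """Do the three conclusions of Proposition 2.1 hold for this u?"""
    if abs(x0 - u) > lam + 1e-12 or f(u) > f(x0) + 1e-12:
        return False
    g = f(xs) + (eps / lam) * np.abs(xs - u)   # the perturbed function
    return abs(xs[np.argmin(g)] - u) < 1e-3    # its global minimum sits at u

candidates = [u for u in np.linspace(x0 - lam, x0 + lam, 201) if is_ekeland_point(u)]
print(len(candidates) > 0)               # such a point u does exist
```

The brute-force search confirms that at least one point $u$ within distance $\lambda$ of $x_0$ turns the perturbed function into one minimized at $u$ itself.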
Proposition 2.2. Let $X$ be a metric space and $\varphi(x)$ a function on $X$ which is Lipschitz in a neighborhood of a certain $\bar x$. Let further $M\subset X$ contain $\bar x$. If $\varphi$ attains at $\bar x$ a local minimum on $M$, then for any $K$ greater than the Lipschitz constant of $\varphi$, the function $\varphi(x)+Kd(x,M)$ attains an unconditional local minimum at $\bar x$.

Estimating the distance to the constraint set in an optimization problem may be difficult when the set is defined by functional relations. Not surprisingly, Clarke, who effectively used this penalization result to deal with nonsmooth nonlinear programming problems, did not apply it to optimal control and developed, instead, a totally different technique. Later Loewen in [16] did apply the proposition to get the maximum principle for a free end point optimal control problem with differential inclusions but, again, had to use Clarke-type techniques for the general problem. The idea of the following closely connected and more flexible result goes back to some 1979 papers by the author. Its proof easily follows from Ekeland's principle and Proposition 2.2 (see e.g. [12]).
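Proposition 2.2 admits a one-line numerical illustration. With $\varphi(x)=x$ (Lipschitz constant 1), $M=[0,1]$ and $K=2$, the constrained minimizer is also the unconditional minimizer of the penalized function. All data here are toy choices, not taken from the paper.

```python
# Exact penalization (Proposition 2.2): phi(x) = x minimized over M = [0, 1],
# penalty constant K = 2 > Lipschitz constant 1.  Toy data for illustration.
import numpy as np

phi = lambda x: x                                             # Lipschitz, constant 1
dist_M = lambda x: np.maximum(0.0, np.maximum(-x, x - 1.0))   # dist(x, [0, 1])
K = 2.0

xs = np.linspace(-3.0, 3.0, 60001)
inside = xs[(xs >= 0.0) & (xs <= 1.0)]

x_constrained = inside[np.argmin(phi(inside))]                # argmin of phi on M
x_unconstrained = xs[np.argmin(phi(xs) + K * dist_M(xs))]     # argmin of penalized phi
print(x_constrained, x_unconstrained)                         # both ~ 0
```

Both minimizers coincide (at the left endpoint of $M$), exactly as the proposition predicts once $K$ exceeds the Lipschitz constant.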
Proposition 2.3 (optimality alternative). Let $X$ be a complete metric space, let $f$ be a locally Lipschitz function on $X$ and let $M\subset X$. Consider the problem of minimizing $f$ on $M$, and let $\bar x$ be a local minimum in the problem. Let finally $\varphi$ be a nonnegative lsc function on $X$ equal to zero at $\bar x$. Then the following alternative holds:

• either there is a $\lambda>0$ such that $\lambda f+\varphi$ has an unconditional local minimum at $\bar x$ (non-singular case);

• or there is a sequence of $x_m\notin\operatorname{cl}M$ converging to $\bar x$ and such that for each $m$ the function $\varphi(x)+m^{-1}d(x,x_m)$ attains a global minimum at $x_m$ (singular case).

In what follows we refer to $\varphi$ as a test function. The possibility to choose different test functions adds a lot of flexibility. The price to pay is the necessity to consider a sequence of problems in singular cases, but the gain is a substantial extension of the class of problems that can be dealt with. For optimal control problems for systems governed by differential inclusions this idea was instrumental in getting the maximum principle without convexity assumptions on sets of possible velocities (see [11]).

2. Metric regularity. This is one of the central concepts of variational analysis. Here we just mention a few facts needed for the further discussion. Let $X$ and $Y$ be metric spaces and $F: X\rightrightarrows Y$ a set-valued mapping. We use the same symbol $d(\cdot,\cdot)$ to denote the distance in either space; it will always be clear which space we are talking about. Take an $(\bar x,\bar y)\in\operatorname{Graph}F$. It is said that $F$ is (metrically) regular near $(\bar x,\bar y)$ if there are $K>0$ and $\varepsilon>0$ such that
$$d(x,F^{-1}(y)) \le K\, d(y,F(x))$$
if $d(x,\bar x)<\varepsilon$ and $d(y,\bar y)<\varepsilon$. It is said that $F$ is subregular at $(\bar x,\bar y)$ if the inequality holds with $y=\bar y$ and $x\in B(\bar x,\varepsilon)$. The following is the main (for this paper) example of a regular mapping.

Proposition 2.4. Let $S$ be the set of solutions of the differential equation $\dot x = F(t,x)$ defined on $[0,T]$. Assume that we are given an $\bar x(\cdot)\in W^{1,1}$, an $\varepsilon>0$ and a summable $k(t)$ such that
$$\|F(t,x)-F(t,x')\| \le k(t)\|x-x'\|, \quad\text{if } x,x'\in B(\bar x(t),\varepsilon),\ \text{a.e.}$$
If
$$e^{\int_0^T k(t)\,dt}\int_0^T \|\dot{\bar x}(t) - F(t,\bar x(t))\|\,dt < \varepsilon,$$
then the distance from $\bar x(\cdot)$ to $S$ in $W^{1,1}$ does not exceed $K\int_0^T\|\dot{\bar x}(t)-F(t,\bar x(t))\|\,dt$, where $K$ depends only on $\varepsilon$ and $k(\cdot)$.

The theorem, probably absent from the literature as stated, is an easy consequence of [12], Theorem 7.33. Similar results with different estimates (and proofs) follow from some earlier publications (e.g. [6], Theorem 3.1.6, or [16], Theorem 2C.5).
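The flavor of Proposition 2.4 can be checked numerically for the scalar equation $\dot x=-x$ (so $k(t)\equiv 1$). The approximate trajectory below, the Gronwall-type constant $K=e^{\int k\,dt}$ and the comparison with the genuine solution issued from the same initial point are all illustrative choices; the distance to the solution set is bounded above by the distance to that particular solution.

```python
# Defect-vs-distance check in the spirit of Proposition 2.4 for xdot = F(t,x) = -x,
# Lipschitz rate k(t) = 1 on [0, T].  The curve and the constant are toy choices.
import numpy as np

T, n = 1.0, 20001
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]
integral = lambda f: float(np.sum((f[:-1] + f[1:]) * 0.5) * dt)  # trapezoid rule

x_tilde = np.exp(-t) + 0.01 * np.sin(5.0 * t)     # approximate trajectory
dx_tilde = -np.exp(-t) + 0.05 * np.cos(5.0 * t)   # its exact derivative
defect = np.abs(dx_tilde - (-x_tilde))            # ||xdot - F(t,x)||

y, dy = np.exp(-t), -np.exp(-t)                   # true solution with y(0) = x_tilde(0)

dist_W11 = abs(x_tilde[0] - y[0]) + integral(np.abs(dx_tilde - dy))  # W^{1,1} distance
K = np.exp(1.0)                                   # e^{int_0^T k(t) dt} with k = 1
print(dist_W11 <= K * integral(defect))           # the estimate holds here
```

The $W^{1,1}$ distance to this solution is indeed dominated by $K$ times the integral of the defect, which is the qualitative content of the proposition.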
3. Subdifferentials. There are several types of subdifferentials used in local variational analysis. We shall basically work with the $G$-subdifferential (coinciding with the limiting Fréchet subdifferential in finite dimensional spaces) and Clarke's generalized gradient. In what follows the symbol $\partial$ will be used for the first and $\partial_C$ for the second. These are the only "good" subdifferentials that make sense and work in arbitrary Banach spaces. Moreover, if $X$ is a separable Banach space then the $G$-subdifferential is the minimal subdifferential with the following properties (among others):

• if $f$ is lsc and attains a local minimum at $x$, then $0\in\partial f(x)$;

• if $f$ is locally Lipschitz then $\partial f(x)\ne\emptyset$ and the mapping $x\to\partial f(x)$ is bounded-valued and norm-to-weak* usc;

• if $f$ is Lipschitz near $x$ then $\operatorname{conv}\partial f(x)$ coincides with Clarke's generalized gradient $\partial_C f(x)$ of $f$ at $x$;

• if $f$ is convex, then $\partial f(x)$ coincides with the subdifferential in the sense of convex analysis;

• if $f$ is strictly differentiable at $x$, then $\partial f(x)=\{f'(x)\}$.

If $S\subset X$ is a closed set and $x\in S$, then $N(S,x)=\operatorname{cone}\partial d(\cdot,S)(x)$ is the normal cone to $S$ at $x$.

Here are some calculus rules for $G$-subdifferentials to be used in the paper:

• $\partial(\lambda f)(x) = \lambda\partial f(x)$ ($\lambda>0$);

• if $f = f_1+f_2$, where both functions are lsc and at least one of them is Lipschitz near $x$, then $\partial f(x)\subset \partial f_1(x)+\partial f_2(x)$;

• if $f(x,y) = f_1(x)\cdot f_2(y)$ and both $f_1$ and $f_2$ are nonnegative and Lipschitz near $\bar x$ and $\bar y$ respectively, then $\partial f(\bar x,\bar y) = f_1(\bar x)(\{0\}\times\partial f_2(\bar y)) + f_2(\bar y)(\partial f_1(\bar x)\times\{0\})$;

• if $X$ is a closed subspace of $L^\infty([0,T],\mathbb R^n)$, $f(t,x)$ is measurable in $t$ and $k(t)$-Lipschitz in $x$ in the $\varepsilon$-neighborhood of $\bar x(t)$ a.e. (with summable $k(\cdot)$) and $f(x(\cdot)) = \int_0^T f(t,x(t))\,dt$, then $\partial f(\bar x(\cdot)) \subset \int_0^T \partial_C f(t,\bar x(t))\,dt$ in the sense that for any $x^*\in\partial f(\bar x(\cdot))$ there is a summable $\xi(t)\in\partial_C f(t,\bar x(t))$ a.e. such that for all $h(\cdot)\in X$
$$\langle x^*, h(\cdot)\rangle = \int_0^T \langle\xi(t),h(t)\rangle\,dt;$$

• if $f = g\circ F$, $F:\mathbb R^n\to\mathbb R^m$, and both $g$ and $F$ are Lipschitz near $\bar y=F(\bar x)$ and $\bar x$ respectively, then
$$\partial f(\bar x) \subset \bigcup_{y^*\in\partial g(\bar y)} \partial(y^*\circ F)(\bar x);$$

• if $f(x) = \max_i f_i(x)$, $i=1,\dots,k$, and all $f_i$ are Lipschitz near $x$, then
$$\partial f(x)\subset\Big\{\sum_{i\in I(x)}\alpha_i x_i^*:\ x_i^*\in\partial f_i(x),\ \alpha_i\ge 0,\ \sum\alpha_i = 1\Big\},$$
where $I(x)=\{i: f_i(x)=f(x)\}$;

• if $X=C[0,T]$ and $f(x(\cdot)) = \max_{t\in[0,T]} g(t,x(t))$, where $g$ is an usc real-valued function satisfying $|g(t,x)-g(t,x')|\le k(t)\|x-x'\|$ if $\|x-\bar x(t)\|<\varepsilon$, with summable $k(\cdot)$ and $\varepsilon>0$, then $\partial f(\bar x(\cdot))$ consists of measures $\nu$ such that $\nu(dt)=\gamma(t)\mu(dt)$, where $\mu$ is a probability measure supported on $\Delta=\{t: g(t,\bar x(t)) = f(\bar x(\cdot))\}$ and $\gamma(\cdot)$ is a measurable selection of the set-valued mapping $t\to\bar\partial g(t,\cdot)(\bar x(t))$, where
$$\bar\partial g(t,x) = \{\lim y_m:\ y_m\in\partial g(t_m,\cdot)(x_m),\ t_m\to t,\ x_m\to x,\ g(t_m,x_m)\to g(t,x)\}.$$

We refer to [6, 12, 21, 23] for further details.
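The finite-max rule above can be probed numerically: for a maximum of two smooth (here linear) functions, both active at the reference point, the directional derivative must equal the support function of the convex hull of the active gradients. The two functions are toy choices made for illustration.

```python
# Check of the max-rule: for f = max(f1, f2) with f1, f2 linear and both active
# at the origin, the directional derivative is f'(0; h) = max_i <grad f_i, h>,
# i.e. the support function of conv{grad f1, grad f2}.  Toy data.
import numpy as np

g1, g2 = np.array([1.0, 1.0]), np.array([2.0, -1.0])  # gradients of f1, f2
f = lambda p: max(p @ g1, p @ g2)                     # f1(0) = f2(0) = 0: both active

rng = np.random.default_rng(0)
t = 1e-6
for _ in range(100):
    h = rng.normal(size=2)
    dir_deriv = (f(t * h) - f(np.zeros(2))) / t       # difference quotient for f'(0; h)
    assert abs(dir_deriv - max(g1 @ h, g2 @ h)) < 1e-9
print("directional derivatives match the max rule")
```

Every sampled direction reproduces the support-function formula, which is exactly what the inclusion $\partial f(x)\subset\operatorname{conv}\{x_i^*: i\in I(x)\}$ predicts in the smooth case.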
4. Tangent cones. Let again $X$ be a Banach space, $Q\subset X$ closed, $x\in Q$. The (classical) tangent cone $T(Q,x)$ to $Q$ at $x$ is the collection of all $h\in X$ such that $d(x+th,Q) = o(t)$. If $h\in T(Q,x)$, then the collection $T^2(Q,x;h)$ of $v\in X$ such that $d(x+th+t^2v,Q)=o(t^2)$ is the second order tangent set to $Q$ at $x$ along $h$. We denote by $T^2(Q,x)$ the collection of $h\in T(Q,x)$ for which $T^2(Q,x;h)\ne\emptyset$.

5. Measurability. Recall that a set $Q\subset[0,T]\times\mathbb R^n$ is $\mathcal L\times\mathcal B$-measurable if it belongs to the $\sigma$-algebra generated by all products $\Delta\times V$, where $\Delta\subset[0,T]$ is Lebesgue measurable and $V\subset\mathbb R^n$ is open. A set-valued mapping $V(t)$ from $[0,T]$ into $\mathbb R^n$ is $\mathcal L\times\mathcal B$-measurable (or just measurable) if its graph is $\mathcal L\times\mathcal B$-measurable. A single-valued measurable mapping $v(t)$ from $[0,T]$ into $\mathbb R^n$ is a measurable selection of $V(\cdot)$ if $v(t)\in V(t)$ for almost every $t$. For us the most important properties of measurable set-valued mappings are:

• if $V(t)$ is measurable, then $\operatorname{cl}V(t)$ (where $\operatorname{cl}V$ is the closure of $V$) is also measurable;

• a measurable set-valued mapping admits a measurable selection (Aumann's selection theorem).

Specifically, we shall need the following consequence of the last property.

Proposition 2.5.
Let $U(t)$ be an $\mathcal L\times\mathcal B$-measurable set-valued mapping from $[0,T]$ into $\mathbb R^m$ and $f(t,u)$ an $\mathcal L\times\mathcal B$-measurable extended-real-valued function on $[0,T]\times\mathbb R^m$. Assume further that a measurable selection $\bar u(\cdot)$ of $U(\cdot)$ is given such that $f(t,\bar u(t))$ is a summable function and
$$\int_0^T f(t,\bar u(t))\,dt \ge \int_0^T f(t,u(t))\,dt$$
for any measurable selection $u(\cdot)$ of $U(\cdot)$ for which the integral on the right side of the inequality makes sense. Then for almost every $t$ the inequality $f(t,\bar u(t))\ge f(t,u)$ holds for all $u\in U(t)$.

Proof. Assume the contrary, and let $V(t) = \{u\in U(t): f(t,u) > f(t,\bar u(t))\}$. Then the graph of $V(\cdot)$ is $\mathcal L\times\mathcal B$-measurable and $V(t)\ne\emptyset$ on a set $\Delta$ of positive measure. Let $v(t)$ be a measurable selection of $V(\cdot)$ defined on $\Delta$, and let $u(t)$ coincide with $v(t)$ on $\Delta$ and with $\bar u(t)$ on $[0,T]\setminus\Delta$. Then $f(t,u(t))\ge f(t,\bar u(t))$ for all $t$, hence the integral of $f(t,u(t))$ makes sense and we come to a contradiction.

All other results relating to measurable set-valued mappings to be used in the paper are immediate consequences of the definitions. We refer to [4, 24] for more details.

Let $(\bar x(\cdot),\bar u(\cdot))$ be a strong local minimum in (OC). This means that there is an $\varepsilon>0$ such that $\ell(\bar x(0),\bar x(T))\le\ell(x(0),x(T))$ for any feasible $(x(\cdot),u(\cdot))$ such that $\|x(t)-\bar x(t)\|<\varepsilon$ for all $t$. The following assumptions on the components of (OC) will be adopted throughout the paper (and either strengthened or supplemented with additional assumptions whenever necessary):

(H1) $\ell$ is a real-valued function, Lipschitz in a neighborhood of $(\bar x(0),\bar x(T))$;

(H2) there are $\varepsilon>0$, $\delta\ge 0$ and a summable $\bar k(t)$ such that for almost every $t$ the relations
$$\|f(t,\bar x(t),u)\|\le \bar k(t);\qquad \|f(t,x,u)-f(t,x',u)\|\le \bar k(t)\|x-x'\|\quad\text{a.e.}$$
hold for $x,x'\in B(\bar x(t),\varepsilon)$, $u\in \bar U(t) = B(\bar u(t),\delta)\cap U(t)$;

(H3) the set-valued mapping $t\to\{(u,f(t,\bar x(t),u)): u\in U(t)\}$ is $\mathcal L\times\mathcal B$-measurable;

(H4) $g(t,x)$ is upper semicontinuous in both variables and there is a $\rho>0$ such that $g(t,\cdot)$ is $\rho$-Lipschitz on $B(\bar x(t),\varepsilon)$ for every $t\in[0,T]$;

(H5) $S\subset\mathbb R^r$ is a closed set, $\Phi:\mathbb R^n\times\mathbb R^n\to\mathbb R^r$ is continuously differentiable near $(\bar x(0),\bar x(T))$ and $\Phi'(\bar x(0),\bar x(T))$ is a linear operator onto $\mathbb R^r$.

(We do not exclude the possibility that $\delta=0$. In fact, this is exactly the case we shall be dealing with in the proof of the maximum principle in the next section.)

The reduction theorems we are going to state and prove in this section actually apply not to (OC) but to its subproblems defined as follows. Denote by $\mathcal U$ the collection of all measurable selections $u(\cdot)$ of $U(\cdot)$ for which there are $\gamma>0$ and a summable $k(t)$ such that for almost every $t$ the inequalities
$$\|f(t,\bar x(t),u(t))\|\le k(t),\qquad \|f(t,x,u(t))-f(t,x',u(t))\|\le k(t)\|x-x'\|$$
hold for all $x,x'\in B(\bar x(t),\gamma)$. By (H2), $\bar u(\cdot)\in\mathcal U$. We shall always assume that $\gamma\le\varepsilon$ and $k(t)\ge\bar k(t)$ a.e. It will also be convenient to occasionally denote the $\gamma$ and $k(\cdot)$ associated with a given $u(\cdot)\in\mathcal U$ by $\gamma_u$ and $k_u(\cdot)$.

Take now a finite collection $\{u_1(\cdot),\dots,u_k(\cdot)\}$ of elements of $\mathcal U$, and let $U_k(t) = \bar U(t)\cup\{u_1(t),\dots,u_k(t)\}$ (with $\bar U(\cdot)$ from (H2)). The subproblem we shall work with is

(OC$_k$)  minimize $\ell(x(0),x(T))$, s.t. $\dot x = f(t,x,u)$, $u\in U_k(t)$, $g(t,x(t))\le 0$, $\Phi(x(0),x(T))\in S$.

Clearly $(\bar x(\cdot),\bar u(\cdot))$ is a strong local minimum in (OC$_k$).

In what follows we denote by $\mathcal U_k$ the collection of measurable $u(\cdot)$ such that $u(t)\in U_k(t)$ a.e. It is clear that $\mathcal U_k\subset\mathcal U$.
Let further $X_k$ denote the collection of all pairs $(x(\cdot),u(\cdot))$ with $\|x(\cdot)-\bar x(\cdot)\|_C\le\varepsilon$ and $u(\cdot)\in\mathcal U_k$ satisfying the equation $\dot x = f(t,x,u)$. Thus (OC$_k$) is the problem of minimizing $\ell(x(0),x(T))$ on the set $M_k$ of elements of $X_k$ satisfying $\Phi(x(0),x(T))\in S$. We shall endow $X_k$ with the $C([0,T])\times L^1$-metric. As the $U_k(t)$ are closed sets bounded by summable functions, $X_k$ is a complete metric space.

We shall also consider the space $Z_k$ of $(k+2)$-tuples $z = (x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)$ with $x(\cdot)\in W^{1,1}$, $u(\cdot)\in\mathcal U_k$, $\alpha_i\ge 0$, $\sum\alpha_i\le 1$. Unlike $X_k$, the $u(\cdot)$-components of elements of $Z_k$ will be considered with the $L^\infty$-topology of uniform convergence almost everywhere. Set
$$\psi(x(\cdot)) = \max\big\{\ell(x(0),x(T)) - \ell(\bar x(0),\bar x(T)),\ \max_{0\le t\le T} g(t,x(t))\big\}$$
and
$$J_k(z) = J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k) = \int_0^T \Big\|\dot x(t) - f(t,x(t),u(t)) - \sum_{i=1}^k \alpha_i\big(f(t,x(t),u_i(t)) - f(t,x(t),u(t))\big)\Big\|\,dt.$$

We are ready to state and prove the first reduction theorem.
Theorem 3.1. We posit (H1)-(H5). If $(\bar x(\cdot),\bar u(\cdot))$ is a strong local minimum in (OC$_k$), then the following alternative holds with some sufficiently big $K>0$:

– either there is a $\lambda>0$ such that the functional
$$\mathcal J_k(z) = \lambda\psi(x(\cdot)) + d(\Phi(x(0),x(T)),S) + K J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)$$
attains a local minimum on $Z_k$ at $\bar z = (\bar x(\cdot),\bar u(\cdot),0,\dots,0)$;

– or there is a $K>0$ and a sequence $(x_m(\cdot),u_m(\cdot))\subset X_k\setminus M_k$, $m=1,2,\dots$, converging to $(\bar x(\cdot),\bar u(\cdot))$ and such that for any (sufficiently large) $m$ the functional
$$\mathcal J_{km}(z) = d(\Phi(x(0),x(T)),S) + K J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k) + m^{-1}\Big(\|x(\cdot)-x_m(\cdot)\|_C + \int_0^T\big(\|u(t)-u_m(t)\| + \sum\alpha_i\|u_i(t)-u_m(t)\|\big)\,dt\Big)$$
attains a local minimum on $Z_k$ at $z_m = (x_m(\cdot),u_m(\cdot),0,\dots,0)$.

Proof.
1. We start with the almost obvious remark that (OC$_k$) can be equivalently reformulated as the problem of minimizing $\psi$ on $X_k$ subject to $\Phi(x(0),x(T))\in S$:

minimize $\psi(x(\cdot))$, s.t. $\dot x = f(t,x,u)$, $u\in U_k(t)$, $\Phi(x(0),x(T))\in S$.

We shall continue to refer to this last formulation of the problem as (OC$_k$). We can also assume without loss of generality that $\ell(\bar x(0),\bar x(T)) = 0$.

Next we apply the optimality alternative to the problem. (Note that $\psi(\cdot)$ is Lipschitz in a neighborhood of $\bar x(\cdot)$.) Then either there is a $\lambda>0$ such that $(\bar x(\cdot),\bar u(\cdot))$ is a local minimum of
$$I(x(\cdot)) = \lambda\psi(x(\cdot)) + d(\Phi(x(0),x(T)),S)$$
on $X_k$ (that is, subject to $\dot x=f(t,x,u)$, $u\in U_k(t)$) (non-singular case), or there is a sequence of pairs $(x_m(\cdot),u_m(\cdot))\in X_k\setminus M_k$ converging to $(\bar x(\cdot),\bar u(\cdot))$, with $d(\Phi(x_m(0),x_m(T)),S)>0$, such that the functional
$$I_m(x(\cdot),u(\cdot)) = d(\Phi(x(0),x(T)),S) + m^{-1}\Big(\|x(\cdot)-x_m(\cdot)\|_C + \int_0^T\|u(t)-u_m(t)\|\,dt\Big)$$
attains a global minimum on $X_k$ at $(x_m(\cdot),u_m(\cdot))$ (singular case).

2. Fix a $u(\cdot)\in\mathcal U_k$ and $\alpha_i\ge 0$ with $\sum\alpha_i<1$, and set $\gamma = \min\{\gamma_{u_1},\dots,\gamma_{u_k}\}$ and $k(t) = \max\{k_{u_1}(t),\dots,k_{u_k}(t)\}$. Let further $x(\cdot)$ be a solution of the following differential equation:
$$\dot x = f(t,x,u(t)) + \sum_{i=1}^k \alpha_i\big(f(t,x,u_i(t)) - f(t,x,u(t))\big). \qquad(1)$$
If $x(\cdot)$ is sufficiently close to $\bar x(\cdot)$, say $\|x(\cdot)-\bar x(\cdot)\|_C < \gamma/2$, then there is a sequence $(x_s(\cdot),u_s(\cdot))$ satisfying the original equation $\dot x=f(t,x,u)$ with $x_s(\cdot)$ uniformly converging to $x(\cdot)$ and $u_s(t)\in\{u(t),u_1(t),\dots,u_k(t)\}$ almost everywhere.

This is immediate from a number of relaxation theorems (see e.g. [16], Theorem 2F.2, or [24], Theorem 2.7.2). The absence of end point constraints makes construction of the desired sequence especially easy. All we need is to break $[0,T]$ into $s$ equal intervals $\Delta_j$, then to choose in each of them $k$ disjoint subsets $\Delta_{ij}$ with measures respectively equal to $\alpha_i T/s$, and set
$$u_s(t) = \Big(1-\sum_{i=1}^k\alpha_{is}(t)\Big)u(t) + \sum_{i=1}^k \alpha_{is}(t)u_i(t), \quad\text{where}\quad \alpha_{is}(t) = \begin{cases}1, & \text{if } t\in\cup_j\Delta_{ij};\\ 0, & \text{otherwise.}\end{cases}$$
Then for any $i=1,\dots,k$ the sequence $(\alpha_{is}(\cdot))$ weakly (e.g. in $L^1$) converges to the function identically equal to $\alpha_i$, and it is not a difficult matter to deduce (taking (H2) into account) that the sequence $(x_s(\cdot))$ of solutions of the equations $\dot x = f(t,x,u_s(t))$ with $x_s(0)=x(0)$ uniformly converges to $x(\cdot)$.

As $I_m(x_s(\cdot),u_s(\cdot)) \ge I_m(x_m(\cdot),u_m(\cdot))$, it follows that in the non-singular case $(\bar x(\cdot),\bar u(\cdot),0,\dots,0)$ is a local minimum in the problem of minimizing $I$ on the set of $z=(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)\in Z_k$ satisfying (1). For the same reason, in the singular case the obvious inequality
$$\|u_s(t)-u_m(t)\| \le \Big(1-\sum_{i=1}^k\alpha_{is}(t)\Big)\|u(t)-u_m(t)\| + \sum_{i=1}^k\alpha_{is}(t)\|u_i(t)-u_m(t)\|$$
implies that $(x_m(\cdot),u_m(\cdot),0,\dots,0)$ is a global minimum in the problem of minimizing the functional
$$d(\Phi(x(0),x(T)),S) + m^{-1}\Big(\|x(\cdot)-x_m(\cdot)\|_C + \int_0^T\big(\|u(t)-u_m(t)\| + \sum\alpha_i\|u_i(t)-u_m(t)\|\big)\,dt\Big)$$
on the same set of $z=(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)\in Z_k$ satisfying (1). If $\alpha_1=\dots=\alpha_k=0$ this quantity coincides with $I_m(x(\cdot),u(\cdot))$, so it can be viewed as an extension of $I_m$ to $Z_k$ and we can keep the notation $I_m$ for this functional.

3. Given a $(u(\cdot),\alpha_1,\dots,\alpha_k)\in\mathcal U_k\times\mathbb R^k_+$ with $\sum\alpha_i<1$, we denote by $Q(u(\cdot),\alpha_1,\dots,\alpha_k)\subset W^{1,1}$ the set of solutions of (1). As follows from Proposition 2.4 (in view of (H2)), there are $K>0$ and $\gamma>0$ such that the estimate
$$d(x(\cdot),Q(u(\cdot),\alpha_1,\dots,\alpha_k)) \le K\, J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)$$
holds for $(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)\in Z_k$ satisfying
$$\|x(\cdot)-\bar x(\cdot)\|_C + e^{\int_0^T k(t)\,dt}\, J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k) < \gamma.$$

It remains to apply Proposition 2.2 to the result obtained at the end of the previous step of the proof. In the non-singular case it follows that $(\bar x(\cdot),\bar u(\cdot),0,\dots,0)$ is a local minimum of
$$I(x(\cdot),u(\cdot)) + K J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)$$
on $Z_k$ for some $K>0$. This implies the "either" part of the statement, with the same $\lambda$ if $K\le 1$ and with $\lambda$ replaced by $\lambda/K$ if $K>1$. In the singular case it follows similarly that $(x_m(\cdot),u_m(\cdot),0,\dots,0)$ is a local minimum of $I_m(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k) + KJ_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k)$, and the "or" part of the statement follows as well.

The conclusion of the theorem in the non-singular case may not be fully satisfactory in certain situations, for the following reason. The necessary optimality condition for $\mathcal J_k$ obtained by application of the subdifferential calculus will inevitably include the limiting or Clarke subdifferential of $\psi$ at $\bar x(\cdot)$. But these subdifferentials contain vectors obtained from the analysis of the behavior of the function at points (close to $\bar x(\cdot)$) at which the value of $\psi$ is strictly smaller than $\psi(\bar x(\cdot))$. If such points do exist, they lie outside of the feasible domain of the problem, so taking them into account may only decrease the precision of the necessary condition.

To avoid such a possibility, we shall slightly modify the functional obtained in the non-singular case in the first part of the proof of the theorem and consider the sequence of problems of minimizing $\lambda\psi_m(x(\cdot)) + d(\Phi(x(0),x(T)),S)$ on $X_k$, where
$$\psi_m(x(\cdot)) = \max\big\{\ell(x(0),x(T)) + m^{-1},\ \max_{0\le t\le T} g(t,x(t))\big\}.$$
Now $\bar x(\cdot)$ may no longer be a local minimum in this problem. But the value of the new functional at $\bar x(\cdot)$ can exceed the minimal value in the problem at most by $m^{-1}$. So by Ekeland's principle, for any sufficiently big $m$ there is a pair $(x_m(\cdot),u_m(\cdot))\in X_k$ such that $\|x_m(\cdot)-\bar x(\cdot)\|_C < m^{-1}$, $\|\bar u(\cdot)-u_m(\cdot)\|_{L^1}\le m^{-1}$ and the function
$$\lambda\psi_m(x(\cdot)) + d(\Phi(x(0),x(T)),S) + m^{-1}\Big(\|x(\cdot)-x_m(\cdot)\|_C + \int_0^T\|u(t)-u_m(t)\|\,dt\Big)$$
attains a local minimum on $X_k$ at $(x_m(\cdot),u_m(\cdot))$.

Thus we arrive at a series of problems that are very similar to what we had in the singular case. The subsequent arguments in the proof of the theorem can be applied to these problems as well, and we arrive at the following conclusion.

Theorem 3.2.
We posit (H1)-(H5). If $(\bar x(\cdot),\bar u(\cdot))$ is a strong local minimum in (OC), then there are $\lambda\ge 0$, $K>0$ and a sequence $(x_m(\cdot),u_m(\cdot))\subset X_k$, $m=1,2,\dots$, converging to $(\bar x(\cdot),\bar u(\cdot))$ and such that the functional
$$\lambda\psi_m(x(\cdot)) + d(\Phi(x(0),x(T)),S) + K J_k(x(\cdot),u(\cdot),\alpha_1,\dots,\alpha_k) + m^{-1}\Big(\|x(\cdot)-x_m(\cdot)\|_C + \int_0^T\big(\|u(t)-u_m(t)\| + \sum\alpha_i\|u_i(t)-u_m(t)\|\big)\,dt\Big)$$
attains a local minimum on $Z_k$ at $z_m=(x_m(\cdot),u_m(\cdot),0,\dots,0)$, and either $\lambda>0$, or $\lambda=0$ and $\Phi(x_m(0),x_m(T))\notin S$.

By that we mean the problem of minimizing the functional
$$J(x(\cdot),\alpha_1,\dots,\alpha_k) = \varphi(x(\cdot)) + \int_0^T\Big\|\dot x(t) - \psi(t,x(t)) - \sum_{i=1}^k \alpha_i\psi_i(t,x(t))\Big\|\,dt$$
on $W^{1,1}\times\mathbb R^k_+$, where $\varphi$ is a Lipschitz function on $C[0,T]$ and $\psi_i:[0,T]\times\mathbb R^n\to\mathbb R^n$ are Carathéodory functions satisfying
$$\|\psi_i(t,x)\|\le R(t), \qquad \|\psi_i(t,x)-\psi_i(t,x')\|\le R(t)\|x-x'\|, \quad\text{if } x,x'\in B(\bar x(t),\varepsilon),$$
with some $\varepsilon>0$ and summable $R(t)$.

If $\varphi$ is a function of the end points $(x(0),x(T))$, this is a very specific case of the well studied "generalized Bolza problem" (see e.g. [6, 14, 16]). The presence of the nonnegative parameters $\alpha_i$ does not add much difficulty to the analysis. The case of a general $\varphi$ was recently considered in [13], and the first order necessary condition for a minimum of $J$ proved in this section is immediate from the main result of [13]. But we prefer to give a separate proof here which, as we have already mentioned, is noticeably simpler.

So let $(\bar x(\cdot),(0,\dots,0))$ be a local minimum of $J$ in $W^{1,1}\times\mathbb R^k_+$. By the assumption, $\|\psi_i(t,x)\|\le R(t)$ for all $x\in\bar x(t)+\varepsilon B$ a.e. on $[0,T]$. It follows (from Proposition 2.2) that for a sufficiently large $K>0$ the functional
$$I(x(\cdot),\alpha_1,\dots,\alpha_k) = J(x(\cdot),\alpha_1,\dots,\alpha_k) + K\sum_{i=1}^k \alpha_i^-$$
attains a local minimum on $W^{1,1}\times\mathbb R^k$ at $(\bar x(\cdot),0,\dots,0)$, where $\alpha^- = \max\{0,-\alpha\}$. Thus $0\in\partial I(\bar x(\cdot),0,\dots,0)$.
$0, \ldots, 0)$. Applying the standard rules of subdifferential calculus and, in particular, the formula for the $G$-subdifferential, we find a measure $\nu \in \partial\varphi(\bar x(\cdot))$ and measurable $\mathbb{R}^n$-valued functions $p(t)$, $q(t)$ satisfying
$$\|p(t)\| \le 1, \qquad q(t) \in \partial_C\langle p(t), \psi_0(t, \cdot)\rangle(\bar x(t)) \quad \text{a.e.},$$
such that
$$\int_0^T \big(h(t)\nu(dt) + (\langle \dot h(t), p(t)\rangle - \langle h(t), q(t)\rangle)\,dt\big) = 0 \qquad (2)$$
for all $h(\cdot) \in W^{1,1}$, and, from the subdifferential of the penalty term $K\alpha_i^-$ at zero,
$$-K \le \int_0^T \langle p(t), \psi_i(t, \bar x(t))\rangle\,dt \le 0, \qquad i = 1, \ldots, k. \qquad (3)$$
Setting $h(t) = h(0) + \int_0^t \dot h(s)\,ds$, we get
$$\int_0^T h(t)\nu(dt) = \langle h(0), \nu(\{0\})\rangle + \langle h(T), \nu(\{T\})\rangle + \int_0^T h(t)\tilde\nu(dt) = \Big\langle h(0), \nu(\{0\}) + \int_0^T \tilde\nu(dt)\Big\rangle + \langle h(T), \nu(\{T\})\rangle + \int_0^T \Big\langle \dot h(t), \int_t^T \tilde\nu(ds)\Big\rangle\,dt$$
(where $\tilde\nu(\{0\}) = 0$, $\tilde\nu(\{T\}) = 0$ and $\tilde\nu(\Delta) = \nu(\Delta)$ for $\Delta \subset (0,T)$), and
$$\int_0^T \langle h(t), q(t)\rangle\,dt = \Big\langle h(0), \int_0^T q(t)\,dt\Big\rangle + \int_0^T \Big\langle \dot h(t), \int_t^T q(s)\,ds\Big\rangle\,dt.$$
Thus we can rewrite (2) as follows:
$$\Big\langle h(0), \nu(\{0\}) + \int_0^T \tilde\nu(dt) - \int_0^T q(t)\,dt\Big\rangle + \langle h(T), \nu(\{T\})\rangle + \int_0^T \Big\langle \dot h(t),\ p(t) + \int_t^T \tilde\nu(ds) - \int_t^T q(s)\,ds\Big\rangle\,dt = 0. \qquad (4)$$

Applying the equality to $h(\cdot)$ equal to zero at the ends of the interval, we deduce that
$$p(t) + \int_t^T \tilde\nu(ds) - \int_t^T q(s)\,ds = \text{const} = c \quad \text{a.e.}$$
Changing $p(\cdot)$ on a set of measure zero, if necessary, we get from here that $p(\cdot)$ is a function of bounded variation continuous from the left. If we now apply (4) to $h(\cdot)$ equal to zero at zero, we conclude that $c = -\nu(\{T\})$, that is,
$$p(t) + \int_t^T \nu(ds) - \int_t^T q(s)\,ds = 0, \quad \forall\, t > 0, \qquad (5)$$
and (4) eventually implies that
$$\int_0^T \nu(dt) = \int_0^T q(t)\,dt. \qquad (6)$$
Defining $p(0)$ by continuity, we can conclude with the following statement.

Proposition 4.1. If $(\bar x(\cdot), (0, \ldots, 0))$ is a local minimum of $J$ on $W^{1,1} \times \mathbb{R}^k_+$, then there are a measure $\nu \in \partial\varphi(\bar x(\cdot))$, a function $p(\cdot)$ of bounded variation continuous from the left and a summable $q(\cdot)$ taking values in $\partial_C\langle p(t), \psi_0(t, \cdot)\rangle(\bar x(t))$ a.e. such that $p(0) = \nu(\{0\})$ and the relations (3), (5) and (6) hold true.

Let us return to (OC). To make the reduction theorem applicable, we have to be sure that the collection $\mathcal U$ of "good" controls is sufficiently rich. To this end we add the following assumption to (H )-(H ):

(H ) for almost every $t$ the mapping $f(t, \cdot, u)$ is Lipschitz near $\bar x(t)$ for every $u \in U(t)$; that is, there are $\rho > 0$ and $\delta > 0$ (possibly depending on $t$ and $u$) such that
$$\|f(t,x,u) - f(t,x',u)\| \le \rho\|x - x'\|, \quad \forall\, x, x' \in B(\bar x(t), \delta).$$
This is the weakest Lipschitz-type assumption on $f(t, \cdot, u)$, certainly sufficient for applications. Note however that there are proofs of the maximum principle with $f(t, \cdot, u)$ Lipschitz at $\bar x(t)$ only for $u = \bar u(t)$ (see e.g. [1]), although at the expense of some other assumptions. We, for instance, do not need continuity of $f(t, x, \cdot)$. It is not clear whether these two weakenings can be combined.

To state the maximum principle for (OC) we recall the standard notation: for a vector $p \in \mathbb{R}^n$ let
$$H(t,x,p,u) = \langle p, f(t,x,u)\rangle, \qquad H(t,x,p) = \sup_{u \in U(t)} H(t,x,p,u).$$
We also set $\Delta = \{t : g(t, \bar x(t)) = 0\}$ and
$$\partial^>_C g(t,x) = \operatorname{conv}\{\lim y_m : y_m \in \partial g(t_m, \cdot)(x_m),\ t_m \to t,\ x_m \to x,\ g(t_m, x_m) > g(t, x)\}.$$

Theorem 4.2. Assume (H )-(H ).
If $(\bar x(\cdot), \bar u(\cdot))$ is a strong local minimum in (OC), then there are $\lambda \in [0,1]$, a function of bounded variation $p(t)$ on $[0,T]$, continuous from the left, a regular nonnegative measure $\mu$ with $\mu([0,T]) \le 1$ supported on $\Delta$, a summable $q(t) \in \partial_C H(t, \cdot, p(t), \bar u(t))(\bar x(t))$ a.e., a $\mu$-measurable selection $\gamma(t)$ of the set-valued mapping $t \mapsto \partial^>_C g(t, \bar x(t))$ and a pair $(w_0, w_T) \in \lambda\,\partial\ell(\bar x(0), \bar x(T)) + \Phi'^*(\bar x(0), \bar x(T))\big(N(S, \Phi(\bar x(0), \bar x(T)))\big)$ such that the following relations are satisfied:
$$\lambda + \|p(\cdot)\| + \mu([0,T]) > 0 \quad \text{(nontriviality)};$$
$$p(0) = w_0, \qquad p(T) + \gamma(T)\mu(\{T\}) = -w_T \quad \text{(transversality)};$$
$$p(t) = -w_T + \int_t^T q(s)\,ds - \int_t^T \gamma(s)\,d\mu(s), \quad \forall\, t \quad \text{(adjoint equation)};$$
$$H(t, \bar x(t), p(t), \bar u(t)) = H(t, \bar x(t), p(t)) \quad \text{a.e.} \quad \text{(maximum principle)}.$$
Here $\|p(\cdot)\| = \sup_t \|p(t)\|$.

Remark 4.3. In the classical smooth setting the statement of the theorem reduces to the maximum principle for optimal control problems proved in [15]. It differs, however, from the statement of the maximum principle in [24], Theorem 9.3.1 (proved under basically the same assumptions as here). Nonetheless, both statements are equivalent. To see this, let us denote by $\hat q(\cdot)$ what is $p(\cdot)$ in our theorem and by $\eta(t)$ what is $q(t)$, that is, $\hat q(t) = -w_T + \int_t^T \eta(s)\,ds - \int_t^T \gamma(s)\,d\mu(s)$; set further $p(t) = w_0 - \int_0^t \eta(s)\,ds$ and $q(t) = p(t) + \int_0^t \gamma(s)\,\mu(ds)$. Then
$$\hat q(t) - q(t) = -(w_0 + w_T) + \int_0^T \eta(t)\,dt - \int_0^T \gamma(t)\,\mu(dt) = 0$$
for all $t < T$. Since $\hat q$ is continuous from the left, we get from here that $\lim_{t\to T}\hat q(t) = \hat q(T)$ and therefore $q(T) = \hat q(T) + \gamma(T)\mu(\{T\}) = -w_T$. Verification that Theorem 4.2 and Theorem 9.3.1 in [24] are equivalent is now straightforward.
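For orientation, it may help to record the classical-case specialization mentioned at the start of Remark 4.3. The display below is a sketch under the assumption that the data are smooth and the state constraint is inactive (so that $\mu = 0$ and all subdifferentials reduce to gradients); in that case $p(\cdot)$ is absolutely continuous and the relations of Theorem 4.2 collapse to the familiar Pontryagin system:

```latex
% Theorem 4.2 with \mu = 0 and smooth data (sketch):
\begin{aligned}
  -\dot p(t) &= H'_x\bigl(t,\bar x(t),p(t),\bar u(t)\bigr) \quad \text{a.e. on } [0,T],\\
  p(0) &= w_0, \qquad p(T) = -w_T, \qquad
     (w_0,w_T) \in \lambda\,\partial\ell(\bar x(0),\bar x(T))
        + \Phi'^*(\bar x(0),\bar x(T))\bigl(N(S,\Phi(\bar x(0),\bar x(T)))\bigr),\\
  H\bigl(t,\bar x(t),p(t),\bar u(t)\bigr) &= \sup_{u\in U(t)} H\bigl(t,\bar x(t),p(t),u\bigr)
     \quad \text{a.e.}
\end{aligned}
```

Indeed, differentiating the adjoint relation $p(t) = -w_T + \int_t^T q(s)\,ds$ with $q(s) = H'_x(s, \bar x(s), p(s), \bar u(s))$ gives the first line, and setting $t = T$ and $t = 0$ recovers the two transversality equalities.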
1. The key element of the proof is the study of necessary conditions in (OC$_k$) with $U_0(t) \equiv \{\bar u(t)\}$, so that $U_k(t) = \{\bar u(t), u_1(t), \ldots, u_k(t)\}$. We claim that the theorem is valid if the following weaker integral maximum principle holds for every such (OC$_k$).

Theorem 4.4. Assume (H )-(H ). If $(\bar x(\cdot), \bar u(\cdot))$ is a strong local minimum in (OC$_k$) with $U_k(t) = \{\bar u(t), u_1(t), \ldots, u_k(t)\}$, $u_i(\cdot) \in \mathcal U$, then the conclusion of Theorem 4.2 holds with the maximum principle replaced by
$$\int_0^T H(t, \bar x(t), p(t), \bar u(t))\,dt \ge \int_0^T H(t, \bar x(t), p(t), u_i(t))\,dt, \qquad i = 1, \ldots, k.$$
So assume that Theorem 4.4 is true. Given an (OC$_k$), let $\Lambda_k$ be the set of triples $(\lambda, p(\cdot), \mu)$ satisfying the conditions of Theorem 4.4 along with the normalized nontriviality condition $\lambda + \|p(\cdot)\| + \mu([0,T]) = 1$. As all four relations in the theorem are positively homogeneous with respect to $(\lambda, p(\cdot), \mu)$, this set is nonempty by the theorem. It is clear that $\Lambda_{k'} \subset \Lambda_k$ if $U_k \subset U_{k'}$. If the intersection $\Lambda$ of all $\Lambda_k$ is nonempty, then for any $(\lambda, p(\cdot), \mu) \in \Lambda$
$$\int_0^T H(t, \bar x(t), p(t), \bar u(t))\,dt \ge \int_0^T H(t, \bar x(t), p(t), u(t))\,dt, \qquad \forall\, u(\cdot) \in \mathcal U. \qquad (7)$$
In view of Proposition 2.5, to prove the maximum principle we only need to verify that (7) actually holds for all measurable selections of $U(\cdot)$ for which the integral on the right side of (7) makes sense. Let $v(\cdot)$ be such a selection of $U(\cdot)$. If the inequality opposite to (7) holds for this $v(\cdot)$, then the set $\Delta = \{t : H(t, \bar x(t), p(t), v(t)) > H(t, \bar x(t), p(t), \bar u(t))\}$ has positive measure. Consider on $\Delta$ the function
$$\eta_\varepsilon(t) = \sup_{x, x' \in B(\bar x(t), \varepsilon),\ x \ne x'} \frac{\|f(t, x, v(t)) - f(t, x', v(t))\|}{\|x - x'\|},$$
which is measurable by (H ). By (H ), $\lim_{\varepsilon \to 0} \eta_\varepsilon(t) < \infty$ for almost every $t \in \Delta$. It follows that there are a positive $\varepsilon$ and a set $\Delta' \subset \Delta$ of positive measure on which $\eta_\varepsilon$ is summable. Then the control $u(\cdot)$ coinciding with $v(t)$ on $\Delta'$ and with $\bar u(t)$ on the complement of $\Delta'$ obviously belongs to $\mathcal U$, and (7) fails for this $u(\cdot)$.

Thus to prove the claim we have to verify that $\Lambda \ne \emptyset$. This in turn will be true if we verify that any $\Lambda_k$ is compact if measures are considered in the weak$^*$ topology. So fix a $\Lambda_k$ and take $(\lambda_m, p_m(\cdot), \mu_m) \in \Lambda_k$, $m = 1, 2, \ldots$
, and let $q_m(\cdot)$ and $\gamma_m(\cdot)$ be measurable selections of $\partial_C\langle p_m(t), f(t, \cdot, \bar u(t))\rangle(\bar x(t))$ and $\partial^>_C g(t, \cdot)(\bar x(t))$, respectively, such that $p_m(t) = -w_{Tm} + \int_t^T q_m(s)\,ds - \int_t^T \gamma_m(s)\,d\mu_m(s)$ and $(p_m(0), w_{Tm}) \in \lambda_m\partial\ell(\bar x(0), \bar x(T)) + A(N(S, \bar z))$, where we have set for simplicity $\bar z = \Phi(\bar x(0), \bar x(T))$ and $A = \Phi'^*(\bar x(0), \bar x(T))$. We may assume that the $\lambda_m$ converge to some $\lambda \ge 0$ and the $p_m(0)$ converge to some $p_0$. Since the set of nonnegative measures with $\mu([0,T]) \le 1$ is compact in the weak$^*$ topology of measures, we may also assume (taking if necessary a subsequence) that the $\mu_m$ weak$^*$ converge to some $\mu \ge 0$. In particular, $\mu_m([0,T]) \to \mu([0,T])$.

By (H ), (H ), $\|q_m(t)\| \le k(t)\|p_m(t)\|$ and $\|\gamma_m(t)\| \le \rho_g$ almost everywhere. It follows that $(w_{Tm})$ is a bounded sequence, and again we can assume that it converges to some $w_T$. Clearly $(p_0, w_T) \in \lambda\partial\ell(\bar x(0), \bar x(T)) + A(N(S, \bar z))$. It also follows that the sequence of $q_m(\cdot)$ is weakly compact in $L^1$, so that we may assume that it weakly converges to some $q(\cdot)$ and, consequently, the integrals $\int_t^T q_m(s)\,ds$ converge uniformly to $\int_t^T q(s)\,ds$.

Finally, as the sets $\partial^>_C g(t, \cdot)(\bar x(t))$ are convex and closed, there is a measurable $\gamma(t)$ with values in $\partial^>_C g(t, \cdot)(\bar x(t))$ such that the measures $\nu_m$ with $d\nu_m(t) = \gamma_m(t)\,d\mu_m(t)$ weak$^*$ converge to the measure $\nu$ defined by $d\nu(t) = \gamma(t)\,d\mu(t)$ (see [24], Proposition 9.2.1). It follows that $p_m(t)$ converge to $p(t) = -w_T + \int_t^T q(s)\,ds - \int_t^T \gamma(s)\,\mu(ds)$ at every $t > 0$ at which $t \mapsto \int_t^T \gamma(s)\,d\mu(s)$ is continuous. Setting $p(0) = p_0$, we find that the limiting objects satisfy all required relations except maybe nontriviality. To prove nontriviality of the limiting objects, observe that if $\mu \ne 0$ there is nothing to prove, and if $\mu = 0$, then the $p_m(\cdot)$ converge to $p(\cdot)$ uniformly and hence $\lambda + \|p(\cdot)\| = 1$. This completes the proof of the claim.

2. So we have to prove Theorem 4.4. We shall do this by applying Theorem 3.2 with $U_0(t) = \{\bar u(t)\}$. It follows from the theorem (if we replace $\lambda$ by $K\lambda$ in case when $K >$
$1$) that there is a sequence $(x_m(\cdot), u_m(\cdot)) \in X_k$ with $x_m(\cdot)$ converging to $\bar x(\cdot)$ uniformly and $u_m(\cdot)$ converging to $\bar u(\cdot)$ in $L^1$ such that $(x_m(\cdot), 0, \ldots, 0)$ is a local minimum of the functional
$$\lambda_0\psi_m(x(\cdot)) + d(\Phi(x(0), x(T)), S) + J_k(x(\cdot), u_m(\cdot), \alpha_1, \ldots, \alpha_k) + m^{-1}\Big(\|x(\cdot) - x_m(\cdot)\|_C + \int_0^T \sum \alpha_i\|u_i(t) - u_m(t)\|\,dt\Big)$$
on $C([0,T]) \times \mathbb{R}^k_+$ for any $m$. The structure of the functional precisely corresponds to the basic model. To see this, it is enough to set
$$\varphi(x(\cdot)) = \lambda_0\psi_m(x(\cdot)) + d(\Phi(x(0), x(T)), S) + m^{-1}\|x(\cdot) - x_m(\cdot)\|_C;$$
$$\psi_0(t,x) = f(t, x, u_m(t)); \qquad \psi_i(t,x) = f(t, x, u_i(t)) - f(t, x, u_m(t));$$
the remaining term $m^{-1}\int_0^T \sum \alpha_i\|u_i(t) - u_m(t)\|\,dt$, being linear in the $\alpha_i$ with summable coefficients, only perturbs the resulting relations by terms of order $m^{-1}$. Thus we can apply Proposition 4.1 to get a necessary condition for $(x_m(\cdot), 0, \ldots, 0)$, computing the subdifferential of $\varphi$ by the standard rules of subdifferential calculus collected in the second section.

If $\nu \in \partial\psi_m(x(\cdot))$, then $\nu = \xi_1\nu_1 + \xi_2\nu_2$, where $\xi_i \ge 0$, $\xi_1 + \xi_2 = 1$,
$$\xi_1\big(\ell(x(0), x(T)) + m^{-2} - \psi_m(x(\cdot))\big) = \xi_2\big(\max_t g(t, x(t)) - \psi_m(x(\cdot))\big) = 0,$$
$\nu_1$ is supported on $\{0, T\}$ with weights belonging to $\partial\ell(x(0), x(T))$, while $\nu_2(dt) = \gamma(t)\mu(dt)$, where $\mu$ is a probability measure supported on the set of $t$ at which $g(t, x(t))$ attains its maximum and $\gamma(t)$ is a measurable selection of the set-valued mapping $t \mapsto \partial g(t, \cdot)(x(t))$.

Setting $\lambda_m = \lambda_0\xi_1$ and writing $\mu_m$ instead of $\lambda_0\xi_2\mu$, we conclude that for any $m$ we can find $\lambda_m \ge 0$, a nonnegative measure $\mu_m$ supported on the set $\Delta(x_m(\cdot)) = \{t : g(t, x_m(t)) = \psi_m(x_m(\cdot))\}$, a measurable selection $\gamma_m(\cdot)$ of the set-valued mapping $t \mapsto \partial g(t, \cdot)(x_m(t))$, a pair $(w_{0m}, w_{Tm})$ of vectors in $\mathbb{R}^n$, a measurable selection $q_m(\cdot)$ of the set-valued mapping $t \mapsto \partial_C f(t, \cdot, u_m(t))(x_m(t))$ and a function $p_m(t)$ of bounded variation such that

$\bullet$ $\lambda_m + \|\mu_m\| = \lambda_0$, $\lambda_m\big(\ell(x_m(0), x_m(T)) + m^{-2} - \psi_m(x_m(\cdot))\big) = 0$ and $\mu_m = 0$ if $\max_t g(t, x_m(t)) < \psi_m(x_m(\cdot))$;

$\bullet$ $(w_{0m}, w_{Tm}) \in \lambda_m\partial\ell(x_m(0), x_m(T)) + \Phi'^*(x_m(0), x_m(T))\big(\partial d(\cdot, S)(\Phi(x_m(0), x_m(T)))\big)$,

and the following three relations are satisfied up to terms of order $m^{-1}$:
$$p_m(t) + w_{Tm} + \int_t^T \big(\gamma_m(s)\,\mu_m(ds) - q_m(s)\,ds\big) = 0 \quad \text{a.e.};$$
$$w_{0m} + w_{Tm} + \int_0^T \big(\gamma_m(t)\,\mu_m(dt) - q_m(t)\,dt\big) = 0;$$
$$\int_0^T \langle p_m(t), \psi_i(t, x_m(t))\rangle\,dt \le 0, \qquad i = 1, \ldots, k.$$
It follows that $p_m(0) = w_{0m} + O(m^{-1})$ and $p_m(T) = -(w_{Tm} + \gamma_m(T)\mu_m(\{T\})) + O(m^{-1})$. Taking into account that $\langle p, \psi_0(t, x)\rangle = H(t, x, p, u_m(t))$ and $\langle p, \psi_i(t, x)\rangle = H(t, x, p, u_i(t)) - H(t, x, p, u_m(t))$, we conclude that $p_m(\cdot)$ satisfies the following system of relations, for $i = 1, \ldots, k$, valid up to terms of order $m^{-1}$:
$$p_m(t) = p_m(T) + \gamma_m(T)\mu_m(\{T\}) + \int_t^T \big(q_m(s)\,ds - \gamma_m(s)\,\mu_m(ds)\big);$$
$$\big(p_m(0),\ -(p_m(T) + \gamma_m(T)\mu_m(\{T\}))\big) \in \lambda_m\partial\ell(x_m(0), x_m(T)) + \Phi'^*(x_m(0), x_m(T))\big(\partial d(\cdot, S)(\Phi(x_m(0), x_m(T)))\big);$$
$$\int_0^T \big(H(t, x_m(t), p_m(t), u_i(t)) - H(t, x_m(t), p_m(t), u_m(t))\big)\,dt \le 0. \qquad (11)$$

3. We can now easily finish the proof.
By (H ) the functions $\|q_m(t)\|$ are bounded by the same summable function, hence uniformly integrable; by (H ), $\|\gamma_m(t)\| \le \rho_g$ a.e.; the sequences of $p_m(0)$ and $w_{Tm} = -(p_m(T) + \gamma_m(T)\mu_m(\{T\}))$ are bounded, hence the sequence of $p_m(T)$ is also bounded. We may assume that the $q_m(\cdot)$ weakly converge in $L^1$ to some $q(\cdot)$, hence $\int_t^T q_m(s)\,ds$ uniformly converge to $\int_t^T q(s)\,ds$, and (as we obviously can assume that the $\lambda_m$ converge to some $\lambda$) the pairs $(p_m(0), w_{Tm})$ converge to some $(w_0, w_T) \in \lambda\partial\ell(\bar x(0), \bar x(T)) + \Phi'^*(\bar x(0), \bar x(T))\big(\partial d(\cdot, S)(\Phi(\bar x(0), \bar x(T)))\big)$.

The sequence of measures $\gamma_m(t)\mu_m(dt)$ is weak$^*$ compact. We refer again to Proposition 9.2.1 of [24] to conclude that the limiting measures have the form $\gamma(t)\mu(dt)$, where $\mu$ is a weak$^*$ limit of the $\mu_m$ and $\gamma(t) \in \cap_m \operatorname{cl\,conv}\big(\cup_{s > m} \partial_C g(t, \cdot)(x_s(t))\big)$ almost everywhere. As a result, we deduce (as the $\|p_m(\cdot)\|$ are uniformly bounded) that the $p_m(\cdot)$ converge almost everywhere to $p(t) = -w_T + \int_t^T \big(q(s)\,ds - \gamma(s)\,d\mu(s)\big)$ with $p(0) = w_0$ and $p(T) + \gamma(T)\mu(\{T\}) = -w_T$. The transversality condition, the adjoint inclusion and the integral inequality of Theorem 4.4 are now immediate from (11). The nontriviality condition is obvious in the non-singular case, as then $\lambda > 0$. To prove that this condition holds also in the singular case, with $\lambda = 0$, we have to recall that the $x_m(\cdot)$ are not feasible in (OC$_k$), which means that $y_m = \Phi(x_m(0), x_m(T)) \notin S$. Therefore the norm of any element of $\partial d(\cdot, S)(y_m)$ is 1. Taking into account that $\mu = 0$ and that $\Phi'^*(x_m(0), x_m(T))$ is one-to-one by (H ), we conclude that $\|(p(0), -p(T))\| > 0$.

It remains to show that every element of $\cap_m \operatorname{cl\,conv}\big(\cup_{s > m} \partial_C g(t, \cdot)(x_s(t))\big)$ lies in $\partial^>_C g(t, \cdot)(\bar x(t))$. Indeed, it is sufficient to note that (in the non-singular case) $J_m(x_m(\cdot)) > 0$, and therefore $\max_t g(t, x_m(t)) > 0$. (Otherwise $x_m(\cdot)$ would be admissible in (OC) and $\ell(x_m(0), x_m(T)) < \ell(\bar x(0), \bar x(T))$.) This completes the proof of the theorem.

Below we use the following notation. Let $f$ be a function or a mapping depending on two vector variables, e.g. $f(x, u)$. Then $f'$ stands for the full derivative of $f$, while the partial derivatives will be denoted $f'_x$ and $f'_u$; the same applies to second derivatives.

As in the first order theory, we start here with the introduction of a certain basic unconstrained model. This time it is the problem of minimizing
$$J(x, u, \alpha_1, \ldots, \alpha_k) = g\Big(F_0(x, u) + \sum_{i=1}^k \alpha_i F_i(x, u)\Big) \qquad (12)$$
subject to $x \in X$, $u \in U$, $\alpha_i \ge$
$0$. Here $X$ is a Banach space, $U$ is a closed subset of a Banach space $W$, and the $F_i$ are mappings from $X \times W$ into a Banach space $Y$. As usual, we fix some $(\bar x, \bar u) \in X \times U$ and assume that $J$ has a local minimum on $X \times U \times \mathbb{R}^k_+$ at $(\bar x, \bar u, 0, \ldots, 0)$, that $g$ is continuous and sublinear, and that the mappings $F_i$, $i = 0, \ldots, k$, are continuous and continuously differentiable near $(\bar x, \bar u)$ and twice differentiable at $(\bar x, \bar u)$.

Set $\bar y = F_0(\bar x, \bar u)$. Since the function $(x, \alpha_1, \ldots, \alpha_k) \mapsto J(x, \bar u, \alpha_1, \ldots, \alpha_k)$ attains a local minimum on $X \times \mathbb{R}^k_+$ at $(\bar x, 0, \ldots, 0)$, there is a $y^* \in \partial g(\bar y)$ such that
$$(y^* \circ F_0)'_x(\bar x, \bar u) = 0, \qquad \langle y^*, F_i(\bar x, \bar u)\rangle \ge 0, \quad i = 1, \ldots, k. \qquad (13)$$
Let $\Lambda$ denote the set of $y^* \in \partial g(\bar y)$ satisfying (13).

Set $\mathbf{x} = (x, \alpha_1, \ldots, \alpha_k)$, $\bar{\mathbf{x}} = (\bar x, 0, \ldots, 0)$, and let $F(\mathbf{x}, u)$ stand for $F_0(x, u) + \sum \alpha_i F_i(x, u)$. We refer to $\mathbf{x}$ as feasible if all $\alpha_i \ge 0$. Note first that for any $\varepsilon > 0$ there is an $\eta > 0$ such that
$$g(F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h}) \ge g(F(\mathbf{x} + \mathbf{h}, u)) - (\varepsilon/2)\|\mathbf{h}\| \ge g(\bar y) - (\varepsilon/2)\|\mathbf{h}\| \qquad (14)$$
if $\|\mathbf{x} - \bar{\mathbf{x}}\| \le \eta$, $\|\mathbf{h}\| \le \eta$, $\|u - \bar u\| \le \eta$. Here we have set $\mathbf{h} = (h, \beta_1, \ldots, \beta_k)$, $\|\mathbf{x}\| = \|x\| + \sum |\alpha_i|$, $\|\mathbf{h}\| = \|h\| + \sum |\beta_i|$. The left inequality is valid for all such $\mathbf{x}$, $\mathbf{h} \in X \times \mathbb{R}^k$, while the right one, of course, only if $\mathbf{x}$, $\mathbf{h} \in X \times \mathbb{R}^k_+$.

Indeed, fix some $\mathbf{x}$, $u$, $\mathbf{h}$ and choose a $y^* \in \partial g(0)$ such that $g(F(\mathbf{x} + \mathbf{h}, u)) = \langle y^*, F(\mathbf{x} + \mathbf{h}, u)\rangle$. Then
$$g(F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h}) - g(F(\mathbf{x} + \mathbf{h}, u)) \ge \langle y^*, F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h} - F(\mathbf{x} + \mathbf{h}, u)\rangle.$$
But $\|F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h} - F(\mathbf{x} + \mathbf{h}, u)\| = r(\mathbf{x}, u, \mathbf{h})\|\mathbf{h}\|$, where $r(\mathbf{x}, u, \mathbf{h}) \to 0$ as $\mathbf{x} \to \bar{\mathbf{x}}$, $u \to \bar u$, $\mathbf{h} \to 0$, and $\|y^*\|$ does not exceed the Lipschitz constant of $g$.

It will be convenient for the further discussion to set
$$p_\varepsilon(\mathbf{x}, u, \mathbf{h}) = g(F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h}) + \varepsilon\|\mathbf{h}\|.$$
By (14), $p_\varepsilon(\mathbf{x}, u, \mathbf{h}) \ge g(\bar y) + (\varepsilon/2)\|\mathbf{h}\|$ for feasible $\mathbf{x}$, $\mathbf{h}$ satisfying $\|\mathbf{x} - \bar{\mathbf{x}}\| \le \eta$, $\|\mathbf{h}\| \le \eta$, $\|u - \bar u\| \le \eta$. We claim that for every $\varepsilon > 0$ there are $\eta > 0$ and $\delta > 0$ such that
$$\inf_{\mathbf{h} \in X \times \mathbb{R}^k_+} p_\varepsilon(\mathbf{x}, u, \mathbf{h}) = \inf_{\mathbf{h} \in X \times \mathbb{R}^k_+,\ \|\mathbf{h}\| \le \eta} p_\varepsilon(\mathbf{x}, u, \mathbf{h}) \ge g(F(\mathbf{x}, u)) \qquad (15)$$
whenever $\|\mathbf{x} - \bar{\mathbf{x}}\| \le \delta$, $\|u - \bar u\| \le \delta$. Indeed, taking a smaller $\eta$ if necessary, we can be sure that $g(F(\mathbf{x} + \mathbf{h}, u)) \ge g(F(\mathbf{x}, u))$ if $\|\mathbf{x} - \bar{\mathbf{x}}\| < \eta$, $\|u - \bar u\| < \eta$, $\|\mathbf{h}\| < \eta$. Then by (14) $p_\varepsilon(\mathbf{x}, u,$
$0) = g(F(\mathbf{x}, u))$, which is close to $g(\bar y)$, while $p_\varepsilon(\mathbf{x}, u, \mathbf{h}) \ge g(\bar y) + (\varepsilon\eta)/2$ if $\|\mathbf{h}\| = \eta$. It remains to choose $\delta > 0$ so small that $|g(F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h}) - g(\bar y)| < \varepsilon\eta/4$ if $\|\mathbf{x} - \bar{\mathbf{x}}\| < \delta$, $\|u - \bar u\| < \delta$ and $\|\mathbf{h}\| \le \eta$. As $p_\varepsilon(\mathbf{x}, u, \cdot)$ is a convex function, it follows that its lower bound is realized in the $\eta$-ball around zero, and (15) follows.

Let $\Lambda_\varepsilon$ and $\Lambda^0_\varepsilon$ be the sets of $y^* \in \partial_\varepsilon g(\bar y)$ and of $y^* \in \partial g(0)$, respectively, satisfying
$$\|(y^* \circ F_0)'_x(\bar x, \bar u)\| \le \varepsilon, \qquad \langle y^*, F_i(\bar x, \bar u)\rangle \ge -\varepsilon, \quad i = 1, \ldots, k.$$
(Here $\partial_\varepsilon$ stands for the $\varepsilon$-subdifferential in the sense of convex analysis: $y^* \in \partial_\varepsilon g(\bar y)$ if $g(z) - g(\bar y) \ge \langle y^*, z - \bar y\rangle - \varepsilon$ for all $z$.) Our final claim is that we can choose $\delta > 0$ such that
$$\inf_{\mathbf{h} \in X \times \mathbb{R}^k_+} p_\varepsilon(\mathbf{x}, u, \mathbf{h}) = \sup_{y^* \in \Lambda_\varepsilon} \langle y^*, F(\mathbf{x}, u)\rangle. \qquad (16)$$
Indeed,
$$\inf_{\mathbf{h}} p_\varepsilon(\mathbf{x}, u, \mathbf{h}) = \inf_{\mathbf{h}} \Big(\sup_{y^* \in \partial g(0)} \langle y^*, F(\mathbf{x}, u) + F'_x(\mathbf{x}, u)\mathbf{h}\rangle + \varepsilon\|\mathbf{h}\|\Big) = \sup_{y^* \in \partial g(0)} \Big(\langle y^*, F(\mathbf{x}, u)\rangle + \inf_{\mathbf{h}}\big(\langle y^*, F'_x(\mathbf{x}, u)\mathbf{h}\rangle + \varepsilon\|\mathbf{h}\|\big)\Big)$$
$$= \sup_{y^* \in \Lambda^0_\varepsilon} \Big(\langle y^*, F(\mathbf{x}, u)\rangle + \inf_{\mathbf{h}}\big(\langle y^*, F'_x(\mathbf{x}, u)\mathbf{h}\rangle + \varepsilon\|\mathbf{h}\|\big)\Big) = \sup_{y^* \in \Lambda^0_\varepsilon} \langle y^*, F(\mathbf{x}, u)\rangle = \sup_{y^* \in \Lambda_\varepsilon} \langle y^*, F(\mathbf{x}, u)\rangle.$$
The first equality follows from the definitions. The second equality follows from the standard minimax theorem (thanks to the fact that $\partial g(0)$ is weak$^*$-compact). To justify the third equality we recall that
$$\langle y^*, F'_x(\mathbf{x}, u)\mathbf{h}\rangle = \langle (y^* \circ F_0)'_x(x, u), h\rangle + \Big\langle y^*, \sum \beta_i F_i(x, u)\Big\rangle,$$
so that the infimum over $\mathbf{h} = (h, \beta_1, \ldots, \beta_k) \in X \times \mathbb{R}^k_+$ is $-\infty$ if either $\|(y^* \circ F_0)'_x(\bar x, \bar u)\| > \varepsilon$ or $\langle y^*, F_i(\bar x, \bar u)\rangle < -\varepsilon$ for some $i$. Finally, if $y^* \in \partial g(0)$ does not belong to $\partial_\varepsilon g(\bar y)$, then $\langle y^*, \bar y\rangle \le g(\bar y) - \varepsilon$ (recall that $g$ is a sublinear function), and the last equality follows from the obvious chain of inequalities below (where $z^* \in \Lambda$ and $L$ is the Lipschitz constant of $g$):
$$\langle z^*, F(\mathbf{x}, u)\rangle \ge \langle z^*, \bar y\rangle - L\|F(\mathbf{x}, u) - \bar y\| = g(\bar y) - L\|F(\mathbf{x}, u) - \bar y\| \ge \langle y^*, F(\mathbf{x}, u)\rangle + \varepsilon - 2L\|F(\mathbf{x}, u) - \bar y\|,$$
if $\delta$ is so small that $\varepsilon - 2L\|F(\mathbf{x}, u) - \bar y\| > 0$ whenever $\|\mathbf{x} - \bar{\mathbf{x}}\| < \delta$, $\|u - \bar u\| < \delta$.

Combining (14)-(16), we conclude with

Proposition 5.1.
Under the above assumptions, for any $\varepsilon > 0$
$$g(\bar y) = g(F(\bar{\mathbf{x}}, \bar u)) \le \sup_{y^* \in \Lambda_\varepsilon} \langle y^*, F(\mathbf{x}, u)\rangle = \sup_{y^* \in \Lambda_\varepsilon} \Big\langle y^*, F_0(x, u) + \sum \alpha_i F_i(x, u)\Big\rangle$$
for all $(\mathbf{x}, u) = (x, u, \alpha_1, \ldots, \alpha_k) \in X \times U \times \mathbb{R}^k_+$ close to $(\bar{\mathbf{x}}, \bar u) = (\bar x, \bar u, 0, \ldots, 0)$.

We are now ready to state and prove the main result of this subsection. It will be done under the following additional assumption:

(H ) $W$ is densely embedded into another Banach space $V$ and $F'_{0u}(\bar x, \bar u)$ extends by continuity to the whole of $V$.

Recall that by definition $W$ is densely embedded into $V$ if there is a one-to-one linear mapping $i : W \to V$ such that $i(W)$ is dense in $V$ and $\|u\|_W \ge \|i(u)\|_V$. As usual, we identify $W$ and $i(W)$.

Finally, we need the concept of a critical set of $J$ at $(\bar x, \bar u)$, which is the collection of all tuples $(h, u, \beta_1, \ldots, \beta_k)$ with $h \in X$, $u \in W$, $\beta_i \ge 0$ such that
$$g\Big(F_0(\bar x, \bar u) + F_0'(\bar x, \bar u)(h, u) + \sum_{i=1}^k \beta_i F_i(\bar x, \bar u)\Big) \le g(F_0(\bar x, \bar u)). \qquad (17)$$
We shall denote this set by $\operatorname{Crit} J$.

Theorem 5.2.
Assume (H ), (H ). Then
$$\sup_{y^* \in \Lambda} \Big\langle y^*,\ F_0''(\bar x, \bar u)(h, u)(h, u) + 2F'_{0u}(\bar x, \bar u)v + \sum_{i=1}^k \beta_i\big(F'_{ix}(\bar x, \bar u)h + F'_{iu}(\bar x, \bar u)u\big)\Big\rangle \ge 0$$
whenever $(h, u, \beta_1, \ldots, \beta_k) \in \operatorname{Crit} J$, $v \in T(U, \bar u; u)$ (in $V$) and the following property holds for $v$: for any $\varepsilon > 0$ there are $v_m \in W$ converging to $v$ in $V$ and $t_m \to 0$ such that $\bar u + t_m u + t_m^2 v_m \in U$ and
$$F_0(\bar x + t_m h, \bar u + t_m u + t_m^2 v_m) = F_0(\bar x, \bar u) + t_m F_0'(\bar x, \bar u)(h, u) + \frac{t_m^2}{2}\big(F_0''(\bar x, \bar u)(h, u)(h, u) + 2F'_{0u}(\bar x, \bar u)v_m\big) + t_m^2 r_m, \qquad (18)$$
where $\|r_m\|_V \le \varepsilon$.

Here, of course, $(h, u) \mapsto F_0''(\bar x, \bar u)(h, u)(h, u)$ is the quadratic form associated with the second order derivative of $F_0$ at $(\bar x, \bar u)$.

Remark 5.3. As $g$ is a sublinear function, that is, $g(\lambda y) = \lambda g(y)$ for $\lambda \ge 0$ and $g(y + z) \le g(y) + g(z)$, the critical cone, obtained if we replace (17) by
$$\Big\{(h, u, \beta_1, \ldots, \beta_k) : g\Big(F_0'(\bar x, \bar u)(h, u) + \sum_{i=1}^k \beta_i F_i(\bar x, \bar u)\Big) \le 0\Big\},$$
lies in $\operatorname{Crit} J$ and coincides with it if $F_0(\bar x, \bar u) = 0$.

Proof.
We may assume without loss of generality that $\|y^*\| \le 1$ for all $y^* \in \partial g(0)$. By Proposition 5.1 and (18), for sufficiently large $m$
$$g(F(\bar{\mathbf{x}}, \bar u)) \le \sup_{y^* \in \Lambda_\varepsilon} \langle y^*, F(\bar{\mathbf{x}} + t_m\mathbf{h}, \bar u + t_m u + t_m^2 v_m)\rangle = \sup_{y^* \in \Lambda_\varepsilon} \big\langle y^*,\ F(\bar{\mathbf{x}}, \bar u) + t_m F'(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u) + (t_m^2/2)\big(F''(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)(\mathbf{h}, u) + 2F'_u(\bar{\mathbf{x}}, \bar u)v_m + 2r_m\big)\big\rangle$$
$$\le g\big(F(\bar{\mathbf{x}}, \bar u) + t_m F'(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)\big) + (t_m^2/2)\sup_{y^* \in \Lambda_\varepsilon} \langle y^*,\ F''(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)(\mathbf{h}, u) + 2F'_u(\bar{\mathbf{x}}, \bar u)v_m + 2r_m\rangle.$$

We note next that $g(F(\bar{\mathbf{x}}, \bar u) + t_m F'(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)) \le g(F(\bar{\mathbf{x}}, \bar u))$ since $(\mathbf{h}, u) = (h, u, \beta_1, \ldots, \beta_k) \in \operatorname{Crit} J$. Thus
$$0 \le \sup_{y^* \in \Lambda_\varepsilon} \langle y^*,\ F''(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)(\mathbf{h}, u) + 2F'_u(\bar{\mathbf{x}}, \bar u)v_m\rangle + 2\varepsilon.$$
By (H ) we can pass to the limit as $m \to \infty$ and write $v$ instead of $v_m$. The inequality is valid for any $\varepsilon > 0$. To conclude the proof we note that $\Lambda_{\varepsilon'} \subset \Lambda_\varepsilon$ if $\varepsilon' < \varepsilon$, that the function under the sign of supremum is weak$^*$-continuous with respect to $y^*$, and that every $\Lambda_\varepsilon$ is a weak$^*$-compact set (as a subset of $\partial g(0)$). Hence we can pass to the limit as $\varepsilon \to 0$ and get
$$0 \le \sup_{y^* \in \Lambda} \langle y^*,\ F''(\bar{\mathbf{x}}, \bar u)(\mathbf{h}, u)(\mathbf{h}, u) + 2F'_u(\bar{\mathbf{x}}, \bar u)v\rangle,$$
which is precisely what has been stated.

5.2 Back to optimal control

We shall consider the problem with more specialized end point constraints:
(OC2) minimize $\ell_0(x(0), x(T))$ subject to
$$\dot x = f(t, x, u), \quad u \in U(t); \qquad g_i(t, x(t)) \le 0, \quad i = 1, \ldots, s;$$
$$\ell_j(x(0), x(T)) \le 0, \quad j = 1, \ldots, l; \qquad \ell_j(x(0), x(T)) = 0, \quad j = l+1, \ldots, r.$$
Here the $U(t)$ are closed subsets of a Euclidean space and $(\bar x(\cdot), \bar u(\cdot)) \in W^{1,1} \times L^\infty$ is a strong local minimum in the problem. For brevity we write $f(t)$ for $f(t, \bar x(t), \bar u(t))$ and, likewise, $f'(t)$ and $f''(t)$ for the derivatives of $f$ in $(x, u)$ at $(\bar x(t), \bar u(t))$. We further assume that

(H ) the functions $\ell_j$, $j = 0, 1, \ldots, r$, are continuous and continuously differentiable near $(\bar x(0), \bar x(T))$ and have second derivatives at $(\bar x(0), \bar x(T))$;

(H ) $f(t)$, $f'(t)$ and $f''(t)$ are summable on $[0, T]$ and there is a $\delta > 0$ such that for almost every $t$ the mapping $(x, u) \mapsto f(t, x, u)$ is
- continuous on $B(\bar x(t), \delta) \times U(t)$ along with its derivative w.r.t. $x$; moreover, for any $K > 0$, $\sup\{\|f(t, \bar x(t), u)\| : u \in U(t), \|u\| \le K\}$ and $\sup\{\|f'_x(t, \bar x(t), u)\| : u \in U(t), \|u\| \le K\}$ are summable;
- continuously differentiable at points of $B(\bar x(t), \delta) \times (B(\bar u(t), \delta) \cap U(t))$;
- twice differentiable at $(\bar x(t), \bar u(t))$ uniformly in $t$ in the sense that
$$f(t, \bar x(t) + \lambda h, \bar u(t) + \lambda u) = f(t) + \lambda f'(t)(h, u) + (\lambda^2/2) f''(t)(h, u)(h, u) + r_\lambda(t, h, u),$$
with $\int_0^T \|r_\lambda(t, h, u)\|\,dt = o(\lambda^2)\|h\|\,\|u\|$;

(H ) the $g_i$ and their derivatives with respect to $x$ are continuous; the second derivatives $g''_{ix}(t, \bar x(t))$ exist and are continuous on $[0, T]$.

Theorem 5.4. Assume (H )-(H ). If $(\bar x(\cdot), \bar u(\cdot))$ is a strong local minimum in the problem, then there are numbers $\lambda_0, \ldots, \lambda_r$, a function of bounded variation $p(t)$ on $[0, T]$ and regular nonnegative measures $\mu_i$ supported on the sets $\Delta_i = \{t : g_i(t, \bar x(t)) = 0\}$ such that $\lambda_i \ge 0$ and $\lambda_i \ell_i(\bar x(0), \bar x(T)) = 0$ for $i = 1, \ldots, l$, and the following conditions are satisfied:
$$\sum_{0 \le i \le l} \lambda_i + \sum_{i = l+1}^r |\lambda_i| + \sum_{i=1}^s \mu_i([0, T]) = 1;$$
$$\Big(p(0),\ -\big(p(T) + \sum_{i=1}^s g'_{ix}(T, \bar x(T))\,\mu_i(\{T\})\big)\Big) = \sum_{i=0}^r \lambda_i \ell'_i(\bar x(0), \bar x(T));$$
$$p(t) = p(T) + \sum_{i=1}^s g'_{ix}(T, \bar x(T))\,\mu_i(\{T\}) + \int_t^T H'_x(s, \bar x(s), p(s), \bar u(s))\,ds - \sum_{i=1}^s \int_t^T g'_{ix}(s, \bar x(s))\,\mu_i(ds);$$
$$H(t, \bar x(t), p(t), \bar u(t)) = H(t, \bar x(t), p(t)) \quad \text{a.e.}$$
We shall denote by $\Lambda$ the collection of all such $(\lambda_0, \ldots, \lambda_r, \mu_1, \ldots, \mu_s, p(\cdot))$.

The theorem is a consequence of Theorem 4.2. Indeed, set $S = \{\xi = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r : \xi_j \le 0,\ j = 1, \ldots, l;\ \xi_j = 0,\ j = l+1, \ldots, r\}$ and $\Phi(x, x') = (\ell_1(x, x'), \ldots, \ell_r(x, x'))$, and take into account the following three elementary observations. The first is that either the derivatives $\ell'_i(\bar x(0), \bar x(T))$, $i = l+1, \ldots, r$, are linearly dependent, or there is a $K > 0$ such that $d(\Phi(x(0), x(T)), S) \le K\sum_{i=l+1}^r |\ell_i(x(0), x(T))|$ if $x(\cdot)$ is close to $\bar x(\cdot)$. The second observation is that by (H ), for $g(t, x) = \max_i g_i(t, x)$ we have $\partial^>_C g(t, \cdot)(x) = \operatorname{conv}\{g'_{ix}(t, x) : i \in I(t, x)\}$, where $I(t, x) = \{i : g_i(t, x) = g(t, x)\}$. Finally, it is an easy matter to see that in the case when all $\lambda_i = 0$ and all $\mu_i = 0$, the only $p(\cdot)$ that can satisfy the theorem is identically zero, so there is no need to include $p(\cdot)$ in the nontriviality condition.

We can now state the main result of this section.
We shall assume in what follows that (OC2) is non-singular at $(\bar x(\cdot), \bar u(\cdot))$, specifically that

(H ) the equation $-\dot p = H'_x(t, \bar x(t), p, \bar u(t))$ does not have a solution satisfying
$$(p(0), -p(T)) = \sum_{i=l+1}^r \lambda_i \ell'_i(\bar x(0), \bar x(T)) \quad \& \quad H(t, \bar x(t), p(t), \bar u(t)) = H(t, \bar x(t), p(t)) \qquad (19)$$
unless all $\lambda_i$, $i = l+1, \ldots, r$, are zeros and therefore $p(t) \equiv 0$.

Set $I = \{0\} \cup \{i \in \{1, \ldots, l\} : \ell_i(\bar x(0), \bar x(T)) = 0\}$, and let $\Delta_i(\delta)$ stand for the $\delta$-neighborhood of $\Delta_i$. Given a bounded measurable selection $w(t)$ of $U(t)$, we shall consider the critical cone $C(w(\cdot))$ associated with $w(\cdot)$, which is the collection of triples $(h(\cdot), u(\cdot), \beta) \in W^{1,1} \times L^\infty \times \mathbb{R}_+$ such that $u(\cdot) \in T(U(\cdot), \bar u(\cdot))$ and
$$\ell'_i(\bar x(0), \bar x(T))(h(0), h(T)) \le 0,\quad i \in I; \qquad \ell'_i(\bar x(0), \bar x(T))(h(0), h(T)) = 0,\quad i = l+1, \ldots, r;$$
$$g_i(t, \bar x(t)) + g'_{ix}(t, \bar x(t))h(t) \le 0, \quad \forall\, t \in \Delta_i(\delta), \ \text{for some } \delta > 0;$$
$$\dot h(t) = f'_x(t)h(t) + f'_u(t)u(t) + \beta\big(f(t, \bar x(t), w(t)) - f(t, \bar x(t), \bar u(t))\big). \qquad (20)$$
Furthermore, $u(\cdot) \in L^\infty$ and a measurable $v(\cdot)$ form a pair of second order feasible variations if $u(t) \in T(U(t), \bar u(t))$, $v(t) \in T(U(t), \bar u(t); u(t))$ a.e. and there are $\bar\lambda > 0$ and a summable $\xi(\cdot)$ such that for $0 < \lambda \le \bar\lambda$
$$\int_0^T \|f'_u(t)v(t)\|\,dt < \infty; \qquad \|f'_u(t)\|\, d(\bar u(t) + \lambda u(t), U(t)) \le \lambda^2 \xi(t) \quad \text{a.e.}$$

Theorem 5.5. Assume (H )-(H ). Then for any bounded measurable selection $w(\cdot)$ of $U(\cdot)$, any $(h(\cdot), u(\cdot), \beta) \in C(w(\cdot))$ and any measurable $v(\cdot)$ such that $(u(\cdot), v(\cdot))$ form a pair of second order feasible variations, we can find a collection of multipliers in $\Lambda$, say $(\lambda_0, \ldots, \lambda_r, \mu_1, \ldots, \mu_s, p(\cdot))$, such that
$$\sum_{i=0}^r \lambda_i \ell''_i(\bar x(0), \bar x(T))\big((h(0), h(T)), (h(0), h(T))\big) + \sum_{i=1}^s \int_0^T g''_{ix}(t, \bar x(t))(h(t), h(t))\,\mu_i(dt)$$
$$- \int_0^T \Big(H''_{(x,u)}(t, \bar x(t), p(t), \bar u(t))\big((h(t), u(t)), (h(t), u(t))\big) + 2H'_u(t, \bar x(t), p(t), \bar u(t))v(t)$$
$$+ \beta\big((H'_x(t, \bar x(t), p(t), w(t)) - H'_x(t, \bar x(t), p(t), \bar u(t)))h(t) + H'_u(t, \bar x(t), p(t), \bar u(t))u(t)\big)\Big)\,dt \ge 0. \qquad (21)$$
1. If $\beta = 0$, the inequality (21) assumes the form
$$\sum_{i=0}^r \lambda_i \ell''_i(\bar x(0), \bar x(T))\big((h(0), h(T)), (h(0), h(T))\big) + \sum_{i=1}^s \int_0^T g''_{ix}(t, \bar x(t))(h(t), h(t))\,\mu_i(dt)$$
$$- \int_0^T \Big(H''_{(x,u)}(t, \bar x(t), p(t), \bar u(t))\big((h(t), u(t)), (h(t), u(t))\big) + 2H'_u(t, \bar x(t), p(t), \bar u(t))v(t)\Big)\,dt \ge 0, \qquad (22)$$
which involves the behavior of the data only near $(\bar x(t), \bar u(t))$. So it may be tempting to consider it a necessary condition for a weak minimum. But $\Lambda$ is the set of Lagrange multipliers for which the maximum principle holds. So it would be interesting to find an example of a weak minimum for which (22) does not hold. The full relation (21), on the other hand, deals with controls that can be arbitrarily far from $\bar u(\cdot)$. This in a sense makes Theorem 5.5 a "real" second order necessary condition for a strong minimum. The simple example below demonstrates the phenomenon.

Example 5.6.
Consider the problem: minimize $x_2(1)$ subject to
$$\dot x_1 = u, \quad \dot x_2 = x_1 \sin 2\pi u; \qquad u \in U(t) \equiv [0, 1], \quad x_1(0) = x_2(0) = 0.$$
Take $\bar x(t) \equiv (0, 0)$, $\bar u(t) \equiv 0$. Then $\bar x_2(1) = 0$, and if $u(t) \le 1/2$ for all $t$, then both $\dot x_1(t)$ and $\dot x_2(t)$ are nonnegative for all $t$. On the other hand, it is clear that $(\bar x(\cdot), \bar u(\cdot))$ is not a strong minimum. Indeed, take an $\varepsilon > 0$ and set $u(t) = 3/4$ for $t \le \varepsilon$ and $u(t) = 0$ for $t > \varepsilon$. Then $0 > x_2(t) > -\varepsilon$ for all $t \in (0, 1]$.

Here $H(t, x, p, u) = p_1 u + p_2 x_1 \sin 2\pi u$, so that the adjoint system is
$$\dot p_1 = -p_2 \sin 2\pi u, \qquad \dot p_2 = 0,$$
and the transversality conditions are $p_1(1) = 0$, $p_2(1) = -1$, so that the solution of the system corresponding to $\bar u(\cdot)$ is $p_1(t) \equiv 0$, $p_2(t) \equiv -1$. Furthermore, $H(t, \bar x(t), p(t), u) \equiv 0$, $H'_{x_1}(t, \bar x(t), p(t), u) = -\sin 2\pi u$, $H'_{x_2} \equiv 0$, $H'_u(t, \bar x(t), p(t), u) \equiv 0$, $H''_{xx}(t, \bar x(t), p(t), u) \equiv 0$, $H''_{uu}(t, \bar x(t), p(t), u) \equiv 0$ and $H''_{x_1 u}(t, \bar x(t), p(t), u) = -2\pi \cos 2\pi u$.

Verification of (22) is now equally simple. The set $\Lambda$ of multipliers consists of a single element: $\lambda_0 = 1$ and $p(t) \equiv (0, -1)$. Moreover, $T(U(t), \bar u(t)) \equiv \mathbb{R}_+$, so the elements of the critical cone, if $\beta = 0$, are defined by the system $\dot h_1 = u$, $\dot h_2 = 0$, $h_1(0) = 0$, $h_2(0) = 0$, $u \ge 0$. Hence $h_1(t) \ge 0$, $h_2(t) \equiv 0$ and $u(t) \ge 0$. We have
$$H''_{(x,u)}(t, \bar x(t), p(t), \bar u(t))\big((h(t), u(t)), (h(t), u(t))\big) = 2\big(2\pi p_2(t)\cos 2\pi\bar u(t)\big)h_1(t)u(t) = -4\pi h_1(t)u(t),$$
and (22) reduces to $4\pi\int_0^1 h_1(t)u(t)\,dt \ge 0$, which obviously holds.

If $\beta > 0$, the elements of the critical cone associated with $w(\cdot)$ are defined by the system $\dot h_1 = u + \beta w(t)$, $\dot h_2 = 0$, $h_1(0) = 0$, $h_2(0) = 0$, $u \ge 0$. So if $\beta > 0$ and we take $u(t) \equiv 0$, $w(t) = 3/4$ for $t \in [0, \varepsilon]$ and $w(t) = 0$ for $t > \varepsilon$, we find that $\varepsilon > h_1(t) > 0$ for $t > 0$ and, since $H'_x(t, \bar x(t), p(t), w(t)) = -\sin(3\pi/2) = 1$ on $[0, \varepsilon]$, the left-hand side of (21) reduces to
$$-\beta\int_0^T H'_x(t, \bar x(t), p(t), w(t))h_1(t)\,dt = \beta\int_0^\varepsilon h_1(t)\sin(3\pi/2)\,dt < 0,$$
so (21) fails, in accordance with the absence of a strong minimum.

It seems that (21) is a new type of second order condition that has not appeared in the literature so far. Condition (22), on the other hand, extends a recent result of Frankowska and Osmolovskii [8] proved for autonomous problems without equality end point constraints. We further note that, although in [8] no analogue of (H ) is explicitly stated, the condition is automatically satisfied for the problem considered there because of the absence of end point equality constraints. (We also mention, to avoid confusion with signs, that the $p(\cdot)$ of [8] is the same as $-p(\cdot)$ here, so the first order condition there is the "minimum principle", and a plus sign stands before the last integral in (22).)

2. Another necessary condition for a strong minimum was proved by Pales and Zeidan in [20]. In this result $\bar u(\cdot)$ is compared with controls that may substantially differ from it on sets of positive measure. But these controls have a very specific structure: all of them have the form $\bar u(t + \theta(t))$, where $\theta(t)$ is uniformly small. Moreover, the proof of the result essentially relies on a differentiability assumption on $f$ with respect to $t$, as well as on the assumption that the control set $U(t)$ does not depend on $t$. Under these assumptions the authors, using a modification of the method of Dubovitzkii and Milyutin, pass to another problem in which $t$ appears as a state variable. The first and second order necessary optimality conditions for the strong minimum in the original problem are then obtained as the first and second order conditions necessary for a weak minimum in the new problem. However, as long as the first order condition is at our disposal, a simpler construction (also involving a change of the time variable) can be used to get the second order condition of [20].
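The non-minimality claim in Example 5.6 above is easy to check numerically. The sketch below is ours (forward Euler on a uniform grid; the step count and the value $\varepsilon = 0.1$ are arbitrary choices): it integrates the system for the needle-type control $u(t) = 3/4$ on $[0, \varepsilon)$, $u(t) = 0$ afterwards, and compares the resulting cost $x_2(1)$ with the zero cost of the candidate control $\bar u \equiv 0$.

```python
import math

def cost(eps: float, n: int = 100_000) -> float:
    """Forward-Euler value of x2(1) for u(t) = 3/4 on [0, eps), u(t) = 0 afterwards."""
    dt = 1.0 / n
    x1 = x2 = 0.0
    for k in range(n):
        u = 0.75 if k * dt < eps else 0.0
        x2 += dt * x1 * math.sin(2 * math.pi * u)  # dx2/dt = x1 * sin(2*pi*u)
        x1 += dt * u                               # dx1/dt = u
    return x2

eps = 0.1
print(cost(eps))  # close to -3*eps**2/8 = -0.00375, i.e. strictly below the cost 0 of u == 0
```

On $[0, \varepsilon)$ one has $x_1(t) = 3t/4$ and $\sin(2\pi u) = \sin(3\pi/2) = -1$, so $x_2(\varepsilon) = -3\varepsilon^2/8$, after which $x_2$ stays constant; since $x_2(1) < 0$ for every $\varepsilon > 0$ while $\|x(\cdot)\|_C \to 0$ as $\varepsilon \to 0$, the pair $(\bar x(\cdot), \bar u(\cdot))$ is indeed not a strong minimum.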
In the context of (OC2), with U not depending on t, it is enough to consider the problem

minimize ℓ_0(y(0), y(1)),
s.t. dy/dτ = v f(t, y, u(Tτ)); dt/dτ = v, v ≥ 0;
g_i(t, y) ≤ 0, i = 1, . . . , s;
ℓ_j(y(0), y(1)) ≤ 0, j = 1, . . . , l;
ℓ_j(y(0), y(1)) = 0, j = l + 1, . . . , r.

It is obvious that t̄(τ) = Tτ, ȳ(τ) = x(Tτ), v̄(τ) ≡ T is a local minimum in the problem. Applying (21) to this problem we get a second order condition very close to that of [20], with a slightly different critical cone. We leave the details to the reader. (Just note that, although the full maximum principle for (OC2) cannot be obtained from the first order condition for the last problem, the adjoint equation and the transversality conditions for (
OC2) are easily recovered.) It is also an easy matter to see that the second order condition of [20] is satisfied in Example 5.6.

3. There is a group of closely connected second order conditions for a strong minimum for the case when the optimal control u(·) is piecewise continuous, in particular for bang-bang controls (see e.g. the monographs of Milyutin–Osmolovskii [17] and Osmolovskii–Maurer [19]). Our theorem, which does not take the specific structure of the optimal control into account and works for arbitrary measurable controls, clearly does not cover these results. However, it is not a difficult matter to see that the unconstrained reduction techniques developed in the proof in the next section can be applied to optimal controls having special structure, and in particular to analyze conditions coming from variations of points of discontinuity of the optimal control when the latter is piecewise continuous.

(In [20] a problem with variable time interval is considered, and the inequality constraints have a more general structure than here.)

5.4 Unconstrained reduction

It is an easy matter to see that (H )-(H ) together with (H ) imply the hypotheses (H )-(H ) and we can use the reduction theorems for our problem. This time we shall use Theorem 3.1 (rather than Theorem 3.2) applied to problem (OC2). Take a small δ > 0 and replace U(t) by B(u(t), δ) ∩ U(t). Let as before u_1(·), . . . , u_k(·) be a finite collection of elements of U (defined as in Section 3) and U_k(t) = U(t) ∪ {u_1(t), u_2(t), . . . , u_k(t)}. Then (x(·), u(·)) is a strong local minimum in the problem (OC2_k) obtained from (OC2) if we replace U(t) by U_k(t). Thanks to (H ) we only need to consider the non-singular case. By Theorem 3.1 there is a λ > 0 such that (x(·), u(·), 0, . . . ,
0) is a local minimum of the functional

J_k(x(·), u(·), α_1, . . . , α_k) = λψ(x(·)) + Σ_{i=l+1}^r |ℓ_i(x(0), x(T))| + ∫_0^T ‖ẋ(t) − f(t, x(t), u(t)) − Σ_{i=1}^k α_i (f(t, x(t), u_i(t)) − f(t, x(t), u(t)))‖ dt

on Z_k (defined in Section 3) in the topology of W^{1,1} × L^∞ × ℝ^k. It is easy to see that J_k is a particular case of the basic model (12). Indeed, consider the following function on ℝ^{r+1} × (C[0, T])^s × L^1:

g((ξ_0, . . . , ξ_r), (y_1(·), . . . , y_s(·)), z(·)) = λ max{ max_{0≤i≤l} ξ_i, max_{1≤i≤s} max_{0≤t≤T} y_i(t) } + Σ_{i=l+1}^r |ξ_i| + ∫_0^T ‖z(t)‖ dt.

Let further U be the collection of all bounded measurable selections u(·) of U(·). Note that by (H ) any such u(·) belongs to U. Take u_1(·), . . . , u_k(·) ∈ U, and let the mappings F_i : W^{1,1} × L^∞ → ℝ^{r+1} × (C[0, T])^s × L^1, i = 0, . . . , k, be defined as follows:

F_0(x(·), u(·)) = ((ξ_0, . . . , ξ_r), (y_1(·), . . . , y_s(·)), w(·));
F_i(x(·), u(·)) = ((0, . . . , 0), (0, . . . , 0), w_i(·)), i = 1, . . . , k,

where

ξ_j = ℓ_j(x(0), x(T)); y_i(t) = g_i(t, x(t)); w(t) = ẋ(t) − f(t, x(t), u(t));
w_i(t) = −(f(t, x(t), u_i(t)) − f(t, x(t), u(t))), i = 1, . . . , k.

Then J_k is precisely g(F_0(x(·), u(·)) + Σ α_i F_i(x(·), u(·))) and all assumptions of (H ) are obviously satisfied.

A first order necessary condition for (x(·), u(·), 0, . . . ,
0) to be a local minimum of J_k can of course be obtained from Proposition 4.1. However we prefer to give an independent (and fairly simple) proof based on (13), with the aim to emphasize the connection with the subsequent proof of the second order condition.

In terms of (12), y = (ξ̄_0, . . . , ξ̄_r, y_1(·), . . . , y_s(·), 0), where ξ̄_i = ℓ_i(x(0), x(T)) and y_i(t) = g_i(t, x(t)), and any y* ∈ ∂g(y) is represented by (λ_0, . . . , λ_r, μ_1, . . . , μ_s, p(·)), where λ_i ≥ 0 and λ_i ℓ_i(x(0), x(T)) = 0 for i = 0, . . . , l, |λ_i| ≤ 1 for i = l + 1, . . . , r, the μ_i are nonnegative measures supported on ∆_i = {t : g_i(t, x(t)) = 0}, λ_0 + . . . + λ_l + μ_1([0, T]) + . . . + μ_s([0, T]) = λ and ‖p(t)‖ ≤ 1. We have to show that the set Λ_k of elements of ∂g(y) satisfying (13), that is, such that

Σ_{i=0}^r λ_i ℓ′_i(x(0), x(T))(h(0), h(T)) + Σ_{i=1}^s ∫_0^T g′_{ix}(t, x(t)) h(t) μ_i(dt) + ∫_0^T ⟨p(t), ḣ(t) − f′_x(t, x(t), u(t)) h(t)⟩ dt = 0   (23)

for all h(·) ∈ W^{1,1} and

∫_0^T ⟨p(t), f(t, x(t), u(t)) − f(t, x(t), u_i(t))⟩ dt ≥ 0, i = 1, . . . , k,   (24)

is nonempty. Setting h(t) = h(0) + ∫_0^t ḣ(s) ds, changing the order of integration in the second term and the second part of the third term of (23), and applying afterwards (23) with h(·) equal to zero at the ends of the interval, we find that for such h(·)

∫_0^T ⟨ p(t) + ∫_t^T ( Σ_{i=1}^s g′_{ix}(s, x(s)) dμ_i(s) − H′_x(s, x(s), p(s), u(s)) ds ), ḣ(t) ⟩ dt = 0.

It follows that

p(t) + ∫_t^T ( Σ_{i=1}^s g′_{ix}(s, x(s)) dμ_i(s) − H′_x(s, x(s), p(s), u(s)) ds ) = const a.e.,   (25)

the constant obviously being equal to lim_{t→T} p(t) + Σ_{i=1}^s g′_{ix}(T, x(T)) μ_i({T}). In turn, this implies that p(·), having been corrected on a set of measure zero if necessary, becomes a function of bounded variation.
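It may be convenient to read (25) in differential form. Differentiating in t (in the sense of measures, since the μ_i may have atoms) gives the measure-driven adjoint equation; this is only a reformulation of (25), written out for clarity:

```latex
% Since \frac{d}{dt}\int_t^T \varphi = -\varphi(t), relation (25) is equivalent to
dp(t) \;=\; \sum_{i=1}^{s} g_{ix}'\bigl(t, x(t)\bigr)\, d\mu_i(t)
        \;-\; H_x'\bigl(t, x(t), p(t), u(t)\bigr)\, dt ,
% which makes the bounded variation of p(\cdot) evident:
% the dt-term is integrable and the measures \mu_i are finite.
```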
Returning back to the original form of (23) and integrating the first term in the last integral by parts, we conclude, setting p(T) = lim_{t→T} p(t), that for any h(·) ∈ W^{1,1}

Σ_{i=0}^r λ_i ⟨ℓ′_i(x(0), x(T)), (h(0), h(T))⟩ + Σ_{i=1}^s ⟨g′_{ix}(T, x(T)) μ_i({T}), h(T)⟩ + ⟨p(T), h(T)⟩ − ⟨p(0), h(0)⟩ = 0,

that is,

( p(0), −(p(T) + Σ_{i=1}^s g′_{ix}(T, x(T)) μ_i({T})) ) = Σ_{i=0}^r λ_i ℓ′_i(x(0), x(T)).   (26)

We can now summarize.

Proposition 5.7.
We assume (H )-(H ). If (x(·), u(·), 0, . . . , 0) is a local minimum of J_k, then the set Λ_k of tuples (λ_0, . . . , λ_r, μ_1, . . . , μ_s, p(·)), where the λ_i are numbers, the μ_i are nonnegative measures supported on ∆_i and p(·) are functions of bounded variation, satisfying

λ_j ≥ 0, j = 0, . . . , l; λ_0 + · · · + λ_l + |λ_{l+1}| + · · · + |λ_r| + μ_1([0, T]) + · · · + μ_s([0, T]) > 0

along with (24)-(26), is nonempty.

The structure of the proof does not differ much from the structure of the proof of the maximum principle in Section 4. It is actually simpler, as we do not need to consider the singular case. We first get a second order condition for J_k using Theorem 5.2, then reformulate this condition for the problem (OC2_k) and eventually for the original problem (OC2).

So given a finite collection {u_1(·), . . . , u_k(·)} of bounded measurable selections of U(·), we set U_k(t) = U(t) ∪ {u_1(t), u_2(t), . . . , u_k(t)} and consider the corresponding functional J_k.

To be able to apply Theorem 5.2, we need to verify that J_k satisfies all conditions of the theorem under a suitable choice of the Banach spaces X, W, V and Y. Set X = W^{1,1}, W = L^∞ (so that U ⊂ W), and let Y = ℝ^{r+1} × (C[0, T])^s × L^1. It is an easy matter to see that the mappings F_i : X × W → Y defined in the previous section are continuously differentiable near (x, u) and twice differentiable at the point. This means that (H ) holds for our problem. Let further V be the space of measurable v(·) with the norm

‖v(·)‖_V = ∫_0^T ‖f′_u(t, x(t), u(t)) v(t)‖ dt.

Verification of (H ) now does not present any difficulty. Thus, all we need is to verify that for a pair (u(·), v(·)) of second order feasible variations and for any ε > 0 there are v_m(·) ∈ L^∞ converging to v(·) in V and positive λ_m → 0 (with t_m replaced by λ_m).

So let an ε > 0 and a pair (u(·), v(·)) of second order feasible variations be given. Take 0 < λ_m ≤ ε/m, and for any m = 1, 2, . .
. . let

∆_m = {t : ‖v(t)‖ ≤ m, d(u(t) + λ_m u(t) + λ_m² v(t), U(t)) ≤ λ_m² ε}.

The measure of ∆_m goes to T as m → ∞. Now for t ∈ ∆_m we choose a measurable v_m(t) such that ‖v(t) − v_m(t)‖ ≤ ε and u(t) + λ_m u(t) + λ_m² v_m(t) ∈ U(t). For t ∉ ∆_m we define v_m(t) also satisfying the last inclusion and such that λ_m² ‖v_m(t)‖ = d(u(t) + λ_m u(t), U(t)). Then λ_m ‖v_m(t)‖ ≤ ‖u(t)‖ almost everywhere, as u(t) ∈ U(t). Since u(·) is bounded measurable, it follows that every v_m(·) is also bounded measurable, hence belongs to L^∞ = W. Let ∆ᶜ_m stand for [0, T] \ ∆_m. Then by the second order feasibility condition

∫_{∆ᶜ_m} ‖f′_u(t) v_m(t)‖ dt ≤ λ_m^{−2} ∫_{∆ᶜ_m} ‖f′_u(t)‖ d(u(t) + λ_m u(t), U(t)) dt ≤ ∫_{∆ᶜ_m} ξ(t) dt → 0 as m → ∞,

and therefore v_m(·) → v(·) in V.

Finally, by the uniform boundedness of u(·) + λ_m v_m(·),

f(t, x(t) + λ_m h(t), u(t) + λ_m u(t) + λ_m² v_m(t))
= f(t) + λ_m (f′_x(t) h(t) + f′_u(t)(u(t) + λ_m v_m(t))) + (λ_m²/2) f″(t)(h(t), u(t) + λ_m v_m(t))(h(t), u(t) + λ_m v_m(t)) + q_m(t)
= f(t) + λ_m (f′_x(t) h(t) + f′_u(t) u(t)) + (λ_m²/2)( f″(t)(h(t), u(t))(h(t), u(t)) + 2 f′_u(t) v_m(t) ) + r_m(t).

This is exactly what we need. Indeed, ∫_0^T ‖q_m(t)‖ dt = o(λ_m²), as follows from the uniform twice differentiability assumption of (H ). On the other hand,

| f″(t)(h(t), u(t) + λ_m v_m(t))(h(t), u(t) + λ_m v_m(t)) − f″(t)(h(t), u(t))(h(t), u(t)) | ≤ ‖f″(t)‖ (‖h(t)‖ + 2‖u(t)‖ + λ_m ‖v_m(t)‖) ‖λ_m v_m(t)‖,

so that taking a K ≥ ‖f″(t)‖ (‖h(t)‖ + 3‖u(t)‖ + ε) a.e., we see that ‖r_m(t) − q_m(t)‖ ≤ Kε for t ∈ ∆_m and ‖r_m(t) − q_m(t)‖ ≤ K if t ∉ ∆_m.
Therefore ∫_0^T ‖r_m(t)‖ dt ≤ (K + 1)Tε for sufficiently large m. Thus J_k does satisfy all conditions of Theorem 5.2, and we can apply the theorem.

Note further that a tuple (h(·), u(·), β_1, . . . , β_k) with β_i ≥ 0 is critical for J_k if (20) holds with the last relation replaced by

ḣ(t) = f′_x(t) h(t) + f′_u(t) u(t) + Σ_{i=1}^k β_i (f(t, x(t), u_i(t)) − f(t, x(t), u(t)))

(cf. Remark 5.3).

Let further a bounded measurable selection w(·) of U(·) be given. Set u_1(·) = w(·), and let {u_2(·), . . . , u_k(·)} be a finite collection of bounded measurable selections of U(·). Set as before U_k(t) = U(t) ∪ {u_1(t), . . . , u_k(t)} and consider the corresponding functional J_k. By Theorem 5.2, for any (h(·), u(·), β_1, . . . , β_k) with β_1 = β ≥ 0 and β_i = 0, i = 2, . . . , k, such that (20) holds, that is, such that (h(·), u(·), β) ∈ C(w(·)), we can find a tuple of multipliers (λ_0, . . . , λ_r, μ_1, . . . , μ_s, p(·)) ∈ Λ_k such that (21) is satisfied. But we saw in the proof of the maximum principle that the Λ_k are compact sets, that Λ_{k′} ⊂ Λ_k if Λ_{k′} is defined by a bigger set of selections of U(·), and that the intersection Λ of all Λ_k is nonempty. So we can be sure that there is a (λ_0, . . . , p(·)) ∈ Λ such that (21) holds.
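The last step is just the finite intersection property of nested compact sets; schematically:

```latex
% Each \Lambda_k is nonempty and compact, and \Lambda_{k'} \subset \Lambda_k
% whenever the collection of selections defining \Lambda_{k'} contains the one
% defining \Lambda_k; hence
\Lambda \;=\; \bigcap_{k} \Lambda_k \;\neq\; \emptyset ,
% and a tuple (\lambda_0, \dots, p(\cdot)) \in \Lambda for which (21) holds can be chosen.
```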
References

[1] A.V. Arutyunov and R.B. Vinter, A simple finite approximations proof of the Pontryagin maximum principle under reduced differentiability hypotheses, Set-Valued Analysis (2004), 5-24.
[2] J.P. Aubin and I. Ekeland, Applied Functional Analysis, J. Wiley, 1984.
[3] E.P. Avakov and G.G. Magaril-Il'yaev, Controllability and necessary optimality conditions of second order in optimal control (in Russian).
[4] C. Castaing and M. Valadier, Convex Analysis and Measurable Multifunctions, Lecture Notes Math., Springer, 1977.
[5] F.H. Clarke, The maximum principle under minimal hypotheses, SIAM J. Control Optimization (1976), 1078-1091.
[6] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley-Interscience, 1983.
[7] A.Ya. Dubovitzkii and A.A. Milyutin, Zadachi na ekstremum pri nalichii ogranichenij, Zh. Vychisl. Matematiki i Mat. Fiziki (1965), 395-453 (in Russian; English translation: Problems for extremum under constraints, USSR Comput. Math. Math. Physics (1965)).
[8] H. Frankowska and N.P. Osmolovskii, Strong local minimizers in optimal control. Problems with state constraints: second order necessary conditions, SIAM J. Control Optim. (2018), 2353-2376.
[9] R.V. Gamkrelidze, On some extremal problems in the theory of differential equations with applications to the theory of optimal control, SIAM J. Control (1965), 106-128.
[10] A.D. Ioffe, Necessary conditions in nonsmooth optimization, Math. Operations Research (1984), 159-189.
[11] A.D. Ioffe, Euler-Lagrange and Hamiltonian formalisms in dynamic optimization, Trans. Amer. Math. Soc. (1997), 2871-2900.
[12] A.D. Ioffe, Variational Analysis of Regular Mappings, Springer, 2017.
[13] A.D. Ioffe, On generalized Bolza problem and its application to dynamic optimization, J. Optim. Theory Appl. (2019), 285-309.
[14] A.D. Ioffe and R.T. Rockafellar, The Euler and Weierstrass conditions for nonsmooth variational problems, Calculus of Variations and PDEs (1996), 59-87.
[15] A.D. Ioffe and V.M. Tihomirov, Theory of Extremal Problems, Nauka, Moscow, 1974 (in Russian); English translation: North Holland, 1979.
[16] P.D. Loewen, Optimal Control via Nonsmooth Analysis, CRM Proceedings & Lecture Notes, AMS, 1993.
[17] A.A. Milyutin and N.P. Osmolovskii, Calculus of Variations and Optimal Control, AMS, 1998.
[18] B.S. Mordukhovich, Variational Analysis and Generalized Differentiation, vol. 2, Springer, 2006.
[19] N.P. Osmolovskii and H. Maurer, Applications of Regular and Bang-Bang Controls, SIAM, 2012.
[20] Z. Pales and V. Zeidan, First and second order optimality conditions in optimal control with pure state constraints, Nonlinear Anal. TMA (2007), 2506-2526.
[21] J.P. Penot, Calculus Without Derivatives, Graduate Texts in Mathematics, Springer, 2012.
[22] L.S. Pontryagin, V.G. Boltyanskii, R.V. Gamkrelidze and E.F. Mishchenko, The Mathematical Theory of Optimal Processes, Fizmatgiz, 1961 (in Russian); English translation: Pergamon Press, 1964.
[23] R.T. Rockafellar and R.J.B. Wets, Variational Analysis, Springer, 1998.
[24] R.B. Vinter,